* [PATCH v3 00/32] KVM: arm64: A stage 2 for the host
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Hi all,

This is v3 of the series previously posted here:

  https://lore.kernel.org/kvmarm/20201117181607.1761516-1-qperret@google.com/

This series allows us to wrap the host with a stage 2 when running in
nVHE, hence paving the way for protecting guest memory from the host in
the future (among other use cases). For more details about the
motivation and the design approach taken here, I would recommend having
a look at the cover letter of v1, and/or watching these presentations at
LPC [1] and KVM Forum 2020 [2].

V3 includes a number of clean-ups and small refactorings throughout, as
well as a few new features. Specifically, it is now possible to remove
memory pages from the host stage 2 cleanly, and this series does so for
all the .hyp memory sections (which uncovered existing bugs upstream and
in v2 of this series -- see [3] and [4]). The series also makes good use
of block mappings whenever possible, and has received a bit more testing
on real hardware (which helped uncover other bugs [5]).

The other changes in v3 include:

 - clean-ups, refactoring and extra comments all over the place (Will);

 - dropped the fdt hook in favor of the memblock API now that the
   relevant patches ([6]) have been merged (Rob);

 - moved the CPU feature copy stuff to __init/__initdata (Marc);

 - fixed FWB support (Mate);

 - rebased on v5.12-rc1.

This series depends on Will's vCPU context fix ([5]) and Marc's PMU
fixes ([7]). And here's a branch with all the goodies applied:

  https://android-kvm.googlesource.com/linux qperret/host-stage2-v3

Thanks,
Quentin

[1] https://youtu.be/54q6RzS9BpQ?t=10859
[2] https://youtu.be/wY-u6n75iXc
[3] https://lore.kernel.org/kvmarm/20210203141931.615898-1-qperret@google.com/
[4] https://lore.kernel.org/kvmarm/20210128173850.2478161-1-qperret@google.com/
[5] https://lore.kernel.org/kvmarm/20210226181211.14542-1-will@kernel.org/
[6] https://lore.kernel.org/lkml/20210115114544.1830068-1-qperret@google.com/
[7] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pmu-undef-NV


Quentin Perret (29):
  KVM: arm64: Initialize kvm_nvhe_init_params early
  KVM: arm64: Avoid free_page() in page-table allocator
  KVM: arm64: Factor memory allocation out of pgtable.c
  KVM: arm64: Introduce a BSS section for use at Hyp
  KVM: arm64: Make kvm_call_hyp() a function call at Hyp
  KVM: arm64: Allow using kvm_nvhe_sym() in hyp code
  KVM: arm64: Introduce an early Hyp page allocator
  KVM: arm64: Stub CONFIG_DEBUG_LIST at Hyp
  KVM: arm64: Introduce a Hyp buddy page allocator
  KVM: arm64: Enable access to sanitized CPU features at EL2
  KVM: arm64: Factor out vector address calculation
  KVM: arm64: Prepare the creation of s1 mappings at EL2
  KVM: arm64: Elevate hypervisor mappings creation at EL2
  KVM: arm64: Use kvm_arch for stage 2 pgtable
  KVM: arm64: Use kvm_arch in kvm_s2_mmu
  KVM: arm64: Set host stage 2 using kvm_nvhe_init_params
  KVM: arm64: Refactor kvm_arm_setup_stage2()
  KVM: arm64: Refactor __load_guest_stage2()
  KVM: arm64: Refactor __populate_fault_info()
  KVM: arm64: Make memcache anonymous in pgtable allocator
  KVM: arm64: Reserve memory for host stage 2
  KVM: arm64: Sort the hypervisor memblocks
  KVM: arm64: Introduce PROT_NONE mappings for stage 2
  KVM: arm64: Refactor stage2_map_set_prot_attr()
  KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy()
  KVM: arm64: Wrap the host with a stage 2
  KVM: arm64: Page-align the .hyp sections
  KVM: arm64: Disable PMU support in protected mode
  KVM: arm64: Protect the .hyp sections from the host

Will Deacon (3):
  arm64: lib: Annotate {clear,copy}_page() as position-independent
  KVM: arm64: Link position-independent string routines into .hyp.text
  arm64: kvm: Add standalone ticket spinlock implementation for use at
    hyp

 arch/arm64/include/asm/cpufeature.h           |   1 +
 arch/arm64/include/asm/hyp_image.h            |   7 +
 arch/arm64/include/asm/kvm_asm.h              |   9 +
 arch/arm64/include/asm/kvm_cpufeature.h       |  19 ++
 arch/arm64/include/asm/kvm_host.h             |  19 +-
 arch/arm64/include/asm/kvm_hyp.h              |   8 +
 arch/arm64/include/asm/kvm_mmu.h              |  23 +-
 arch/arm64/include/asm/kvm_pgtable.h          | 117 ++++++-
 arch/arm64/include/asm/sections.h             |   1 +
 arch/arm64/kernel/asm-offsets.c               |   3 +
 arch/arm64/kernel/cpufeature.c                |  13 +
 arch/arm64/kernel/image-vars.h                |  30 ++
 arch/arm64/kernel/vmlinux.lds.S               |  74 +++--
 arch/arm64/kvm/arm.c                          | 199 ++++++++++--
 arch/arm64/kvm/hyp/Makefile                   |   2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h       |  37 ++-
 arch/arm64/kvm/hyp/include/nvhe/early_alloc.h |  14 +
 arch/arm64/kvm/hyp/include/nvhe/gfp.h         |  55 ++++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  36 +++
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |  52 +++
 arch/arm64/kvm/hyp/include/nvhe/mm.h          |  92 ++++++
 arch/arm64/kvm/hyp/include/nvhe/spinlock.h    |  92 ++++++
 arch/arm64/kvm/hyp/nvhe/Makefile              |   9 +-
 arch/arm64/kvm/hyp/nvhe/cache.S               |  13 +
 arch/arm64/kvm/hyp/nvhe/cpufeature.c          |   8 +
 arch/arm64/kvm/hyp/nvhe/early_alloc.c         |  54 ++++
 arch/arm64/kvm/hyp/nvhe/hyp-init.S            |  46 ++-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  69 ++++
 arch/arm64/kvm/hyp/nvhe/hyp.lds.S             |   1 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 235 ++++++++++++++
 arch/arm64/kvm/hyp/nvhe/mm.c                  | 173 ++++++++++
 arch/arm64/kvm/hyp/nvhe/page_alloc.c          | 195 ++++++++++++
 arch/arm64/kvm/hyp/nvhe/psci-relay.c          |   4 +-
 arch/arm64/kvm/hyp/nvhe/setup.c               | 212 +++++++++++++
 arch/arm64/kvm/hyp/nvhe/stub.c                |  22 ++
 arch/arm64/kvm/hyp/nvhe/switch.c              |  12 +-
 arch/arm64/kvm/hyp/nvhe/tlb.c                 |   4 +-
 arch/arm64/kvm/hyp/pgtable.c                  | 298 ++++++++++++++----
 arch/arm64/kvm/hyp/reserved_mem.c             | 113 +++++++
 arch/arm64/kvm/mmu.c                          | 115 ++++++-
 arch/arm64/kvm/perf.c                         |   3 +-
 arch/arm64/kvm/pmu.c                          |   8 +-
 arch/arm64/kvm/reset.c                        |  42 +--
 arch/arm64/kvm/sys_regs.c                     |  21 ++
 arch/arm64/lib/clear_page.S                   |   4 +-
 arch/arm64/lib/copy_page.S                    |   4 +-
 arch/arm64/mm/init.c                          |   3 +
 47 files changed, 2356 insertions(+), 215 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_cpufeature.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/early_alloc.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/gfp.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/memory.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mm.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/spinlock.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/cache.S
 create mode 100644 arch/arm64/kvm/hyp/nvhe/cpufeature.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/early_alloc.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/mem_protect.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/mm.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/page_alloc.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/setup.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/stub.c
 create mode 100644 arch/arm64/kvm/hyp/reserved_mem.c

-- 
2.30.1.766.gb4fecdf3b7-goog


* [PATCH v3 01/32] arm64: lib: Annotate {clear,copy}_page() as position-independent
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

From: Will Deacon <will@kernel.org>

clear_page() and copy_page() are suitable for use outside of the kernel
address space, so annotate them as position-independent code.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
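Note (illustration only, not part of this patch): the SYM_FUNC_*_PI
annotations used below come from arch/arm64/include/asm/linkage.h and
roughly expand to an extra "__pi_"-prefixed alias for the annotated
function, along these lines:

	#define SYM_FUNC_START_PI(x)			\
			SYM_FUNC_START_ALIAS(__pi_##x)	\
			SYM_FUNC_START(x)

	#define SYM_FUNC_END_PI(x)			\
			SYM_FUNC_END(x)			\
			SYM_FUNC_END_ALIAS(__pi_##x)

The "__pi_" entry points can then be branched to by code running outside
the kernel's virtual address space, such as the nVHE hyp code that a
later patch links these routines into.
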
 arch/arm64/lib/clear_page.S | 4 ++--
 arch/arm64/lib/copy_page.S  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
index 073acbf02a7c..b84b179edba3 100644
--- a/arch/arm64/lib/clear_page.S
+++ b/arch/arm64/lib/clear_page.S
@@ -14,7 +14,7 @@
  * Parameters:
  *	x0 - dest
  */
-SYM_FUNC_START(clear_page)
+SYM_FUNC_START_PI(clear_page)
 	mrs	x1, dczid_el0
 	and	w1, w1, #0xf
 	mov	x2, #4
@@ -25,5 +25,5 @@ SYM_FUNC_START(clear_page)
 	tst	x0, #(PAGE_SIZE - 1)
 	b.ne	1b
 	ret
-SYM_FUNC_END(clear_page)
+SYM_FUNC_END_PI(clear_page)
 EXPORT_SYMBOL(clear_page)
diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
index e7a793961408..29144f4cd449 100644
--- a/arch/arm64/lib/copy_page.S
+++ b/arch/arm64/lib/copy_page.S
@@ -17,7 +17,7 @@
  *	x0 - dest
  *	x1 - src
  */
-SYM_FUNC_START(copy_page)
+SYM_FUNC_START_PI(copy_page)
 alternative_if ARM64_HAS_NO_HW_PREFETCH
 	// Prefetch three cache lines ahead.
 	prfm	pldl1strm, [x1, #128]
@@ -75,5 +75,5 @@ alternative_else_nop_endif
 	stnp	x16, x17, [x0, #112 - 256]
 
 	ret
-SYM_FUNC_END(copy_page)
+SYM_FUNC_END_PI(copy_page)
 EXPORT_SYMBOL(copy_page)
-- 
2.30.1.766.gb4fecdf3b7-goog


* [PATCH v3 02/32] KVM: arm64: Link position-independent string routines into .hyp.text
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

From: Will Deacon <will@kernel.org>

Pull clear_page(), copy_page(), memcpy() and memset() into the nVHE hyp
code and ensure that we always execute the '__pi_' entry point on the
off chance that it changes in the future.

[ qperret: Commit title nits and added linker script alias ]

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
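Note (illustration only, not part of this patch): with kvm_nvhe_sym(sym)
defined as __kvm_nvhe_##sym in hyp_image.h, the first alias added below
expands to a linker-script assignment along the lines of:

	__kvm_nvhe_clear_page = __kvm_nvhe___pi_clear_page;

tying the hyp's clear_page symbol to its position-independent "__pi_"
entry point, so that the '__pi_' variant is what actually gets executed
from .hyp.text, as described above.
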
 arch/arm64/include/asm/hyp_image.h |  3 +++
 arch/arm64/kernel/image-vars.h     | 11 +++++++++++
 arch/arm64/kvm/hyp/nvhe/Makefile   |  4 ++++
 3 files changed, 18 insertions(+)

diff --git a/arch/arm64/include/asm/hyp_image.h b/arch/arm64/include/asm/hyp_image.h
index 737ded6b6d0d..78cd77990c9c 100644
--- a/arch/arm64/include/asm/hyp_image.h
+++ b/arch/arm64/include/asm/hyp_image.h
@@ -56,6 +56,9 @@
  */
 #define KVM_NVHE_ALIAS(sym)	kvm_nvhe_sym(sym) = sym;
 
+/* Defines a linker script alias for KVM nVHE hyp symbols */
+#define KVM_NVHE_ALIAS_HYP(first, sec)	kvm_nvhe_sym(first) = kvm_nvhe_sym(sec);
+
 #endif /* LINKER_SCRIPT */
 
 #endif /* __ARM64_HYP_IMAGE_H__ */
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 5aa9ed1e9ec6..4eb7a15c8b60 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -104,6 +104,17 @@ KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
 /* PMU available static key */
 KVM_NVHE_ALIAS(kvm_arm_pmu_available);
 
+/* Position-independent library routines */
+KVM_NVHE_ALIAS_HYP(clear_page, __pi_clear_page);
+KVM_NVHE_ALIAS_HYP(copy_page, __pi_copy_page);
+KVM_NVHE_ALIAS_HYP(memcpy, __pi_memcpy);
+KVM_NVHE_ALIAS_HYP(memset, __pi_memset);
+
+#ifdef CONFIG_KASAN
+KVM_NVHE_ALIAS_HYP(__memcpy, __pi_memcpy);
+KVM_NVHE_ALIAS_HYP(__memset, __pi_memset);
+#endif
+
 #endif /* CONFIG_KVM */
 
 #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index a6707df4f6c0..bc98f8e3d1da 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -9,10 +9,14 @@ ccflags-y := -D__KVM_NVHE_HYPERVISOR__ -D__DISABLE_EXPORTS
 hostprogs := gen-hyprel
 HOST_EXTRACFLAGS += -I$(objtree)/include
 
+lib-objs := clear_page.o copy_page.o memcpy.o memset.o
+lib-objs := $(addprefix ../../../lib/, $(lib-objs))
+
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
 	 hyp-main.o hyp-smp.o psci-relay.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o
+obj-y += $(lib-objs)
 
 ##
 ## Build rules for compiling nVHE hyp code
-- 
2.30.1.766.gb4fecdf3b7-goog


* [PATCH v3 03/32] arm64: kvm: Add standalone ticket spinlock implementation for use at hyp
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

From: Will Deacon <will@kernel.org>

We will soon need to synchronise multiple CPUs in the hyp text at EL2.
The qspinlock-based locking used by the host is overkill for this purpose
and relies on the kernel's "percpu" implementation for the MCS nodes.

Implement a simple ticket locking scheme based heavily on the code removed
by commit c11090474d70 ("arm64: locking: Replace ticket lock implementation
with qspinlock").

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
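Note (illustration only, not the kernel code): the inline asm below
implements classic ticket-lock semantics. A stand-alone C11 sketch of the
same protocol is shown here purely as a reading aid -- the real lock
packs "owner" and "next" into a single 32-bit word and waits with
sevl/wfe instead of busy-polling:

	#include <stdatomic.h>
	#include <stdint.h>

	struct ticket_lock {
		_Atomic uint16_t next;	/* next ticket to hand out */
		_Atomic uint16_t owner;	/* ticket currently holding the lock */
	};

	static void ticket_lock_acquire(struct ticket_lock *l)
	{
		/* Atomically take a ticket (the ldaxr/stxr or ldadda part). */
		uint16_t t = atomic_fetch_add_explicit(&l->next, 1,
						       memory_order_acquire);

		/* Spin until our ticket number becomes the owner. */
		while (atomic_load_explicit(&l->owner, memory_order_acquire) != t)
			;
	}

	static void ticket_lock_release(struct ticket_lock *l)
	{
		/* Pass the lock to the next waiter (the stlrh/staddlh part). */
		atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
	}
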
 arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 92 ++++++++++++++++++++++
 1 file changed, 92 insertions(+)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/spinlock.h

diff --git a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
new file mode 100644
index 000000000000..76b537f8d1c6
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * A stand-alone ticket spinlock implementation for use by the non-VHE
+ * KVM hypervisor code running at EL2.
+ *
+ * Copyright (C) 2020 Google LLC
+ * Author: Will Deacon <will@kernel.org>
+ *
+ * Heavily based on the implementation removed by c11090474d70 which was:
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#ifndef __ARM64_KVM_NVHE_SPINLOCK_H__
+#define __ARM64_KVM_NVHE_SPINLOCK_H__
+
+#include <asm/alternative.h>
+#include <asm/lse.h>
+
+typedef union hyp_spinlock {
+	u32	__val;
+	struct {
+#ifdef __AARCH64EB__
+		u16 next, owner;
+#else
+		u16 owner, next;
+#endif
+	};
+} hyp_spinlock_t;
+
+#define hyp_spin_lock_init(l)						\
+do {									\
+	*(l) = (hyp_spinlock_t){ .__val = 0 };				\
+} while (0)
+
+static inline void hyp_spin_lock(hyp_spinlock_t *lock)
+{
+	u32 tmp;
+	hyp_spinlock_t lockval, newval;
+
+	asm volatile(
+	/* Atomically increment the next ticket. */
+	ARM64_LSE_ATOMIC_INSN(
+	/* LL/SC */
+"	prfm	pstl1strm, %3\n"
+"1:	ldaxr	%w0, %3\n"
+"	add	%w1, %w0, #(1 << 16)\n"
+"	stxr	%w2, %w1, %3\n"
+"	cbnz	%w2, 1b\n",
+	/* LSE atomics */
+"	mov	%w2, #(1 << 16)\n"
+"	ldadda	%w2, %w0, %3\n"
+	__nops(3))
+
+	/* Did we get the lock? */
+"	eor	%w1, %w0, %w0, ror #16\n"
+"	cbz	%w1, 3f\n"
+	/*
+	 * No: spin on the owner. Send a local event to avoid missing an
+	 * unlock before the exclusive load.
+	 */
+"	sevl\n"
+"2:	wfe\n"
+"	ldaxrh	%w2, %4\n"
+"	eor	%w1, %w2, %w0, lsr #16\n"
+"	cbnz	%w1, 2b\n"
+	/* We got the lock. Critical section starts here. */
+"3:"
+	: "=&r" (lockval), "=&r" (newval), "=&r" (tmp), "+Q" (*lock)
+	: "Q" (lock->owner)
+	: "memory");
+}
+
+static inline void hyp_spin_unlock(hyp_spinlock_t *lock)
+{
+	u64 tmp;
+
+	asm volatile(
+	ARM64_LSE_ATOMIC_INSN(
+	/* LL/SC */
+	"	ldrh	%w1, %0\n"
+	"	add	%w1, %w1, #1\n"
+	"	stlrh	%w1, %0",
+	/* LSE atomics */
+	"	mov	%w1, #1\n"
+	"	staddlh	%w1, %0\n"
+	__nops(1))
+	: "=Q" (lock->owner), "=&r" (tmp)
+	:
+	: "memory");
+}
+
+#endif /* __ARM64_KVM_NVHE_SPINLOCK_H__ */
-- 
2.30.1.766.gb4fecdf3b7-goog


* [PATCH v3 04/32] KVM: arm64: Initialize kvm_nvhe_init_params early
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Move the initialization of kvm_nvhe_init_params into a dedicated function
that is run early, and only once during KVM init, rather than every time
the KVM vectors are set and reset.

This also gives the hypervisor the opportunity to change the init
structs during boot, which simplifies replacing the host-provided
page-table with the one the hypervisor will create for itself.

Signed-off-by: Quentin Perret <qperret@google.com>
---
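Note (illustration only): the resulting split of responsibilities is
roughly as follows -- parameters are computed once per CPU at init time,
while the per-CPU HYP init path only consumes them:

	/*
	 * init_hyp_mode()                  -- boot time, once per CPU:
	 *     create_hyp_mappings(...);    // map the per-CPU region at EL2
	 *     cpu_prepare_hyp_mode(cpu);   // fill in kvm_nvhe_init_params
	 *
	 * cpu_init_hyp_mode()              -- each time the vectors are set:
	 *     __hyp_set_vectors(...);      // switch from the HYP stub
	 *     __kvm_hyp_init HVC           // uses the already-prepared params
	 */
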
 arch/arm64/kvm/arm.c | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fc4c95dd2d26..2d1e7ef69c04 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1383,22 +1383,18 @@ static int kvm_init_vector_slots(void)
 	return 0;
 }
 
-static void cpu_init_hyp_mode(void)
+static void cpu_prepare_hyp_mode(int cpu)
 {
-	struct kvm_nvhe_init_params *params = this_cpu_ptr_nvhe_sym(kvm_init_params);
-	struct arm_smccc_res res;
+	struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
 	unsigned long tcr;
 
-	/* Switch from the HYP stub to our own HYP init vector */
-	__hyp_set_vectors(kvm_get_idmap_vector());
-
 	/*
 	 * Calculate the raw per-cpu offset without a translation from the
 	 * kernel's mapping to the linear mapping, and store it in tpidr_el2
 	 * so that we can use adr_l to access per-cpu variables in EL2.
 	 * Also drop the KASAN tag which gets in the way...
 	 */
-	params->tpidr_el2 = (unsigned long)kasan_reset_tag(this_cpu_ptr_nvhe_sym(__per_cpu_start)) -
+	params->tpidr_el2 = (unsigned long)kasan_reset_tag(per_cpu_ptr_nvhe_sym(__per_cpu_start, cpu)) -
 			    (unsigned long)kvm_ksym_ref(CHOOSE_NVHE_SYM(__per_cpu_start));
 
 	params->mair_el2 = read_sysreg(mair_el1);
@@ -1422,7 +1418,7 @@ static void cpu_init_hyp_mode(void)
 	tcr |= (idmap_t0sz & GENMASK(TCR_TxSZ_WIDTH - 1, 0)) << TCR_T0SZ_OFFSET;
 	params->tcr_el2 = tcr;
 
-	params->stack_hyp_va = kern_hyp_va(__this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE);
+	params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
 	params->pgd_pa = kvm_mmu_get_httbr();
 
 	/*
@@ -1430,6 +1426,15 @@ static void cpu_init_hyp_mode(void)
 	 * be read while the MMU is off.
 	 */
 	kvm_flush_dcache_to_poc(params, sizeof(*params));
+}
+
+static void cpu_init_hyp_mode(void)
+{
+	struct kvm_nvhe_init_params *params;
+	struct arm_smccc_res res;
+
+	/* Switch from the HYP stub to our own HYP init vector */
+	__hyp_set_vectors(kvm_get_idmap_vector());
 
 	/*
 	 * Call initialization code, and switch to the full blown HYP code.
@@ -1438,6 +1443,7 @@ static void cpu_init_hyp_mode(void)
 	 * cpus_have_const_cap() wrapper.
 	 */
 	BUG_ON(!system_capabilities_finalized());
+	params = this_cpu_ptr_nvhe_sym(kvm_init_params);
 	arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init), virt_to_phys(params), &res);
 	WARN_ON(res.a0 != SMCCC_RET_SUCCESS);
 
@@ -1785,19 +1791,19 @@ static int init_hyp_mode(void)
 		}
 	}
 
-	/*
-	 * Map Hyp percpu pages
-	 */
 	for_each_possible_cpu(cpu) {
 		char *percpu_begin = (char *)kvm_arm_hyp_percpu_base[cpu];
 		char *percpu_end = percpu_begin + nvhe_percpu_size();
 
+		/* Map Hyp percpu pages */
 		err = create_hyp_mappings(percpu_begin, percpu_end, PAGE_HYP);
-
 		if (err) {
 			kvm_err("Cannot map hyp percpu region\n");
 			goto out_err;
 		}
+
+		/* Prepare the CPU initialization parameters */
+		cpu_prepare_hyp_mode(cpu);
 	}
 
 	if (is_protected_kvm_enabled()) {
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 05/32] KVM: arm64: Avoid free_page() in page-table allocator
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Currently, the KVM page-table allocator uses a mix of put_page() and
free_page() calls depending on the context even though page-allocation
is always achieved using variants of __get_free_page().

Make the code consistent by using put_page() throughout, and reduce the
memory management API surface used by the page-table code. This will
ease factoring out page-allocation from pgtable.c, which is a
pre-requisite to creating page-tables at EL2.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 4d177ce1d536..81fe032f34d1 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -413,7 +413,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits)
 static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			   enum kvm_pgtable_walk_flags flag, void * const arg)
 {
-	free_page((unsigned long)kvm_pte_follow(*ptep));
+	put_page(virt_to_page(kvm_pte_follow(*ptep)));
 	return 0;
 }
 
@@ -425,7 +425,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
-	free_page((unsigned long)pgt->pgd);
+	put_page(virt_to_page(pgt->pgd));
 	pgt->pgd = NULL;
 }
 
@@ -577,7 +577,7 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
 	if (!data->anchor)
 		return 0;
 
-	free_page((unsigned long)kvm_pte_follow(*ptep));
+	put_page(virt_to_page(kvm_pte_follow(*ptep)));
 	put_page(virt_to_page(ptep));
 
 	if (data->anchor == ptep) {
@@ -700,7 +700,7 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	}
 
 	if (childp)
-		free_page((unsigned long)childp);
+		put_page(virt_to_page(childp));
 
 	return 0;
 }
@@ -897,7 +897,7 @@ static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	put_page(virt_to_page(ptep));
 
 	if (kvm_pte_table(pte, level))
-		free_page((unsigned long)kvm_pte_follow(pte));
+		put_page(virt_to_page(kvm_pte_follow(pte)));
 
 	return 0;
 }
-- 
2.30.1.766.gb4fecdf3b7-goog
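
To see why the substitution is safe: pages handed out by __get_free_page()
start with a refcount of 1, so dropping the last reference with put_page()
frees them exactly as free_page() would, while keeping the page-table code
purely refcount-based. A minimal sketch (not part of the patch; the function
name is hypothetical):

static void put_page_vs_free_page_sketch(void)
{
	unsigned long va = __get_free_page(GFP_KERNEL);
	struct page *page;

	if (!va)
		return;

	page = virt_to_page((void *)va);
	get_page(page);	/* refcount 2: e.g. a page-table references the page */
	put_page(page);	/* refcount 1: that reference is dropped */
	put_page(page);	/* refcount 0: page is freed, equivalent to free_page(va) */
}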


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 06/32] KVM: arm64: Factor memory allocation out of pgtable.c
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

In preparation for enabling the creation of page-tables at EL2, factor
all memory allocation out of the page-table code, hence making it
re-usable with any compatible memory allocator.

No functional changes intended.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 41 +++++++++++-
 arch/arm64/kvm/hyp/pgtable.c         | 98 +++++++++++++++++-----------
 arch/arm64/kvm/mmu.c                 | 66 ++++++++++++++++++-
 3 files changed, 163 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8886d43cfb11..3c306f90f7da 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -13,17 +13,50 @@
 
 typedef u64 kvm_pte_t;
 
+/**
+ * struct kvm_pgtable_mm_ops - Memory management callbacks.
+ * @zalloc_page:	Allocate a single zeroed memory page. The @arg parameter
+ *			can be used by the walker to pass a memcache. The
+ *			initial refcount of the page is 1.
+ * @zalloc_pages_exact:	Allocate an exact number of zeroed memory pages. The
+ *			@size parameter is in bytes, it is automatically rounded
+ *			to PAGE_SIZE and the resulting allocation is physically
+ *			contiguous.
+ * @free_pages_exact:	Free an exact number of memory pages, to free memory
+ *			allocated with zalloc_pages_exact.
+ * @get_page:		Increment the refcount on a page.
+ * @put_page:		Decrement the refcount on a page. When the refcount
+ *			reaches 0 the page is automatically freed.
+ * @page_count:		Return the refcount of a page.
+ * @phys_to_virt:	Convert a physical address into a virtual address as
+ *			accessible in the current context.
+ * @virt_to_phys:	Convert a virtual address in the current context into a
+ *			physical address.
+ */
+struct kvm_pgtable_mm_ops {
+	void*		(*zalloc_page)(void *arg);
+	void*		(*zalloc_pages_exact)(size_t size);
+	void		(*free_pages_exact)(void *addr, size_t size);
+	void		(*get_page)(void *addr);
+	void		(*put_page)(void *addr);
+	int		(*page_count)(void *addr);
+	void*		(*phys_to_virt)(phys_addr_t phys);
+	phys_addr_t	(*virt_to_phys)(void *addr);
+};
+
 /**
  * struct kvm_pgtable - KVM page-table.
  * @ia_bits:		Maximum input address size, in bits.
  * @start_level:	Level at which the page-table walk starts.
  * @pgd:		Pointer to the first top-level entry of the page-table.
+ * @mm_ops:		Memory management callbacks.
  * @mmu:		Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
  */
 struct kvm_pgtable {
 	u32					ia_bits;
 	u32					start_level;
 	kvm_pte_t				*pgd;
+	struct kvm_pgtable_mm_ops		*mm_ops;
 
 	/* Stage-2 only */
 	struct kvm_s2_mmu			*mmu;
@@ -86,10 +119,12 @@ struct kvm_pgtable_walker {
  * kvm_pgtable_hyp_init() - Initialise a hypervisor stage-1 page-table.
  * @pgt:	Uninitialised page-table structure to initialise.
  * @va_bits:	Maximum virtual address bits.
+ * @mm_ops:	Memory management callbacks.
  *
  * Return: 0 on success, negative error code on failure.
  */
-int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits);
+int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
+			 struct kvm_pgtable_mm_ops *mm_ops);
 
 /**
  * kvm_pgtable_hyp_destroy() - Destroy an unused hypervisor stage-1 page-table.
@@ -126,10 +161,12 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
  * kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
  * @pgt:	Uninitialised page-table structure to initialise.
  * @kvm:	KVM structure representing the guest virtual machine.
+ * @mm_ops:	Memory management callbacks.
  *
  * Return: 0 on success, negative error code on failure.
  */
-int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm);
+int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm,
+			    struct kvm_pgtable_mm_ops *mm_ops);
 
 /**
  * kvm_pgtable_stage2_destroy() - Destroy an unused guest stage-2 page-table.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 81fe032f34d1..b975a67d1f85 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -152,9 +152,9 @@ static kvm_pte_t kvm_phys_to_pte(u64 pa)
 	return pte;
 }
 
-static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte)
+static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte, struct kvm_pgtable_mm_ops *mm_ops)
 {
-	return __va(kvm_pte_to_phys(pte));
+	return mm_ops->phys_to_virt(kvm_pte_to_phys(pte));
 }
 
 static void kvm_set_invalid_pte(kvm_pte_t *ptep)
@@ -163,9 +163,10 @@ static void kvm_set_invalid_pte(kvm_pte_t *ptep)
 	WRITE_ONCE(*ptep, pte & ~KVM_PTE_VALID);
 }
 
-static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
+static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
+			      struct kvm_pgtable_mm_ops *mm_ops)
 {
-	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(__pa(childp));
+	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
 
 	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
 	pte |= KVM_PTE_VALID;
@@ -227,7 +228,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 		goto out;
 	}
 
-	childp = kvm_pte_follow(pte);
+	childp = kvm_pte_follow(pte, data->pgt->mm_ops);
 	ret = __kvm_pgtable_walk(data, childp, level + 1);
 	if (ret)
 		goto out;
@@ -302,8 +303,9 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 }
 
 struct hyp_map_data {
-	u64		phys;
-	kvm_pte_t	attr;
+	u64				phys;
+	kvm_pte_t			attr;
+	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
 static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
@@ -358,6 +360,8 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			  enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	kvm_pte_t *childp;
+	struct hyp_map_data *data = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
 	if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
 		return 0;
@@ -365,11 +369,11 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
-	childp = (kvm_pte_t *)get_zeroed_page(GFP_KERNEL);
+	childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
 	if (!childp)
 		return -ENOMEM;
 
-	kvm_set_table_pte(ptep, childp);
+	kvm_set_table_pte(ptep, childp, mm_ops);
 	return 0;
 }
 
@@ -379,6 +383,7 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 	int ret;
 	struct hyp_map_data map_data = {
 		.phys	= ALIGN_DOWN(phys, PAGE_SIZE),
+		.mm_ops	= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_map_walker,
@@ -396,16 +401,18 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 	return ret;
 }
 
-int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits)
+int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
+			 struct kvm_pgtable_mm_ops *mm_ops)
 {
 	u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
 
-	pgt->pgd = (kvm_pte_t *)get_zeroed_page(GFP_KERNEL);
+	pgt->pgd = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
 	if (!pgt->pgd)
 		return -ENOMEM;
 
 	pgt->ia_bits		= va_bits;
 	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
+	pgt->mm_ops		= mm_ops;
 	pgt->mmu		= NULL;
 	return 0;
 }
@@ -413,7 +420,9 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits)
 static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			   enum kvm_pgtable_walk_flags flag, void * const arg)
 {
-	put_page(virt_to_page(kvm_pte_follow(*ptep)));
+	struct kvm_pgtable_mm_ops *mm_ops = arg;
+
+	mm_ops->put_page((void *)kvm_pte_follow(*ptep, mm_ops));
 	return 0;
 }
 
@@ -422,10 +431,11 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_free_walker,
 		.flags	= KVM_PGTABLE_WALK_TABLE_POST,
+		.arg	= pgt->mm_ops,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
-	put_page(virt_to_page(pgt->pgd));
+	pgt->mm_ops->put_page(pgt->pgd);
 	pgt->pgd = NULL;
 }
 
@@ -437,6 +447,8 @@ struct stage2_map_data {
 
 	struct kvm_s2_mmu		*mmu;
 	struct kvm_mmu_memory_cache	*memcache;
+
+	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
 static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
@@ -470,7 +482,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 {
 	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
-	struct page *page = virt_to_page(ptep);
+	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
 	if (!kvm_block_mapping_supported(addr, end, phys, level))
 		return -E2BIG;
@@ -492,11 +504,11 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 		 */
 		kvm_set_invalid_pte(ptep);
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
-		put_page(page);
+		mm_ops->put_page(ptep);
 	}
 
 	smp_store_release(ptep, new);
-	get_page(page);
+	mm_ops->get_page(ptep);
 	data->phys += granule;
 	return 0;
 }
@@ -526,13 +538,13 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 				struct stage2_map_data *data)
 {
-	int ret;
+	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 	kvm_pte_t *childp, pte = *ptep;
-	struct page *page = virt_to_page(ptep);
+	int ret;
 
 	if (data->anchor) {
 		if (kvm_pte_valid(pte))
-			put_page(page);
+			mm_ops->put_page(ptep);
 
 		return 0;
 	}
@@ -547,7 +559,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (!data->memcache)
 		return -ENOMEM;
 
-	childp = kvm_mmu_memory_cache_alloc(data->memcache);
+	childp = mm_ops->zalloc_page(data->memcache);
 	if (!childp)
 		return -ENOMEM;
 
@@ -559,11 +571,11 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (kvm_pte_valid(pte)) {
 		kvm_set_invalid_pte(ptep);
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
-		put_page(page);
+		mm_ops->put_page(ptep);
 	}
 
-	kvm_set_table_pte(ptep, childp);
-	get_page(page);
+	kvm_set_table_pte(ptep, childp, mm_ops);
+	mm_ops->get_page(ptep);
 
 	return 0;
 }
@@ -572,13 +584,14 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      struct stage2_map_data *data)
 {
+	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 	int ret = 0;
 
 	if (!data->anchor)
 		return 0;
 
-	put_page(virt_to_page(kvm_pte_follow(*ptep)));
-	put_page(virt_to_page(ptep));
+	mm_ops->put_page(kvm_pte_follow(*ptep, mm_ops));
+	mm_ops->put_page(ptep);
 
 	if (data->anchor == ptep) {
 		data->anchor = NULL;
@@ -633,6 +646,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
 		.mmu		= pgt->mmu,
 		.memcache	= mc,
+		.mm_ops		= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_map_walker,
@@ -669,7 +683,9 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
-	struct kvm_s2_mmu *mmu = arg;
+	struct kvm_pgtable *pgt = arg;
+	struct kvm_s2_mmu *mmu = pgt->mmu;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep, *childp = NULL;
 	bool need_flush = false;
 
@@ -677,9 +693,9 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		return 0;
 
 	if (kvm_pte_table(pte, level)) {
-		childp = kvm_pte_follow(pte);
+		childp = kvm_pte_follow(pte, mm_ops);
 
-		if (page_count(virt_to_page(childp)) != 1)
+		if (mm_ops->page_count(childp) != 1)
 			return 0;
 	} else if (stage2_pte_cacheable(pte)) {
 		need_flush = true;
@@ -692,15 +708,15 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 */
 	kvm_set_invalid_pte(ptep);
 	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
-	put_page(virt_to_page(ptep));
+	mm_ops->put_page(ptep);
 
 	if (need_flush) {
-		stage2_flush_dcache(kvm_pte_follow(pte),
+		stage2_flush_dcache(kvm_pte_follow(pte, mm_ops),
 				    kvm_granule_size(level));
 	}
 
 	if (childp)
-		put_page(virt_to_page(childp));
+		mm_ops->put_page(childp);
 
 	return 0;
 }
@@ -709,7 +725,7 @@ int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_unmap_walker,
-		.arg	= pgt->mmu,
+		.arg	= pgt,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
 	};
 
@@ -841,12 +857,13 @@ static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
+	struct kvm_pgtable_mm_ops *mm_ops = arg;
 	kvm_pte_t pte = *ptep;
 
 	if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pte))
 		return 0;
 
-	stage2_flush_dcache(kvm_pte_follow(pte), kvm_granule_size(level));
+	stage2_flush_dcache(kvm_pte_follow(pte, mm_ops), kvm_granule_size(level));
 	return 0;
 }
 
@@ -855,6 +872,7 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_flush_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF,
+		.arg	= pgt->mm_ops,
 	};
 
 	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
@@ -863,7 +881,8 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 	return kvm_pgtable_walk(pgt, addr, size, &walker);
 }
 
-int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
+int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm,
+			    struct kvm_pgtable_mm_ops *mm_ops)
 {
 	size_t pgd_sz;
 	u64 vtcr = kvm->arch.vtcr;
@@ -872,12 +891,13 @@ int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
 	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
 
 	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
-	pgt->pgd = alloc_pages_exact(pgd_sz, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	pgt->pgd = mm_ops->zalloc_pages_exact(pgd_sz);
 	if (!pgt->pgd)
 		return -ENOMEM;
 
 	pgt->ia_bits		= ia_bits;
 	pgt->start_level	= start_level;
+	pgt->mm_ops		= mm_ops;
 	pgt->mmu		= &kvm->arch.mmu;
 
 	/* Ensure zeroed PGD pages are visible to the hardware walker */
@@ -889,15 +909,16 @@ static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
+	struct kvm_pgtable_mm_ops *mm_ops = arg;
 	kvm_pte_t pte = *ptep;
 
 	if (!kvm_pte_valid(pte))
 		return 0;
 
-	put_page(virt_to_page(ptep));
+	mm_ops->put_page(ptep);
 
 	if (kvm_pte_table(pte, level))
-		put_page(virt_to_page(kvm_pte_follow(pte)));
+		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
 
 	return 0;
 }
@@ -909,10 +930,11 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 		.cb	= stage2_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF |
 			  KVM_PGTABLE_WALK_TABLE_POST,
+		.arg	= pgt->mm_ops,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
 	pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
-	free_pages_exact(pgt->pgd, pgd_sz);
+	pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
 	pgt->pgd = NULL;
 }
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 77cb2d28f2a4..4d41d7838d53 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -88,6 +88,44 @@ static bool kvm_is_device_pfn(unsigned long pfn)
 	return !pfn_valid(pfn);
 }
 
+static void *stage2_memcache_zalloc_page(void *arg)
+{
+	struct kvm_mmu_memory_cache *mc = arg;
+
+	/* Allocated with __GFP_ZERO, so no need to zero */
+	return kvm_mmu_memory_cache_alloc(mc);
+}
+
+static void *kvm_host_zalloc_pages_exact(size_t size)
+{
+	return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+}
+
+static void kvm_host_get_page(void *addr)
+{
+	get_page(virt_to_page(addr));
+}
+
+static void kvm_host_put_page(void *addr)
+{
+	put_page(virt_to_page(addr));
+}
+
+static int kvm_host_page_count(void *addr)
+{
+	return page_count(virt_to_page(addr));
+}
+
+static phys_addr_t kvm_host_pa(void *addr)
+{
+	return __pa(addr);
+}
+
+static void *kvm_host_va(phys_addr_t phys)
+{
+	return __va(phys);
+}
+
 /*
  * Unmapping vs dcache management:
  *
@@ -351,6 +389,17 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 	return 0;
 }
 
+static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
+	.zalloc_page		= stage2_memcache_zalloc_page,
+	.zalloc_pages_exact	= kvm_host_zalloc_pages_exact,
+	.free_pages_exact	= free_pages_exact,
+	.get_page		= kvm_host_get_page,
+	.put_page		= kvm_host_put_page,
+	.page_count		= kvm_host_page_count,
+	.phys_to_virt		= kvm_host_va,
+	.virt_to_phys		= kvm_host_pa,
+};
+
 /**
  * kvm_init_stage2_mmu - Initialise a S2 MMU strucrure
  * @kvm:	The pointer to the KVM structure
@@ -374,7 +423,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	if (!pgt)
 		return -ENOMEM;
 
-	err = kvm_pgtable_stage2_init(pgt, kvm);
+	err = kvm_pgtable_stage2_init(pgt, kvm, &kvm_s2_mm_ops);
 	if (err)
 		goto out_free_pgtable;
 
@@ -1208,6 +1257,19 @@ static int kvm_map_idmap_text(void)
 	return err;
 }
 
+static void *kvm_hyp_zalloc_page(void *arg)
+{
+	return (void *)get_zeroed_page(GFP_KERNEL);
+}
+
+static struct kvm_pgtable_mm_ops kvm_hyp_mm_ops = {
+	.zalloc_page		= kvm_hyp_zalloc_page,
+	.get_page		= kvm_host_get_page,
+	.put_page		= kvm_host_put_page,
+	.phys_to_virt		= kvm_host_va,
+	.virt_to_phys		= kvm_host_pa,
+};
+
 int kvm_mmu_init(void)
 {
 	int err;
@@ -1251,7 +1313,7 @@ int kvm_mmu_init(void)
 		goto out;
 	}
 
-	err = kvm_pgtable_hyp_init(hyp_pgtable, hyp_va_bits);
+	err = kvm_pgtable_hyp_init(hyp_pgtable, hyp_va_bits, &kvm_hyp_mm_ops);
 	if (err)
 		goto out_free_pgtable;
 
-- 
2.30.1.766.gb4fecdf3b7-goog
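
The new interface is just a table of callbacks: any allocator that can hand
out zeroed pages and translate between physical and virtual addresses can now
back a kvm_pgtable. A sketch of wiring a host-side allocator into the new init
signature, closely mirroring the kvm_hyp_mm_ops added above (the my_* names
are hypothetical):

static void *my_zalloc_page(void *arg)
{
	return (void *)get_zeroed_page(GFP_KERNEL);
}

static void my_get_page(void *addr)
{
	get_page(virt_to_page(addr));
}

static void my_put_page(void *addr)
{
	put_page(virt_to_page(addr));
}

static void *my_va(phys_addr_t phys)
{
	return __va(phys);
}

static phys_addr_t my_pa(void *addr)
{
	return __pa(addr);
}

static struct kvm_pgtable_mm_ops my_mm_ops = {
	.zalloc_page	= my_zalloc_page,
	.get_page	= my_get_page,
	.put_page	= my_put_page,
	.phys_to_virt	= my_va,
	.virt_to_phys	= my_pa,
};

static int my_hyp_pgtable_init(struct kvm_pgtable *pgt, u32 va_bits)
{
	/* The page-table code only sees the mm_ops interface from here on */
	return kvm_pgtable_hyp_init(pgt, va_bits, &my_mm_ops);
}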


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 06/32] KVM: arm64: Factor memory allocation out of pgtable.c
@ 2021-03-02 14:59   ` Quentin Perret
  0 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

In preparation for enabling the creation of page-tables at EL2, factor
all memory allocation out of the page-table code, hence making it
re-usable with any compatible memory allocator.

No functional changes intended.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 41 +++++++++++-
 arch/arm64/kvm/hyp/pgtable.c         | 98 +++++++++++++++++-----------
 arch/arm64/kvm/mmu.c                 | 66 ++++++++++++++++++-
 3 files changed, 163 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8886d43cfb11..3c306f90f7da 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -13,17 +13,50 @@
 
 typedef u64 kvm_pte_t;
 
+/**
+ * struct kvm_pgtable_mm_ops - Memory management callbacks.
+ * @zalloc_page:	Allocate a single zeroed memory page. The @arg parameter
+ *			can be used by the walker to pass a memcache. The
+ *			initial refcount of the page is 1.
+ * @zalloc_pages_exact:	Allocate an exact number of zeroed memory pages. The
+ *			@size parameter is in bytes, it is automatically rounded
+ *			to PAGE_SIZE and the resulting allocation is physically
+ *			contiguous.
+ * @free_pages_exact:	Free an exact number of memory pages, to free memory
+ *			allocated with zalloc_pages_exact.
+ * @get_page:		Increment the refcount on a page.
+ * @put_page:		Decrement the refcount on a page. When the refcount
+ *			reaches 0 the page is automatically freed.
+ * @page_count:		Return the refcount of a page.
+ * @phys_to_virt:	Convert a physical address into a virtual address as
+ *			accessible in the current context.
+ * @virt_to_phys:	Convert a virtual address in the current context into a
+ *			physical address.
+ */
+struct kvm_pgtable_mm_ops {
+	void*		(*zalloc_page)(void *arg);
+	void*		(*zalloc_pages_exact)(size_t size);
+	void		(*free_pages_exact)(void *addr, size_t size);
+	void		(*get_page)(void *addr);
+	void		(*put_page)(void *addr);
+	int		(*page_count)(void *addr);
+	void*		(*phys_to_virt)(phys_addr_t phys);
+	phys_addr_t	(*virt_to_phys)(void *addr);
+};
+
 /**
  * struct kvm_pgtable - KVM page-table.
  * @ia_bits:		Maximum input address size, in bits.
  * @start_level:	Level at which the page-table walk starts.
  * @pgd:		Pointer to the first top-level entry of the page-table.
+ * @mm_ops:		Memory management callbacks.
  * @mmu:		Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
  */
 struct kvm_pgtable {
 	u32					ia_bits;
 	u32					start_level;
 	kvm_pte_t				*pgd;
+	struct kvm_pgtable_mm_ops		*mm_ops;
 
 	/* Stage-2 only */
 	struct kvm_s2_mmu			*mmu;
@@ -86,10 +119,12 @@ struct kvm_pgtable_walker {
  * kvm_pgtable_hyp_init() - Initialise a hypervisor stage-1 page-table.
  * @pgt:	Uninitialised page-table structure to initialise.
  * @va_bits:	Maximum virtual address bits.
+ * @mm_ops:	Memory management callbacks.
  *
  * Return: 0 on success, negative error code on failure.
  */
-int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits);
+int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
+			 struct kvm_pgtable_mm_ops *mm_ops);
 
 /**
  * kvm_pgtable_hyp_destroy() - Destroy an unused hypervisor stage-1 page-table.
@@ -126,10 +161,12 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
  * kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
  * @pgt:	Uninitialised page-table structure to initialise.
  * @kvm:	KVM structure representing the guest virtual machine.
+ * @mm_ops:	Memory management callbacks.
  *
  * Return: 0 on success, negative error code on failure.
  */
-int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm);
+int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm,
+			    struct kvm_pgtable_mm_ops *mm_ops);
 
 /**
  * kvm_pgtable_stage2_destroy() - Destroy an unused guest stage-2 page-table.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 81fe032f34d1..b975a67d1f85 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -152,9 +152,9 @@ static kvm_pte_t kvm_phys_to_pte(u64 pa)
 	return pte;
 }
 
-static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte)
+static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte, struct kvm_pgtable_mm_ops *mm_ops)
 {
-	return __va(kvm_pte_to_phys(pte));
+	return mm_ops->phys_to_virt(kvm_pte_to_phys(pte));
 }
 
 static void kvm_set_invalid_pte(kvm_pte_t *ptep)
@@ -163,9 +163,10 @@ static void kvm_set_invalid_pte(kvm_pte_t *ptep)
 	WRITE_ONCE(*ptep, pte & ~KVM_PTE_VALID);
 }
 
-static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
+static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
+			      struct kvm_pgtable_mm_ops *mm_ops)
 {
-	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(__pa(childp));
+	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
 
 	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
 	pte |= KVM_PTE_VALID;
@@ -227,7 +228,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 		goto out;
 	}
 
-	childp = kvm_pte_follow(pte);
+	childp = kvm_pte_follow(pte, data->pgt->mm_ops);
 	ret = __kvm_pgtable_walk(data, childp, level + 1);
 	if (ret)
 		goto out;
@@ -302,8 +303,9 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 }
 
 struct hyp_map_data {
-	u64		phys;
-	kvm_pte_t	attr;
+	u64				phys;
+	kvm_pte_t			attr;
+	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
 static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
@@ -358,6 +360,8 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			  enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	kvm_pte_t *childp;
+	struct hyp_map_data *data = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
 	if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
 		return 0;
@@ -365,11 +369,11 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
-	childp = (kvm_pte_t *)get_zeroed_page(GFP_KERNEL);
+	childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
 	if (!childp)
 		return -ENOMEM;
 
-	kvm_set_table_pte(ptep, childp);
+	kvm_set_table_pte(ptep, childp, mm_ops);
 	return 0;
 }
 
@@ -379,6 +383,7 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 	int ret;
 	struct hyp_map_data map_data = {
 		.phys	= ALIGN_DOWN(phys, PAGE_SIZE),
+		.mm_ops	= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_map_walker,
@@ -396,16 +401,18 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 	return ret;
 }
 
-int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits)
+int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
+			 struct kvm_pgtable_mm_ops *mm_ops)
 {
 	u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
 
-	pgt->pgd = (kvm_pte_t *)get_zeroed_page(GFP_KERNEL);
+	pgt->pgd = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
 	if (!pgt->pgd)
 		return -ENOMEM;
 
 	pgt->ia_bits		= va_bits;
 	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
+	pgt->mm_ops		= mm_ops;
 	pgt->mmu		= NULL;
 	return 0;
 }
@@ -413,7 +420,9 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits)
 static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			   enum kvm_pgtable_walk_flags flag, void * const arg)
 {
-	put_page(virt_to_page(kvm_pte_follow(*ptep)));
+	struct kvm_pgtable_mm_ops *mm_ops = arg;
+
+	mm_ops->put_page((void *)kvm_pte_follow(*ptep, mm_ops));
 	return 0;
 }
 
@@ -422,10 +431,11 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_free_walker,
 		.flags	= KVM_PGTABLE_WALK_TABLE_POST,
+		.arg	= pgt->mm_ops,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
-	put_page(virt_to_page(pgt->pgd));
+	pgt->mm_ops->put_page(pgt->pgd);
 	pgt->pgd = NULL;
 }
 
@@ -437,6 +447,8 @@ struct stage2_map_data {
 
 	struct kvm_s2_mmu		*mmu;
 	struct kvm_mmu_memory_cache	*memcache;
+
+	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
 static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
@@ -470,7 +482,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 {
 	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
-	struct page *page = virt_to_page(ptep);
+	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
 	if (!kvm_block_mapping_supported(addr, end, phys, level))
 		return -E2BIG;
@@ -492,11 +504,11 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 		 */
 		kvm_set_invalid_pte(ptep);
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
-		put_page(page);
+		mm_ops->put_page(ptep);
 	}
 
 	smp_store_release(ptep, new);
-	get_page(page);
+	mm_ops->get_page(ptep);
 	data->phys += granule;
 	return 0;
 }
@@ -526,13 +538,13 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 				struct stage2_map_data *data)
 {
-	int ret;
+	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 	kvm_pte_t *childp, pte = *ptep;
-	struct page *page = virt_to_page(ptep);
+	int ret;
 
 	if (data->anchor) {
 		if (kvm_pte_valid(pte))
-			put_page(page);
+			mm_ops->put_page(ptep);
 
 		return 0;
 	}
@@ -547,7 +559,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (!data->memcache)
 		return -ENOMEM;
 
-	childp = kvm_mmu_memory_cache_alloc(data->memcache);
+	childp = mm_ops->zalloc_page(data->memcache);
 	if (!childp)
 		return -ENOMEM;
 
@@ -559,11 +571,11 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (kvm_pte_valid(pte)) {
 		kvm_set_invalid_pte(ptep);
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
-		put_page(page);
+		mm_ops->put_page(ptep);
 	}
 
-	kvm_set_table_pte(ptep, childp);
-	get_page(page);
+	kvm_set_table_pte(ptep, childp, mm_ops);
+	mm_ops->get_page(ptep);
 
 	return 0;
 }
@@ -572,13 +584,14 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      struct stage2_map_data *data)
 {
+	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 	int ret = 0;
 
 	if (!data->anchor)
 		return 0;
 
-	put_page(virt_to_page(kvm_pte_follow(*ptep)));
-	put_page(virt_to_page(ptep));
+	mm_ops->put_page(kvm_pte_follow(*ptep, mm_ops));
+	mm_ops->put_page(ptep);
 
 	if (data->anchor == ptep) {
 		data->anchor = NULL;
@@ -633,6 +646,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
 		.mmu		= pgt->mmu,
 		.memcache	= mc,
+		.mm_ops		= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_map_walker,
@@ -669,7 +683,9 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
-	struct kvm_s2_mmu *mmu = arg;
+	struct kvm_pgtable *pgt = arg;
+	struct kvm_s2_mmu *mmu = pgt->mmu;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep, *childp = NULL;
 	bool need_flush = false;
 
@@ -677,9 +693,9 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		return 0;
 
 	if (kvm_pte_table(pte, level)) {
-		childp = kvm_pte_follow(pte);
+		childp = kvm_pte_follow(pte, mm_ops);
 
-		if (page_count(virt_to_page(childp)) != 1)
+		if (mm_ops->page_count(childp) != 1)
 			return 0;
 	} else if (stage2_pte_cacheable(pte)) {
 		need_flush = true;
@@ -692,15 +708,15 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 */
 	kvm_set_invalid_pte(ptep);
 	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
-	put_page(virt_to_page(ptep));
+	mm_ops->put_page(ptep);
 
 	if (need_flush) {
-		stage2_flush_dcache(kvm_pte_follow(pte),
+		stage2_flush_dcache(kvm_pte_follow(pte, mm_ops),
 				    kvm_granule_size(level));
 	}
 
 	if (childp)
-		put_page(virt_to_page(childp));
+		mm_ops->put_page(childp);
 
 	return 0;
 }
@@ -709,7 +725,7 @@ int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_unmap_walker,
-		.arg	= pgt->mmu,
+		.arg	= pgt,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
 	};
 
@@ -841,12 +857,13 @@ static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
+	struct kvm_pgtable_mm_ops *mm_ops = arg;
 	kvm_pte_t pte = *ptep;
 
 	if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pte))
 		return 0;
 
-	stage2_flush_dcache(kvm_pte_follow(pte), kvm_granule_size(level));
+	stage2_flush_dcache(kvm_pte_follow(pte, mm_ops), kvm_granule_size(level));
 	return 0;
 }
 
@@ -855,6 +872,7 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_flush_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF,
+		.arg	= pgt->mm_ops,
 	};
 
 	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
@@ -863,7 +881,8 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 	return kvm_pgtable_walk(pgt, addr, size, &walker);
 }
 
-int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
+int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm,
+			    struct kvm_pgtable_mm_ops *mm_ops)
 {
 	size_t pgd_sz;
 	u64 vtcr = kvm->arch.vtcr;
@@ -872,12 +891,13 @@ int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
 	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
 
 	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
-	pgt->pgd = alloc_pages_exact(pgd_sz, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	pgt->pgd = mm_ops->zalloc_pages_exact(pgd_sz);
 	if (!pgt->pgd)
 		return -ENOMEM;
 
 	pgt->ia_bits		= ia_bits;
 	pgt->start_level	= start_level;
+	pgt->mm_ops		= mm_ops;
 	pgt->mmu		= &kvm->arch.mmu;
 
 	/* Ensure zeroed PGD pages are visible to the hardware walker */
@@ -889,15 +909,16 @@ static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
+	struct kvm_pgtable_mm_ops *mm_ops = arg;
 	kvm_pte_t pte = *ptep;
 
 	if (!kvm_pte_valid(pte))
 		return 0;
 
-	put_page(virt_to_page(ptep));
+	mm_ops->put_page(ptep);
 
 	if (kvm_pte_table(pte, level))
-		put_page(virt_to_page(kvm_pte_follow(pte)));
+		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
 
 	return 0;
 }
@@ -909,10 +930,11 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 		.cb	= stage2_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF |
 			  KVM_PGTABLE_WALK_TABLE_POST,
+		.arg	= pgt->mm_ops,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
 	pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
-	free_pages_exact(pgt->pgd, pgd_sz);
+	pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
 	pgt->pgd = NULL;
 }
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 77cb2d28f2a4..4d41d7838d53 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -88,6 +88,44 @@ static bool kvm_is_device_pfn(unsigned long pfn)
 	return !pfn_valid(pfn);
 }
 
+static void *stage2_memcache_zalloc_page(void *arg)
+{
+	struct kvm_mmu_memory_cache *mc = arg;
+
+	/* Allocated with __GFP_ZERO, so no need to zero */
+	return kvm_mmu_memory_cache_alloc(mc);
+}
+
+static void *kvm_host_zalloc_pages_exact(size_t size)
+{
+	return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+}
+
+static void kvm_host_get_page(void *addr)
+{
+	get_page(virt_to_page(addr));
+}
+
+static void kvm_host_put_page(void *addr)
+{
+	put_page(virt_to_page(addr));
+}
+
+static int kvm_host_page_count(void *addr)
+{
+	return page_count(virt_to_page(addr));
+}
+
+static phys_addr_t kvm_host_pa(void *addr)
+{
+	return __pa(addr);
+}
+
+static void *kvm_host_va(phys_addr_t phys)
+{
+	return __va(phys);
+}
+
 /*
  * Unmapping vs dcache management:
  *
@@ -351,6 +389,17 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 	return 0;
 }
 
+static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
+	.zalloc_page		= stage2_memcache_zalloc_page,
+	.zalloc_pages_exact	= kvm_host_zalloc_pages_exact,
+	.free_pages_exact	= free_pages_exact,
+	.get_page		= kvm_host_get_page,
+	.put_page		= kvm_host_put_page,
+	.page_count		= kvm_host_page_count,
+	.phys_to_virt		= kvm_host_va,
+	.virt_to_phys		= kvm_host_pa,
+};
+
 /**
  * kvm_init_stage2_mmu - Initialise an S2 MMU structure
  * @kvm:	The pointer to the KVM structure
@@ -374,7 +423,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	if (!pgt)
 		return -ENOMEM;
 
-	err = kvm_pgtable_stage2_init(pgt, kvm);
+	err = kvm_pgtable_stage2_init(pgt, kvm, &kvm_s2_mm_ops);
 	if (err)
 		goto out_free_pgtable;
 
@@ -1208,6 +1257,19 @@ static int kvm_map_idmap_text(void)
 	return err;
 }
 
+static void *kvm_hyp_zalloc_page(void *arg)
+{
+	return (void *)get_zeroed_page(GFP_KERNEL);
+}
+
+static struct kvm_pgtable_mm_ops kvm_hyp_mm_ops = {
+	.zalloc_page		= kvm_hyp_zalloc_page,
+	.get_page		= kvm_host_get_page,
+	.put_page		= kvm_host_put_page,
+	.phys_to_virt		= kvm_host_va,
+	.virt_to_phys		= kvm_host_pa,
+};
+
 int kvm_mmu_init(void)
 {
 	int err;
@@ -1251,7 +1313,7 @@ int kvm_mmu_init(void)
 		goto out;
 	}
 
-	err = kvm_pgtable_hyp_init(hyp_pgtable, hyp_va_bits);
+	err = kvm_pgtable_hyp_init(hyp_pgtable, hyp_va_bits, &kvm_hyp_mm_ops);
 	if (err)
 		goto out_free_pgtable;
 
-- 
2.30.1.766.gb4fecdf3b7-goog
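
As an illustration of the resulting interface, here is a minimal sketch of a
caller wiring its own callbacks into the updated kvm_pgtable_hyp_init(). The
example_* names are made up for this sketch; only struct kvm_pgtable_mm_ops
and the kvm_pgtable_*() functions come from the patch.

#include <linux/gfp.h>
#include <linux/mm.h>
#include <asm/kvm_pgtable.h>

/* Simple host-side callbacks; @arg (an optional memcache) is ignored here. */
static void *example_zalloc_page(void *arg)
{
	return (void *)get_zeroed_page(GFP_KERNEL);
}

static void example_get_page(void *addr)
{
	get_page(virt_to_page(addr));
}

static void example_put_page(void *addr)
{
	put_page(virt_to_page(addr));
}

static void *example_phys_to_virt(phys_addr_t phys)
{
	return __va(phys);
}

static phys_addr_t example_virt_to_phys(void *addr)
{
	return __pa(addr);
}

static struct kvm_pgtable_mm_ops example_mm_ops = {
	.zalloc_page	= example_zalloc_page,
	.get_page	= example_get_page,
	.put_page	= example_put_page,
	.phys_to_virt	= example_phys_to_virt,
	.virt_to_phys	= example_virt_to_phys,
};

static int example_init_hyp_pgtable(struct kvm_pgtable *pgt, u32 va_bits)
{
	/* The page-table instance now carries its allocator with it. */
	return kvm_pgtable_hyp_init(pgt, va_bits, &example_mm_ops);
}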



^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 07/32] KVM: arm64: Introduce a BSS section for use at Hyp
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Currently, the hyp code cannot make full use of a BSS, as the kernel's
.bss section is mapped read-only at EL2.

While this mapping could simply be changed to read-write, it would
intermingle the hyp and kernel state even more than they currently are.
Instead, introduce a __hyp_bss section, which uses reserved pages, and
create the appropriate RW hyp mappings during KVM init.
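
To make the effect concrete, a zero-initialised static variable defined in an
nVHE hyp object will now land in .hyp.bss and become writable at EL2 once the
RW mapping is in place. A minimal sketch (the names are made up for
illustration):

/* In some arch/arm64/kvm/hyp/nvhe/*.c file. */
static unsigned long example_hyp_counter;	/* ends up in .hyp.bss */

void example_hyp_tick(void)
{
	/* Writable at EL2 now that .hyp.bss is mapped PAGE_HYP (RW). */
	example_hyp_counter++;
}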

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/sections.h |  1 +
 arch/arm64/kernel/vmlinux.lds.S   | 52 ++++++++++++++++++++-----------
 arch/arm64/kvm/arm.c              | 14 ++++++++-
 arch/arm64/kvm/hyp/nvhe/hyp.lds.S |  1 +
 4 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index 2f36b16a5b5d..e4ad9db53af1 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -13,6 +13,7 @@ extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 extern char __hyp_text_start[], __hyp_text_end[];
 extern char __hyp_rodata_start[], __hyp_rodata_end[];
 extern char __hyp_reloc_begin[], __hyp_reloc_end[];
+extern char __hyp_bss_start[], __hyp_bss_end[];
 extern char __idmap_text_start[], __idmap_text_end[];
 extern char __initdata_begin[], __initdata_end[];
 extern char __inittext_begin[], __inittext_end[];
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 7eea7888bb02..e96173ce211b 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -5,24 +5,7 @@
  * Written by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
  */
 
-#define RO_EXCEPTION_TABLE_ALIGN	8
-#define RUNTIME_DISCARD_EXIT
-
-#include <asm-generic/vmlinux.lds.h>
-#include <asm/cache.h>
 #include <asm/hyp_image.h>
-#include <asm/kernel-pgtable.h>
-#include <asm/memory.h>
-#include <asm/page.h>
-
-#include "image.h"
-
-OUTPUT_ARCH(aarch64)
-ENTRY(_text)
-
-jiffies = jiffies_64;
-
-
 #ifdef CONFIG_KVM
 #define HYPERVISOR_EXTABLE					\
 	. = ALIGN(SZ_8);					\
@@ -51,13 +34,43 @@ jiffies = jiffies_64;
 		__hyp_reloc_end = .;				\
 	}
 
+#define BSS_FIRST_SECTIONS					\
+	__hyp_bss_start = .;					\
+	*(HYP_SECTION_NAME(.bss))				\
+	. = ALIGN(PAGE_SIZE);					\
+	__hyp_bss_end = .;
+
+/*
+ * We require that __hyp_bss_start and __bss_start are aligned, and enforce it
+ * with an assertion. But the BSS_SECTION macro places an empty .sbss section
+ * between them, which can in some cases cause the linker to misalign them. To
+ * work around the issue, force a page alignment for __bss_start.
+ */
+#define SBSS_ALIGN			PAGE_SIZE
 #else /* CONFIG_KVM */
 #define HYPERVISOR_EXTABLE
 #define HYPERVISOR_DATA_SECTIONS
 #define HYPERVISOR_PERCPU_SECTION
 #define HYPERVISOR_RELOC_SECTION
+#define SBSS_ALIGN			0
 #endif
 
+#define RO_EXCEPTION_TABLE_ALIGN	8
+#define RUNTIME_DISCARD_EXIT
+
+#include <asm-generic/vmlinux.lds.h>
+#include <asm/cache.h>
+#include <asm/kernel-pgtable.h>
+#include <asm/memory.h>
+#include <asm/page.h>
+
+#include "image.h"
+
+OUTPUT_ARCH(aarch64)
+ENTRY(_text)
+
+jiffies = jiffies_64;
+
 #define HYPERVISOR_TEXT					\
 	/*						\
 	 * Align to 4 KB so that			\
@@ -276,7 +289,7 @@ SECTIONS
 	__pecoff_data_rawsize = ABSOLUTE(. - __initdata_begin);
 	_edata = .;
 
-	BSS_SECTION(0, 0, 0)
+	BSS_SECTION(SBSS_ALIGN, 0, 0)
 
 	. = ALIGN(PAGE_SIZE);
 	init_pg_dir = .;
@@ -324,6 +337,9 @@ ASSERT(__hibernate_exit_text_end - (__hibernate_exit_text_start & ~(SZ_4K - 1))
 ASSERT((__entry_tramp_text_end - __entry_tramp_text_start) == PAGE_SIZE,
 	"Entry trampoline text too big")
 #endif
+#ifdef CONFIG_KVM
+ASSERT(__hyp_bss_start == __bss_start, "HYP and Host BSS are misaligned")
+#endif
 /*
  * If padding is applied before .head.text, virt<->phys conversions will fail.
  */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2d1e7ef69c04..3f8bcf8db036 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1770,7 +1770,19 @@ static int init_hyp_mode(void)
 		goto out_err;
 	}
 
-	err = create_hyp_mappings(kvm_ksym_ref(__bss_start),
+	/*
+	 * .hyp.bss is guaranteed to be placed at the beginning of the .bss
+	 * section thanks to an assertion in the linker script. Map it RW and
+	 * the rest of .bss RO.
+	 */
+	err = create_hyp_mappings(kvm_ksym_ref(__hyp_bss_start),
+				  kvm_ksym_ref(__hyp_bss_end), PAGE_HYP);
+	if (err) {
+		kvm_err("Cannot map hyp bss section: %d\n", err);
+		goto out_err;
+	}
+
+	err = create_hyp_mappings(kvm_ksym_ref(__hyp_bss_end),
 				  kvm_ksym_ref(__bss_stop), PAGE_HYP_RO);
 	if (err) {
 		kvm_err("Cannot map bss section\n");
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
index cd119d82d8e3..f4562f417d3f 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
@@ -25,4 +25,5 @@ SECTIONS {
 	BEGIN_HYP_SECTION(.data..percpu)
 		PERCPU_INPUT(L1_CACHE_BYTES)
 	END_HYP_SECTION
+	HYP_SECTION(.bss)
 }
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 08/32] KVM: arm64: Make kvm_call_hyp() a function call at Hyp
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

kvm_call_hyp() has some logic to issue a function call or a hypercall
depending on the EL at which the kernel is running. However, all the
code compiled under __KVM_NVHE_HYPERVISOR__ is guaranteed to only run
at EL2, which allows this logic to be simplified.

Add ifdefery to kvm_host.h to simplify kvm_call_hyp() in .hyp.text.
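
For illustration, the same call site can then be shared between the kernel
proper and code built under __KVM_NVHE_HYPERVISOR__ (the wrapper below is a
made-up example; __kvm_tlb_flush_vmid_ipa() and kvm_call_hyp() are real):

#include <asm/kvm_asm.h>
#include <asm/kvm_host.h>

static inline void example_flush_ipa(struct kvm_s2_mmu *mmu,
				     phys_addr_t ipa, int level)
{
	/*
	 * In the kernel proper this keeps its existing behaviour (an HVC
	 * when running nVHE); at EL2 it now compiles down to a direct
	 * call to __kvm_tlb_flush_vmid_ipa().
	 */
	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa, level);
}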

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3d10e6527f7d..06ca4828005f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -591,6 +591,7 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 void kvm_arm_halt_guest(struct kvm *kvm);
 void kvm_arm_resume_guest(struct kvm *kvm);
 
+#ifndef __KVM_NVHE_HYPERVISOR__
 #define kvm_call_hyp_nvhe(f, ...)						\
 	({								\
 		struct arm_smccc_res res;				\
@@ -630,6 +631,11 @@ void kvm_arm_resume_guest(struct kvm *kvm);
 									\
 		ret;							\
 	})
+#else /* __KVM_NVHE_HYPERVISOR__ */
+#define kvm_call_hyp(f, ...) f(__VA_ARGS__)
+#define kvm_call_hyp_ret(f, ...) f(__VA_ARGS__)
+#define kvm_call_hyp_nvhe(f, ...) f(__VA_ARGS__)
+#endif /* __KVM_NVHE_HYPERVISOR__ */
 
 void force_vm_exit(const cpumask_t *mask);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 09/32] KVM: arm64: Allow using kvm_nvhe_sym() in hyp code
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

In order to allow code shared by the host and the hyp to be used in
static inline library functions, allow the use of kvm_nvhe_sym() at
EL2 by defaulting to the raw symbol name.
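
For instance, a helper shared between EL1 and EL2 code can now reference a
hyp symbol through the same macro (the symbol name below is made up for
illustration):

#include <asm/hyp_image.h>

extern unsigned long kvm_nvhe_sym(example_hyp_value);

static inline unsigned long example_read_hyp_value(void)
{
	/*
	 * Kernel proper: resolves to __kvm_nvhe_example_hyp_value.
	 * Under __KVM_NVHE_HYPERVISOR__: resolves to example_hyp_value.
	 */
	return kvm_nvhe_sym(example_hyp_value);
}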

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/hyp_image.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/hyp_image.h b/arch/arm64/include/asm/hyp_image.h
index 78cd77990c9c..b4b3076a76fb 100644
--- a/arch/arm64/include/asm/hyp_image.h
+++ b/arch/arm64/include/asm/hyp_image.h
@@ -10,11 +10,15 @@
 #define __HYP_CONCAT(a, b)	a ## b
 #define HYP_CONCAT(a, b)	__HYP_CONCAT(a, b)
 
+#ifndef __KVM_NVHE_HYPERVISOR__
 /*
  * KVM nVHE code has its own symbol namespace prefixed with __kvm_nvhe_,
  * to separate it from the kernel proper.
  */
 #define kvm_nvhe_sym(sym)	__kvm_nvhe_##sym
+#else
+#define kvm_nvhe_sym(sym)	sym
+#endif
 
 #ifdef LINKER_SCRIPT
 
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 10/32] KVM: arm64: Introduce an early Hyp page allocator
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

With nVHE, the host currently creates all stage 1 hypervisor mappings at
EL1 during boot, installs them at EL2, and extends them as required
(e.g. when creating a new VM). But in a world where the host is no
longer trusted, it cannot have full control over the code mapped in the
hypervisor.

In preparation for enabling the hypervisor to create its own stage 1
mappings during boot, introduce an early page allocator with minimal
functionality. This allocator is only meant to be used during early
bootstrap of the hyp code when memory protection is enabled; the hyp
code then switches to a full-fledged page allocator after init.
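
A rough usage sketch of the new allocator follows; the bootstrap function and
the memory carve-out are hypothetical, while the hyp_early_alloc_*() API is
the one introduced below:

#include <linux/errno.h>

#include <nvhe/early_alloc.h>

static int example_hyp_bootstrap(void *virt, unsigned long size)
{
	void *p;

	/* Hand the carve-out [virt, virt + size) to the early allocator. */
	hyp_early_alloc_init(virt, size);

	/* Pages come back zeroed and are never freed. */
	p = hyp_early_alloc_page(NULL);
	if (!p)
		return -ENOMEM;

	/* hyp_early_alloc_mm_ops can then back the early stage 1 mappings. */
	return 0;
}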

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/include/nvhe/early_alloc.h | 14 +++++
 arch/arm64/kvm/hyp/include/nvhe/memory.h      | 24 +++++++++
 arch/arm64/kvm/hyp/nvhe/Makefile              |  2 +-
 arch/arm64/kvm/hyp/nvhe/early_alloc.c         | 54 +++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/psci-relay.c          |  4 +-
 5 files changed, 94 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/early_alloc.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/memory.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/early_alloc.c

diff --git a/arch/arm64/kvm/hyp/include/nvhe/early_alloc.h b/arch/arm64/kvm/hyp/include/nvhe/early_alloc.h
new file mode 100644
index 000000000000..dc61aaa56f31
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/early_alloc.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_HYP_EARLY_ALLOC_H
+#define __KVM_HYP_EARLY_ALLOC_H
+
+#include <asm/kvm_pgtable.h>
+
+void hyp_early_alloc_init(void *virt, unsigned long size);
+unsigned long hyp_early_alloc_nr_used_pages(void);
+void *hyp_early_alloc_page(void *arg);
+void *hyp_early_alloc_contig(unsigned int nr_pages);
+
+extern struct kvm_pgtable_mm_ops hyp_early_alloc_mm_ops;
+
+#endif /* __KVM_HYP_EARLY_ALLOC_H */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
new file mode 100644
index 000000000000..3e49eaa7e682
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_HYP_MEMORY_H
+#define __KVM_HYP_MEMORY_H
+
+#include <asm/page.h>
+
+#include <linux/types.h>
+
+extern s64 hyp_physvirt_offset;
+
+#define __hyp_pa(virt)	((phys_addr_t)(virt) + hyp_physvirt_offset)
+#define __hyp_va(phys)	((void *)((phys_addr_t)(phys) - hyp_physvirt_offset))
+
+static inline void *hyp_phys_to_virt(phys_addr_t phys)
+{
+	return __hyp_va(phys);
+}
+
+static inline phys_addr_t hyp_virt_to_phys(void *addr)
+{
+	return __hyp_pa(addr);
+}
+
+#endif /* __KVM_HYP_MEMORY_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index bc98f8e3d1da..24ff99e2eac5 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -13,7 +13,7 @@ lib-objs := clear_page.o copy_page.o memcpy.o memset.o
 lib-objs := $(addprefix ../../../lib/, $(lib-objs))
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
-	 hyp-main.o hyp-smp.o psci-relay.o
+	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o
 obj-y += $(lib-objs)
diff --git a/arch/arm64/kvm/hyp/nvhe/early_alloc.c b/arch/arm64/kvm/hyp/nvhe/early_alloc.c
new file mode 100644
index 000000000000..1306c430ab87
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/early_alloc.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#include <asm/kvm_pgtable.h>
+
+#include <nvhe/early_alloc.h>
+#include <nvhe/memory.h>
+
+struct kvm_pgtable_mm_ops hyp_early_alloc_mm_ops;
+s64 __ro_after_init hyp_physvirt_offset;
+
+static unsigned long base;
+static unsigned long end;
+static unsigned long cur;
+
+unsigned long hyp_early_alloc_nr_used_pages(void)
+{
+	return (cur - base) >> PAGE_SHIFT;
+}
+
+void *hyp_early_alloc_contig(unsigned int nr_pages)
+{
+	unsigned long size = (nr_pages << PAGE_SHIFT);
+	void *ret = (void *)cur;
+
+	if (!nr_pages)
+		return NULL;
+
+	if (end - cur < size)
+		return NULL;
+
+	cur += size;
+	memset(ret, 0, size);
+
+	return ret;
+}
+
+void *hyp_early_alloc_page(void *arg)
+{
+	return hyp_early_alloc_contig(1);
+}
+
+void hyp_early_alloc_init(void *virt, unsigned long size)
+{
+	base = cur = (unsigned long)virt;
+	end = base + size;
+
+	hyp_early_alloc_mm_ops.zalloc_page = hyp_early_alloc_page;
+	hyp_early_alloc_mm_ops.phys_to_virt = hyp_phys_to_virt;
+	hyp_early_alloc_mm_ops.virt_to_phys = hyp_virt_to_phys;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
index 63de71c0481e..08508783ec3d 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
@@ -11,6 +11,7 @@
 #include <linux/kvm_host.h>
 #include <uapi/linux/psci.h>
 
+#include <nvhe/memory.h>
 #include <nvhe/trap_handler.h>
 
 void kvm_hyp_cpu_entry(unsigned long r0);
@@ -20,9 +21,6 @@ void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
 
 /* Config options set by the host. */
 struct kvm_host_psci_config __ro_after_init kvm_host_psci_config;
-s64 __ro_after_init hyp_physvirt_offset;
-
-#define __hyp_pa(x) ((phys_addr_t)((x)) + hyp_physvirt_offset)
 
 #define INVALID_CPU_ID	UINT_MAX
 
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 11/32] KVM: arm64: Stub CONFIG_DEBUG_LIST at Hyp
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

In order to use the kernel list library at EL2, introduce stubs for the
CONFIG_DEBUG_LIST out-of-line calls.
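
With CONFIG_DEBUG_LIST=y, the inline helpers in <linux/list.h> call
these out-of-line checkers before touching a list, so without stubs any
list_add()/list_del() in hyp code would fail to link against the EL2
object. A hedged sketch of the kind of user this unblocks (the struct
and function names are purely illustrative):

	#include <linux/list.h>

	struct hyp_obj {
		struct list_head node;
	};

	static LIST_HEAD(hyp_obj_list);

	static void hyp_obj_track(struct hyp_obj *obj)
	{
		/* Expands to a __list_add_valid() check plus pointer updates. */
		list_add_tail(&obj->node, &hyp_obj_list);
	}

	static void hyp_obj_untrack(struct hyp_obj *obj)
	{
		/* Expands to a __list_del_entry_valid() check plus pointer updates. */
		list_del_init(&obj->node);
	}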

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/nvhe/Makefile |  2 +-
 arch/arm64/kvm/hyp/nvhe/stub.c   | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kvm/hyp/nvhe/stub.c

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 24ff99e2eac5..144da72ad510 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -13,7 +13,7 @@ lib-objs := clear_page.o copy_page.o memcpy.o memset.o
 lib-objs := $(addprefix ../../../lib/, $(lib-objs))
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
-	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o
+	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o
 obj-y += $(lib-objs)
diff --git a/arch/arm64/kvm/hyp/nvhe/stub.c b/arch/arm64/kvm/hyp/nvhe/stub.c
new file mode 100644
index 000000000000..c0aa6bbfd79d
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/stub.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Stubs for out-of-line function calls caused by re-using kernel
+ * infrastructure at EL2.
+ *
+ * Copyright (C) 2020 - Google LLC
+ */
+
+#include <linux/list.h>
+
+#ifdef CONFIG_DEBUG_LIST
+bool __list_add_valid(struct list_head *new, struct list_head *prev,
+		      struct list_head *next)
+{
+	return true;
+}
+
+bool __list_del_entry_valid(struct list_head *entry)
+{
+	return true;
+}
+#endif
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 12/32] KVM: arm64: Introduce a Hyp buddy page allocator
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

When memory protection is enabled, the hyp code will require a basic
form of memory management in order to allocate and free memory pages at
EL2. This is needed for various use-cases, including the creation of hyp
mappings or the allocation of stage 2 page tables.

To address these use-cases, introduce a simple memory allocator in the
hyp code. The allocator is designed as a conventional 'buddy allocator',
working at page granularity. It allows physically contiguous pages to be
allocated and freed from memory 'pools', with a guaranteed order
alignment in the PA space. Each page in a memory pool is associated
with a struct hyp_page which holds the page's metadata, including its
refcount as well as its current order, hence mimicking the kernel's
buddy system in the GFP infrastructure. The hyp_page metadata is made
accessible through a hyp_vmemmap, following the concept of
SPARSE_VMEMMAP in the kernel.
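
A hedged sketch of the intended usage (the function name and the pool
sizing below are illustrative only):

	#include <nvhe/gfp.h>

	/* Sketch only: 'pfn' is the base page frame of a donated range. */
	static void hyp_pool_example(struct hyp_pool *pool, u64 pfn)
	{
		void *addr;

		/* Hand 16 pages starting at 'pfn' to the pool, none reserved. */
		hyp_pool_init(pool, pfn, 16, 0);

		/* Allocate an order-2 block: 4 contiguous, order-aligned pages. */
		addr = hyp_alloc_pages(pool, 2);
		if (!addr)
			return;

		/* Pages are refcounted; the last hyp_put_page() frees the block. */
		hyp_get_page(addr);
		hyp_put_page(addr);
		hyp_put_page(addr);
	}

Note that the pages handed to hyp_pool_init() must already be covered by
the hyp_vmemmap, so that their struct hyp_page can be reached.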

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/include/nvhe/gfp.h    |  55 +++++++
 arch/arm64/kvm/hyp/include/nvhe/memory.h |  28 ++++
 arch/arm64/kvm/hyp/nvhe/Makefile         |   2 +-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c     | 195 +++++++++++++++++++++++
 4 files changed, 279 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/gfp.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/page_alloc.c

diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
new file mode 100644
index 000000000000..d039086d86b5
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_HYP_GFP_H
+#define __KVM_HYP_GFP_H
+
+#include <linux/list.h>
+
+#include <nvhe/memory.h>
+#include <nvhe/spinlock.h>
+
+#define HYP_NO_ORDER	UINT_MAX
+
+struct hyp_pool {
+	/*
+	 * Spinlock protecting concurrent changes to the memory pool as well as
+	 * the struct hyp_page of the pool's pages until we have a proper atomic
+	 * API at EL2.
+	 */
+	hyp_spinlock_t lock;
+	struct list_head free_area[MAX_ORDER];
+	phys_addr_t range_start;
+	phys_addr_t range_end;
+	unsigned int max_order;
+};
+
+static inline void hyp_page_ref_inc(struct hyp_page *p)
+{
+	struct hyp_pool *pool = hyp_page_to_pool(p);
+
+	hyp_spin_lock(&pool->lock);
+	p->refcount++;
+	hyp_spin_unlock(&pool->lock);
+}
+
+static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
+{
+	struct hyp_pool *pool = hyp_page_to_pool(p);
+	int ret;
+
+	hyp_spin_lock(&pool->lock);
+	p->refcount--;
+	ret = (p->refcount == 0);
+	hyp_spin_unlock(&pool->lock);
+
+	return ret;
+}
+
+/* Allocation */
+void *hyp_alloc_pages(struct hyp_pool *pool, unsigned int order);
+void hyp_get_page(void *addr);
+void hyp_put_page(void *addr);
+
+/* Used pages cannot be freed */
+int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
+		  unsigned int reserved_pages);
+#endif /* __KVM_HYP_GFP_H */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 3e49eaa7e682..d2fb307c5952 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -6,7 +6,17 @@
 
 #include <linux/types.h>
 
+struct hyp_pool;
+struct hyp_page {
+	unsigned int refcount;
+	unsigned int order;
+	struct hyp_pool *pool;
+	struct list_head node;
+};
+
 extern s64 hyp_physvirt_offset;
+extern u64 __hyp_vmemmap;
+#define hyp_vmemmap ((struct hyp_page *)__hyp_vmemmap)
 
 #define __hyp_pa(virt)	((phys_addr_t)(virt) + hyp_physvirt_offset)
 #define __hyp_va(phys)	((void *)((phys_addr_t)(phys) - hyp_physvirt_offset))
@@ -21,4 +31,22 @@ static inline phys_addr_t hyp_virt_to_phys(void *addr)
 	return __hyp_pa(addr);
 }
 
+#define hyp_phys_to_pfn(phys)	((phys) >> PAGE_SHIFT)
+#define hyp_pfn_to_phys(pfn)	((phys_addr_t)((pfn) << PAGE_SHIFT))
+#define hyp_phys_to_page(phys)	(&hyp_vmemmap[hyp_phys_to_pfn(phys)])
+#define hyp_virt_to_page(virt)	hyp_phys_to_page(__hyp_pa(virt))
+#define hyp_virt_to_pfn(virt)	hyp_phys_to_pfn(__hyp_pa(virt))
+
+#define hyp_page_to_pfn(page)	((struct hyp_page *)(page) - hyp_vmemmap)
+#define hyp_page_to_phys(page)  hyp_pfn_to_phys((hyp_page_to_pfn(page)))
+#define hyp_page_to_virt(page)	__hyp_va(hyp_page_to_phys(page))
+#define hyp_page_to_pool(page)	(((struct hyp_page *)page)->pool)
+
+static inline int hyp_page_count(void *addr)
+{
+	struct hyp_page *p = hyp_virt_to_page(addr);
+
+	return p->refcount;
+}
+
 #endif /* __KVM_HYP_MEMORY_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 144da72ad510..6894a917f290 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -13,7 +13,7 @@ lib-objs := clear_page.o copy_page.o memcpy.o memset.o
 lib-objs := $(addprefix ../../../lib/, $(lib-objs))
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
-	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o
+	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o
 obj-y += $(lib-objs)
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
new file mode 100644
index 000000000000..348fc4329e8f
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#include <asm/kvm_hyp.h>
+#include <nvhe/gfp.h>
+
+u64 __hyp_vmemmap;
+
+/*
+ * Index the hyp_vmemmap to find a potential buddy page, but make no assumption
+ * about its current state.
+ *
+ * Example buddy-tree for a 4-page physically contiguous pool:
+ *
+ *                 o : Page 3
+ *                /
+ *               o-o : Page 2
+ *              /
+ *             /   o : Page 1
+ *            /   /
+ *           o---o-o : Page 0
+ *    Order  2   1 0
+ *
+ * Example of requests on this pool:
+ *   __find_buddy_nocheck(pool, page 0, order 0) => page 1
+ *   __find_buddy_nocheck(pool, page 0, order 1) => page 2
+ *   __find_buddy_nocheck(pool, page 1, order 0) => page 0
+ *   __find_buddy_nocheck(pool, page 2, order 0) => page 3
+ */
+static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
+					     struct hyp_page *p,
+					     unsigned int order)
+{
+	phys_addr_t addr = hyp_page_to_phys(p);
+
+	addr ^= (PAGE_SIZE << order);
+
+	/*
+	 * Don't return a page outside the pool range -- it belongs to
+	 * something else and may not be mapped in hyp_vmemmap.
+	 */
+	if (addr < pool->range_start || addr >= pool->range_end)
+		return NULL;
+
+	return hyp_phys_to_page(addr);
+}
+
+/* Find a buddy page currently available for allocation */
+static struct hyp_page *__find_buddy_avail(struct hyp_pool *pool,
+					   struct hyp_page *p,
+					   unsigned int order)
+{
+	struct hyp_page *buddy = __find_buddy_nocheck(pool, p, order);
+
+	if (!buddy || buddy->order != order || list_empty(&buddy->node))
+		return NULL;
+
+	return buddy;
+
+}
+
+static void __hyp_attach_page(struct hyp_pool *pool,
+			      struct hyp_page *p)
+{
+	unsigned int order = p->order;
+	struct hyp_page *buddy;
+
+	memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
+
+	/*
+	 * Only the first struct hyp_page of a high-order page (otherwise known
+	 * as the 'head') should have p->order set. The non-head pages should
+	 * have p->order = HYP_NO_ORDER. Here @p may no longer be the head
+	 * after coalescing, so make sure to mark it HYP_NO_ORDER proactively.
+	 */
+	p->order = HYP_NO_ORDER;
+	for (; (order + 1) < pool->max_order; order++) {
+		buddy = __find_buddy_avail(pool, p, order);
+		if (!buddy)
+			break;
+
+		/* Take the buddy out of its list, and coalesce with @p */
+		list_del_init(&buddy->node);
+		buddy->order = HYP_NO_ORDER;
+		p = (p < buddy) ? p : buddy;
+	}
+
+	/* Mark the new head, and insert it */
+	p->order = order;
+	list_add_tail(&p->node, &pool->free_area[order]);
+}
+
+static void hyp_attach_page(struct hyp_page *p)
+{
+	struct hyp_pool *pool = hyp_page_to_pool(p);
+
+	hyp_spin_lock(&pool->lock);
+	__hyp_attach_page(pool, p);
+	hyp_spin_unlock(&pool->lock);
+}
+
+static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
+					   struct hyp_page *p,
+					   unsigned int order)
+{
+	struct hyp_page *buddy;
+
+	list_del_init(&p->node);
+	while (p->order > order) {
+		/*
+		 * The buddy of order n - 1 currently has HYP_NO_ORDER as it
+		 * is covered by a higher-level page (whose head is @p). Use
+		 * __find_buddy_nocheck() to find it and inject it in the
+		 * free_list[n - 1], effectively splitting @p in half.
+		 */
+		p->order--;
+		buddy = __find_buddy_nocheck(pool, p, p->order);
+		buddy->order = p->order;
+		list_add_tail(&buddy->node, &pool->free_area[buddy->order]);
+	}
+
+	return p;
+}
+
+void hyp_put_page(void *addr)
+{
+	struct hyp_page *p = hyp_virt_to_page(addr);
+
+	if (hyp_page_ref_dec_and_test(p))
+		hyp_attach_page(p);
+}
+
+void hyp_get_page(void *addr)
+{
+	struct hyp_page *p = hyp_virt_to_page(addr);
+
+	hyp_page_ref_inc(p);
+}
+
+void *hyp_alloc_pages(struct hyp_pool *pool, unsigned int order)
+{
+	unsigned int i = order;
+	struct hyp_page *p;
+
+	hyp_spin_lock(&pool->lock);
+
+	/* Look for a high-enough-order page */
+	while (i < pool->max_order && list_empty(&pool->free_area[i]))
+		i++;
+	if (i >= pool->max_order) {
+		hyp_spin_unlock(&pool->lock);
+		return NULL;
+	}
+
+	/* Extract it from the tree at the right order */
+	p = list_first_entry(&pool->free_area[i], struct hyp_page, node);
+	p = __hyp_extract_page(pool, p, order);
+
+	hyp_spin_unlock(&pool->lock);
+	hyp_page_ref_inc(p);
+
+	return p ? hyp_page_to_virt(p) : NULL;
+}
+
+int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
+		  unsigned int reserved_pages)
+{
+	phys_addr_t phys = hyp_pfn_to_phys(pfn);
+	struct hyp_page *p;
+	int i;
+
+	hyp_spin_lock_init(&pool->lock);
+	pool->max_order = min(MAX_ORDER, get_order(nr_pages << PAGE_SHIFT));
+	for (i = 0; i < pool->max_order; i++)
+		INIT_LIST_HEAD(&pool->free_area[i]);
+	pool->range_start = phys;
+	pool->range_end = phys + (nr_pages << PAGE_SHIFT);
+
+	/* Init the vmemmap portion */
+	p = hyp_phys_to_page(phys);
+	memset(p, 0, sizeof(*p) * nr_pages);
+	for (i = 0; i < nr_pages; i++) {
+		p[i].pool = pool;
+		INIT_LIST_HEAD(&p[i].node);
+	}
+
+	/* Attach the unused pages to the buddy tree */
+	for (i = reserved_pages; i < nr_pages; i++)
+		__hyp_attach_page(pool, &p[i]);
+
+	return 0;
+}
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 13/32] KVM: arm64: Enable access to sanitized CPU features at EL2
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Introduce the infrastructure in KVM to copy CPU feature registers into
EL2-owned data structures, so that sanitised values can be read directly
at EL2 in nVHE.

Given that only a subset of these registers is read by the hypervisor,
the ones that need to be copied must be listed in
<asm/kvm_cpufeature.h> together with the name of the nVHE variable that
will hold the copy.

While at it, introduce the first user of this infrastructure by
implementing __flush_dcache_area at EL2, which needs
arm64_ftr_reg_ctrel0.
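
As a hedged illustration of the convention, exposing another sanitised
register to EL2 would only require a new entry in the header; the
ID_AA64MMFR1_EL1 entry and variable name below are examples, not part
of this patch:

	/* In <asm/kvm_cpufeature.h>: */
	KVM_HYP_CPU_FTR_REG(SYS_ID_AA64MMFR1_EL1, arm64_ftr_reg_id_aa64mmfr1_el1)

	/* In nVHE hyp code, the sanitised value then becomes readable: */
	u64 mmfr1 = arm64_ftr_reg_id_aa64mmfr1_el1.sys_val;

The kernel side needs no further change: setup_kvm_el2_caps() walks the
same header to build hyp_ftr_regs[] and copies each listed register into
its EL2 counterpart at boot.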

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/cpufeature.h     |  1 +
 arch/arm64/include/asm/kvm_cpufeature.h | 17 +++++++++++++++++
 arch/arm64/include/asm/kvm_host.h       |  4 ++++
 arch/arm64/kernel/cpufeature.c          | 13 +++++++++++++
 arch/arm64/kvm/hyp/nvhe/Makefile        |  3 ++-
 arch/arm64/kvm/hyp/nvhe/cache.S         | 13 +++++++++++++
 arch/arm64/kvm/hyp/nvhe/cpufeature.c    |  8 ++++++++
 arch/arm64/kvm/sys_regs.c               | 21 +++++++++++++++++++++
 8 files changed, 79 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/kvm_cpufeature.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/cache.S
 create mode 100644 arch/arm64/kvm/hyp/nvhe/cpufeature.c

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 61177bac49fa..a85cea2cac57 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -607,6 +607,7 @@ void check_local_cpu_capabilities(void);
 
 u64 read_sanitised_ftr_reg(u32 id);
 u64 __read_sysreg_by_encoding(u32 sys_id);
+int copy_ftr_reg(u32 id, struct arm64_ftr_reg *dst);
 
 static inline bool cpu_supports_mixed_endian_el0(void)
 {
diff --git a/arch/arm64/include/asm/kvm_cpufeature.h b/arch/arm64/include/asm/kvm_cpufeature.h
new file mode 100644
index 000000000000..d34f85cba358
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_cpufeature.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#include <asm/cpufeature.h>
+
+#ifndef KVM_HYP_CPU_FTR_REG
+#if defined(__KVM_NVHE_HYPERVISOR__)
+#define KVM_HYP_CPU_FTR_REG(id, name) extern struct arm64_ftr_reg name;
+#else
+#define KVM_HYP_CPU_FTR_REG(id, name) DECLARE_KVM_NVHE_SYM(name);
+#endif
+#endif
+
+KVM_HYP_CPU_FTR_REG(SYS_CTR_EL0, arm64_ftr_reg_ctrel0)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 06ca4828005f..459ee557f87c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -751,9 +751,13 @@ void kvm_clr_pmu_events(u32 clr);
 
 void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu);
 void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
+
+void setup_kvm_el2_caps(void);
 #else
 static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
 static inline void kvm_clr_pmu_events(u32 clr) {}
+
+static inline void setup_kvm_el2_caps(void) {}
 #endif
 
 void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 066030717a4c..f2d8b479ff74 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1154,6 +1154,18 @@ u64 read_sanitised_ftr_reg(u32 id)
 }
 EXPORT_SYMBOL_GPL(read_sanitised_ftr_reg);
 
+int copy_ftr_reg(u32 id, struct arm64_ftr_reg *dst)
+{
+	struct arm64_ftr_reg *regp = get_arm64_ftr_reg(id);
+
+	if (!regp)
+		return -EINVAL;
+
+	memcpy(dst, regp, sizeof(*regp));
+
+	return 0;
+}
+
 #define read_sysreg_case(r)	\
 	case r:		val = read_sysreg_s(r); break;
 
@@ -2773,6 +2785,7 @@ void __init setup_cpu_features(void)
 
 	setup_system_capabilities();
 	setup_elf_hwcaps(arm64_elf_hwcaps);
+	setup_kvm_el2_caps();
 
 	if (system_supports_32bit_el0())
 		setup_elf_hwcaps(compat_elf_hwcaps);
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 6894a917f290..0033591553fc 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -13,7 +13,8 @@ lib-objs := clear_page.o copy_page.o memcpy.o memset.o
 lib-objs := $(addprefix ../../../lib/, $(lib-objs))
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
-	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o
+	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
+	 cache.o cpufeature.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o
 obj-y += $(lib-objs)
diff --git a/arch/arm64/kvm/hyp/nvhe/cache.S b/arch/arm64/kvm/hyp/nvhe/cache.S
new file mode 100644
index 000000000000..36cef6915428
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/cache.S
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Code copied from arch/arm64/mm/cache.S.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+#include <asm/alternative.h>
+
+SYM_FUNC_START_PI(__flush_dcache_area)
+	dcache_by_line_op civac, sy, x0, x1, x2, x3
+	ret
+SYM_FUNC_END_PI(__flush_dcache_area)
diff --git a/arch/arm64/kvm/hyp/nvhe/cpufeature.c b/arch/arm64/kvm/hyp/nvhe/cpufeature.c
new file mode 100644
index 000000000000..a887508f996f
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/cpufeature.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#define KVM_HYP_CPU_FTR_REG(id, name) struct arm64_ftr_reg name;
+#include <asm/kvm_cpufeature.h>
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4f2f1e3145de..84be93df52fa 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -21,6 +21,7 @@
 #include <asm/debug-monitors.h>
 #include <asm/esr.h>
 #include <asm/kvm_arm.h>
+#include <asm/kvm_cpufeature.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
@@ -2775,3 +2776,23 @@ void kvm_sys_reg_table_init(void)
 	/* Clear all higher bits. */
 	cache_levels &= (1 << (i*3))-1;
 }
+
+#undef KVM_HYP_CPU_FTR_REG
+#define KVM_HYP_CPU_FTR_REG(id, name) \
+	{ .sys_id = id, .dst = (struct arm64_ftr_reg *)&kvm_nvhe_sym(name) },
+struct __ftr_reg_copy_entry {
+	u32			sys_id;
+	struct arm64_ftr_reg	*dst;
+} hyp_ftr_regs[] __initdata = {
+	#include <asm/kvm_cpufeature.h>
+};
+
+void __init setup_kvm_el2_caps(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(hyp_ftr_regs); i++) {
+		WARN(copy_ftr_reg(hyp_ftr_regs[i].sys_id, hyp_ftr_regs[i].dst),
+		     "%u feature register not found\n", hyp_ftr_regs[i].sys_id);
+	}
+}
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 13/32] KVM: arm64: Enable access to sanitized CPU features at EL2
@ 2021-03-02 14:59   ` Quentin Perret
  0 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, seanjc, mate.toth-pal, linux-kernel, robh+dt,
	linux-arm-kernel, kernel-team, kvmarm, tabba

Introduce the infrastructure in KVM enabling to copy CPU feature
registers into EL2-owned data-structures, to allow reading sanitised
values directly at EL2 in nVHE.

Given that only a subset of these features are being read by the
hypervisor, the ones that need to be copied are to be listed under
<asm/kvm_cpufeature.h> together with the name of the nVHE variable that
will hold the copy.

While at it, introduce the first user of this infrastructure by
implementing __flush_dcache_area at EL2, which needs
arm64_ftr_reg_ctrel0.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/cpufeature.h     |  1 +
 arch/arm64/include/asm/kvm_cpufeature.h | 17 +++++++++++++++++
 arch/arm64/include/asm/kvm_host.h       |  4 ++++
 arch/arm64/kernel/cpufeature.c          | 13 +++++++++++++
 arch/arm64/kvm/hyp/nvhe/Makefile        |  3 ++-
 arch/arm64/kvm/hyp/nvhe/cache.S         | 13 +++++++++++++
 arch/arm64/kvm/hyp/nvhe/cpufeature.c    |  8 ++++++++
 arch/arm64/kvm/sys_regs.c               | 21 +++++++++++++++++++++
 8 files changed, 79 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/kvm_cpufeature.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/cache.S
 create mode 100644 arch/arm64/kvm/hyp/nvhe/cpufeature.c

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 61177bac49fa..a85cea2cac57 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -607,6 +607,7 @@ void check_local_cpu_capabilities(void);
 
 u64 read_sanitised_ftr_reg(u32 id);
 u64 __read_sysreg_by_encoding(u32 sys_id);
+int copy_ftr_reg(u32 id, struct arm64_ftr_reg *dst);
 
 static inline bool cpu_supports_mixed_endian_el0(void)
 {
diff --git a/arch/arm64/include/asm/kvm_cpufeature.h b/arch/arm64/include/asm/kvm_cpufeature.h
new file mode 100644
index 000000000000..d34f85cba358
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_cpufeature.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#include <asm/cpufeature.h>
+
+#ifndef KVM_HYP_CPU_FTR_REG
+#if defined(__KVM_NVHE_HYPERVISOR__)
+#define KVM_HYP_CPU_FTR_REG(id, name) extern struct arm64_ftr_reg name;
+#else
+#define KVM_HYP_CPU_FTR_REG(id, name) DECLARE_KVM_NVHE_SYM(name);
+#endif
+#endif
+
+KVM_HYP_CPU_FTR_REG(SYS_CTR_EL0, arm64_ftr_reg_ctrel0)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 06ca4828005f..459ee557f87c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -751,9 +751,13 @@ void kvm_clr_pmu_events(u32 clr);
 
 void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu);
 void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
+
+void setup_kvm_el2_caps(void);
 #else
 static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
 static inline void kvm_clr_pmu_events(u32 clr) {}
+
+static inline void setup_kvm_el2_caps(void) {}
 #endif
 
 void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 066030717a4c..f2d8b479ff74 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1154,6 +1154,18 @@ u64 read_sanitised_ftr_reg(u32 id)
 }
 EXPORT_SYMBOL_GPL(read_sanitised_ftr_reg);
 
+int copy_ftr_reg(u32 id, struct arm64_ftr_reg *dst)
+{
+	struct arm64_ftr_reg *regp = get_arm64_ftr_reg(id);
+
+	if (!regp)
+		return -EINVAL;
+
+	memcpy(dst, regp, sizeof(*regp));
+
+	return 0;
+}
+
 #define read_sysreg_case(r)	\
 	case r:		val = read_sysreg_s(r); break;
 
@@ -2773,6 +2785,7 @@ void __init setup_cpu_features(void)
 
 	setup_system_capabilities();
 	setup_elf_hwcaps(arm64_elf_hwcaps);
+	setup_kvm_el2_caps();
 
 	if (system_supports_32bit_el0())
 		setup_elf_hwcaps(compat_elf_hwcaps);
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 6894a917f290..0033591553fc 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -13,7 +13,8 @@ lib-objs := clear_page.o copy_page.o memcpy.o memset.o
 lib-objs := $(addprefix ../../../lib/, $(lib-objs))
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
-	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o
+	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
+	 cache.o cpufeature.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o
 obj-y += $(lib-objs)
diff --git a/arch/arm64/kvm/hyp/nvhe/cache.S b/arch/arm64/kvm/hyp/nvhe/cache.S
new file mode 100644
index 000000000000..36cef6915428
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/cache.S
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Code copied from arch/arm64/mm/cache.S.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+#include <asm/alternative.h>
+
+SYM_FUNC_START_PI(__flush_dcache_area)
+	dcache_by_line_op civac, sy, x0, x1, x2, x3
+	ret
+SYM_FUNC_END_PI(__flush_dcache_area)
diff --git a/arch/arm64/kvm/hyp/nvhe/cpufeature.c b/arch/arm64/kvm/hyp/nvhe/cpufeature.c
new file mode 100644
index 000000000000..a887508f996f
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/cpufeature.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#define KVM_HYP_CPU_FTR_REG(id, name) struct arm64_ftr_reg name;
+#include <asm/kvm_cpufeature.h>
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4f2f1e3145de..84be93df52fa 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -21,6 +21,7 @@
 #include <asm/debug-monitors.h>
 #include <asm/esr.h>
 #include <asm/kvm_arm.h>
+#include <asm/kvm_cpufeature.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
@@ -2775,3 +2776,23 @@ void kvm_sys_reg_table_init(void)
 	/* Clear all higher bits. */
 	cache_levels &= (1 << (i*3))-1;
 }
+
+#undef KVM_HYP_CPU_FTR_REG
+#define KVM_HYP_CPU_FTR_REG(id, name) \
+	{ .sys_id = id, .dst = (struct arm64_ftr_reg *)&kvm_nvhe_sym(name) },
+struct __ftr_reg_copy_entry {
+	u32			sys_id;
+	struct arm64_ftr_reg	*dst;
+} hyp_ftr_regs[] __initdata = {
+	#include <asm/kvm_cpufeature.h>
+};
+
+void __init setup_kvm_el2_caps(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(hyp_ftr_regs); i++) {
+		WARN(copy_ftr_reg(hyp_ftr_regs[i].sys_id, hyp_ftr_regs[i].dst),
+		     "%u feature register not found\n", hyp_ftr_regs[i].sys_id);
+	}
+}
-- 
2.30.1.766.gb4fecdf3b7-goog
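
For readers skimming the diff above: <asm/kvm_cpufeature.h> is an x-macro
style list, so every file that includes it defines KVM_HYP_CPU_FTR_REG to
get a different expansion of the same entries. A minimal sketch of how the
single SYS_CTR_EL0 entry expands in its includers (hand-expanded for
illustration, not literal preprocessor output):

	/* 1) nVHE hyp objects (__KVM_NVHE_HYPERVISOR__): declare the EL2 copy */
	extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;

	/* 2) kernel proper: DECLARE_KVM_NVHE_SYM(arm64_ftr_reg_ctrel0), i.e. a
	 *    declaration of the mangled __kvm_nvhe_arm64_ftr_reg_ctrel0 symbol */

	/* 3) hyp/nvhe/cpufeature.c: define the storage that lives at EL2 */
	struct arm64_ftr_reg arm64_ftr_reg_ctrel0;

	/* 4) kvm/sys_regs.c (second include, macro redefined): one entry of the
	 *    __initdata copy table walked by setup_kvm_el2_caps() */
	{ .sys_id = SYS_CTR_EL0,
	  .dst = (struct arm64_ftr_reg *)&kvm_nvhe_sym(arm64_ftr_reg_ctrel0) },

Exposing another sanitised register to EL2 would then be a single extra
KVM_HYP_CPU_FTR_REG(id, name) line in the header; any further names beyond
SYS_CTR_EL0 are hypothetical and not part of this patch.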


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 14/32] KVM: arm64: Factor out vector address calculation
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

In order to re-map the guest vectors at EL2 when pKVM is enabled,
refactor __kvm_vector_slot2idx() and kvm_init_vector_slot() to move all
the address calculation logic into a static inline function.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_mmu.h | 8 ++++++++
 arch/arm64/kvm/arm.c             | 9 +--------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 90873851f677..5c42ec023cc7 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -168,6 +168,14 @@ phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
 int kvm_mmu_init(void);
 
+static inline void *__kvm_vector_slot2addr(void *base,
+					   enum arm64_hyp_spectre_vector slot)
+{
+	int idx = slot - (slot != HYP_VECTOR_DIRECT);
+
+	return base + (idx * SZ_2K);
+}
+
 struct kvm;
 
 #define kvm_flush_dcache_to_poc(a,l)	__flush_dcache_area((a), (l))
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3f8bcf8db036..26e573cdede3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1345,16 +1345,9 @@ static unsigned long nvhe_percpu_order(void)
 /* A lookup table holding the hypervisor VA for each vector slot */
 static void *hyp_spectre_vector_selector[BP_HARDEN_EL2_SLOTS];
 
-static int __kvm_vector_slot2idx(enum arm64_hyp_spectre_vector slot)
-{
-	return slot - (slot != HYP_VECTOR_DIRECT);
-}
-
 static void kvm_init_vector_slot(void *base, enum arm64_hyp_spectre_vector slot)
 {
-	int idx = __kvm_vector_slot2idx(slot);
-
-	hyp_spectre_vector_selector[slot] = base + (idx * SZ_2K);
+	hyp_spectre_vector_selector[slot] = __kvm_vector_slot2addr(base, slot);
 }
 
 static int kvm_init_vector_slots(void)
-- 
2.30.1.766.gb4fecdf3b7-goog
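
A quick note on the arithmetic, since the helper is compact enough to be
easy to misread: the sketch below assumes the usual enum order
(HYP_VECTOR_DIRECT, HYP_VECTOR_SPECTRE_DIRECT, HYP_VECTOR_INDIRECT,
HYP_VECTOR_SPECTRE_INDIRECT, i.e. 0..3) and 2K per vector slot, and is
illustrative only:

	/*
	 * slot (value)                     idx = slot - (slot != DIRECT)   result
	 * HYP_VECTOR_DIRECT (0)            0                               base + 0
	 * HYP_VECTOR_SPECTRE_DIRECT (1)    0                               base + 0
	 * HYP_VECTOR_INDIRECT (2)          1                               base + SZ_2K
	 * HYP_VECTOR_SPECTRE_INDIRECT (3)  2                               base + 2 * SZ_2K
	 */
	void *vect = __kvm_vector_slot2addr(base, HYP_VECTOR_SPECTRE_INDIRECT);
	/* vect == base + 2 * SZ_2K; both "direct" slots resolve to the start
	 * of whichever vector page is passed in as base. */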


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 15/32] KVM: arm64: Prepare the creation of s1 mappings at EL2
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

When memory protection is enabled, the EL2 code needs the ability to
create and manage its own page-table. To do so, introduce a new set of
hypercalls to bootstrap a memory management system at EL2.

This leads to the following boot flow in nVHE Protected mode:

 1. the host allocates memory for the hypervisor very early on, using
    the memblock API;

 2. the host creates a set of stage 1 page-tables for EL2, installs the
    EL2 vectors, and issues the __pkvm_init hypercall;

 3. during __pkvm_init, the hypervisor re-creates its stage 1 page-table
    and stores it in the memory pool provided by the host;

 4. the hypervisor then extends its stage 1 mappings to include a
    vmemmap in the EL2 VA space, hence allowing it to use the buddy
    allocator introduced in a previous patch;

 5. the hypervisor jumps back into the idmap page, switches from the
    host-provided page-table to the new one, and wraps up its
    initialization by enabling the new allocator, before returning to
    the host.

 6. the host can free the now unused page-table created for EL2, and
    will now need to issue hypercalls to make changes to the EL2 stage 1
    mappings instead of modifying them directly.

Note that for the sake of simplifying the review, this patch focuses on
the hypervisor side of things. In other words, this only implements the
new hypercalls, but does not make use of them from the host yet. The
host-side changes will follow in a subsequent patch.
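
As a rough illustration of what that later host-side change boils down to,
__pkvm_init is invoked like any other nVHE hypercall. The sketch below is
not part of this patch; per_cpu_base and hyp_va_bits are stand-ins for
values the host already tracks:

	ret = kvm_call_hyp_nvhe(__pkvm_init, hyp_mem_base, hyp_mem_size,
				num_possible_cpus(), kern_hyp_va(per_cpu_base),
				hyp_va_bits);

On success, __pkvm_init() itself never returns: it tail-calls
__pkvm_init_finalise(), which writes the return value into the host context
and re-enters the host on the new EL2 page-table.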

Credits to Will for __pkvm_init_switch_pgd.

Co-authored-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h     |   4 +
 arch/arm64/include/asm/kvm_host.h    |   7 +
 arch/arm64/include/asm/kvm_hyp.h     |   8 ++
 arch/arm64/include/asm/kvm_pgtable.h |   2 +
 arch/arm64/kernel/image-vars.h       |  16 +++
 arch/arm64/kvm/hyp/Makefile          |   2 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h |  71 ++++++++++
 arch/arm64/kvm/hyp/nvhe/Makefile     |   4 +-
 arch/arm64/kvm/hyp/nvhe/hyp-init.S   |  31 +++++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c   |  49 +++++++
 arch/arm64/kvm/hyp/nvhe/mm.c         | 173 ++++++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c      | 195 +++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         |   2 -
 arch/arm64/kvm/hyp/reserved_mem.c    |  92 +++++++++++++
 arch/arm64/mm/init.c                 |   3 +
 15 files changed, 654 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mm.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/mm.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/setup.c
 create mode 100644 arch/arm64/kvm/hyp/reserved_mem.c

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 22d933e9b59e..db20a9477870 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -57,6 +57,10 @@
 #define __KVM_HOST_SMCCC_FUNC___kvm_get_mdcr_el2		12
 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs		13
 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs		14
+#define __KVM_HOST_SMCCC_FUNC___pkvm_init			15
+#define __KVM_HOST_SMCCC_FUNC___pkvm_create_mappings		16
+#define __KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping	17
+#define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector		18
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 459ee557f87c..b9d45a1f8538 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -781,5 +781,12 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 	(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
 
 int kvm_trng_call(struct kvm_vcpu *vcpu);
+#ifdef CONFIG_KVM
+extern phys_addr_t hyp_mem_base;
+extern phys_addr_t hyp_mem_size;
+void __init kvm_hyp_reserve(void);
+#else
+static inline void kvm_hyp_reserve(void) { }
+#endif
 
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index c0450828378b..ae55351b99a4 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -100,4 +100,12 @@ void __noreturn hyp_panic(void);
 void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
 #endif
 
+#ifdef __KVM_NVHE_HYPERVISOR__
+void __pkvm_init_switch_pgd(phys_addr_t phys, unsigned long size,
+			    phys_addr_t pgd, void *sp, void *cont_fn);
+int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
+		unsigned long *per_cpu_base, u32 hyp_va_bits);
+void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
+#endif
+
 #endif /* __ARM64_KVM_HYP_H__ */
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3c306f90f7da..7898610fbeeb 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -11,6 +11,8 @@
 #include <linux/kvm_host.h>
 #include <linux/types.h>
 
+#define KVM_PGTABLE_MAX_LEVELS		4U
+
 typedef u64 kvm_pte_t;
 
 /**
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 4eb7a15c8b60..940c378fa837 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -115,6 +115,22 @@ KVM_NVHE_ALIAS_HYP(__memcpy, __pi_memcpy);
 KVM_NVHE_ALIAS_HYP(__memset, __pi_memset);
 #endif
 
+/* Kernel memory sections */
+KVM_NVHE_ALIAS(__start_rodata);
+KVM_NVHE_ALIAS(__end_rodata);
+KVM_NVHE_ALIAS(__bss_start);
+KVM_NVHE_ALIAS(__bss_stop);
+
+/* Hyp memory sections */
+KVM_NVHE_ALIAS(__hyp_idmap_text_start);
+KVM_NVHE_ALIAS(__hyp_idmap_text_end);
+KVM_NVHE_ALIAS(__hyp_text_start);
+KVM_NVHE_ALIAS(__hyp_text_end);
+KVM_NVHE_ALIAS(__hyp_bss_start);
+KVM_NVHE_ALIAS(__hyp_bss_end);
+KVM_NVHE_ALIAS(__hyp_rodata_start);
+KVM_NVHE_ALIAS(__hyp_rodata_end);
+
 #endif /* CONFIG_KVM */
 
 #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index 687598e41b21..b726332eec49 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -10,4 +10,4 @@ subdir-ccflags-y := -I$(incdir)				\
 		    -DDISABLE_BRANCH_PROFILING		\
 		    $(DISABLE_STACKLEAK_PLUGIN)
 
-obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o
+obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o reserved_mem.o
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
new file mode 100644
index 000000000000..ac0f7fcffd08
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_HYP_MM_H
+#define __KVM_HYP_MM_H
+
+#include <asm/kvm_pgtable.h>
+#include <asm/spectre.h>
+#include <linux/memblock.h>
+#include <linux/types.h>
+
+#include <nvhe/memory.h>
+#include <nvhe/spinlock.h>
+
+#define HYP_MEMBLOCK_REGIONS 128
+extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
+extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
+extern struct kvm_pgtable pkvm_pgtable;
+extern hyp_spinlock_t pkvm_pgd_lock;
+extern struct hyp_pool hpool;
+extern u64 __io_map_base;
+
+int hyp_create_idmap(u32 hyp_va_bits);
+int hyp_map_vectors(void);
+int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
+int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
+int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
+int __pkvm_create_mappings(unsigned long start, unsigned long size,
+			   unsigned long phys, enum kvm_pgtable_prot prot);
+unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
+					    enum kvm_pgtable_prot prot);
+
+static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
+				     unsigned long *start, unsigned long *end)
+{
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct hyp_page *p = hyp_phys_to_page(phys);
+
+	*start = (unsigned long)p;
+	*end = *start + nr_pages * sizeof(struct hyp_page);
+	*start = ALIGN_DOWN(*start, PAGE_SIZE);
+	*end = ALIGN(*end, PAGE_SIZE);
+}
+
+static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
+{
+	unsigned long total = 0, i;
+
+	/* Provision the worst case scenario */
+	for (i = 0; i < KVM_PGTABLE_MAX_LEVELS; i++) {
+		nr_pages = DIV_ROUND_UP(nr_pages, PTRS_PER_PTE);
+		total += nr_pages;
+	}
+
+	return total;
+}
+
+static inline unsigned long hyp_s1_pgtable_pages(void)
+{
+	unsigned long res = 0, i;
+
+	/* Cover all of memory with page-granularity */
+	for (i = 0; i < kvm_nvhe_sym(hyp_memblock_nr); i++) {
+		struct memblock_region *reg = &kvm_nvhe_sym(hyp_memory)[i];
+		res += __hyp_pgtable_max_pages(reg->size >> PAGE_SHIFT);
+	}
+
+	/* Allow 1 GiB for private mappings */
+	res += __hyp_pgtable_max_pages(SZ_1G >> PAGE_SHIFT);
+
+	return res;
+}
+#endif /* __KVM_HYP_MM_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 0033591553fc..e204ea77ab27 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -14,9 +14,9 @@ lib-objs := $(addprefix ../../../lib/, $(lib-objs))
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
 	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
-	 cache.o cpufeature.o
+	 cache.o cpufeature.o setup.o mm.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
-	 ../fpsimd.o ../hyp-entry.o ../exception.o
+	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
 obj-y += $(lib-objs)
 
 ##
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index c631e29fb001..bc56ea92b812 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -244,4 +244,35 @@ alternative_else_nop_endif
 
 SYM_CODE_END(__kvm_handle_stub_hvc)
 
+SYM_FUNC_START(__pkvm_init_switch_pgd)
+	/* Turn the MMU off */
+	pre_disable_mmu_workaround
+	mrs	x2, sctlr_el2
+	bic	x3, x2, #SCTLR_ELx_M
+	msr	sctlr_el2, x3
+	isb
+
+	tlbi	alle2
+
+	/* Install the new pgtables */
+	ldr	x3, [x0, #NVHE_INIT_PGD_PA]
+	phys_to_ttbr x4, x3
+alternative_if ARM64_HAS_CNP
+	orr	x4, x4, #TTBR_CNP_BIT
+alternative_else_nop_endif
+	msr	ttbr0_el2, x4
+
+	/* Set the new stack pointer */
+	ldr	x0, [x0, #NVHE_INIT_STACK_HYP_VA]
+	mov	sp, x0
+
+	/* And turn the MMU back on! */
+	dsb	nsh
+	isb
+	msr	sctlr_el2, x2
+	ic	iallu
+	isb
+	ret	x1
+SYM_FUNC_END(__pkvm_init_switch_pgd)
+
 	.popsection
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index f012f8665ecc..ae6503c9be15 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -6,12 +6,14 @@
 
 #include <hyp/switch.h>
 
+#include <asm/pgtable-types.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_host.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 
+#include <nvhe/mm.h>
 #include <nvhe/trap_handler.h>
 
 DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
@@ -106,6 +108,49 @@ static void handle___vgic_v3_restore_aprs(struct kvm_cpu_context *host_ctxt)
 	__vgic_v3_restore_aprs(kern_hyp_va(cpu_if));
 }
 
+static void handle___pkvm_init(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
+	DECLARE_REG(unsigned long, size, host_ctxt, 2);
+	DECLARE_REG(unsigned long, nr_cpus, host_ctxt, 3);
+	DECLARE_REG(unsigned long *, per_cpu_base, host_ctxt, 4);
+	DECLARE_REG(u32, hyp_va_bits, host_ctxt, 5);
+
+	/*
+	 * __pkvm_init() will return only if an error occurred, otherwise it
+	 * will tail-call in __pkvm_init_finalise() which will have to deal
+	 * with the host context directly.
+	 */
+	cpu_reg(host_ctxt, 1) = __pkvm_init(phys, size, nr_cpus, per_cpu_base,
+					    hyp_va_bits);
+}
+
+static void handle___pkvm_cpu_set_vector(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(enum arm64_hyp_spectre_vector, slot, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = pkvm_cpu_set_vector(slot);
+}
+
+static void handle___pkvm_create_mappings(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(unsigned long, start, host_ctxt, 1);
+	DECLARE_REG(unsigned long, size, host_ctxt, 2);
+	DECLARE_REG(unsigned long, phys, host_ctxt, 3);
+	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_create_mappings(start, size, phys, prot);
+}
+
+static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
+	DECLARE_REG(size_t, size, host_ctxt, 2);
+	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
+}
+
 typedef void (*hcall_t)(struct kvm_cpu_context *);
 
 #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
@@ -125,6 +170,10 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__kvm_get_mdcr_el2),
 	HANDLE_FUNC(__vgic_v3_save_aprs),
 	HANDLE_FUNC(__vgic_v3_restore_aprs),
+	HANDLE_FUNC(__pkvm_init),
+	HANDLE_FUNC(__pkvm_cpu_set_vector),
+	HANDLE_FUNC(__pkvm_create_mappings),
+	HANDLE_FUNC(__pkvm_create_private_mapping),
 };
 
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
new file mode 100644
index 000000000000..a8efdf0f9003
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <asm/kvm_pgtable.h>
+#include <asm/spectre.h>
+
+#include <nvhe/early_alloc.h>
+#include <nvhe/gfp.h>
+#include <nvhe/memory.h>
+#include <nvhe/mm.h>
+#include <nvhe/spinlock.h>
+
+struct kvm_pgtable pkvm_pgtable;
+hyp_spinlock_t pkvm_pgd_lock;
+u64 __io_map_base;
+
+struct memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
+unsigned int hyp_memblock_nr;
+
+int __pkvm_create_mappings(unsigned long start, unsigned long size,
+			  unsigned long phys, enum kvm_pgtable_prot prot)
+{
+	int err;
+
+	hyp_spin_lock(&pkvm_pgd_lock);
+	err = kvm_pgtable_hyp_map(&pkvm_pgtable, start, size, phys, prot);
+	hyp_spin_unlock(&pkvm_pgd_lock);
+
+	return err;
+}
+
+unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
+					    enum kvm_pgtable_prot prot)
+{
+	unsigned long addr;
+	int err;
+
+	hyp_spin_lock(&pkvm_pgd_lock);
+
+	size = PAGE_ALIGN(size + offset_in_page(phys));
+	addr = __io_map_base;
+	__io_map_base += size;
+
+	/* Are we overflowing on the vmemmap ? */
+	if (__io_map_base > __hyp_vmemmap) {
+		__io_map_base -= size;
+		addr = (unsigned long)ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	err = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, size, phys, prot);
+	if (err) {
+		addr = (unsigned long)ERR_PTR(err);
+		goto out;
+	}
+
+	addr = addr + offset_in_page(phys);
+out:
+	hyp_spin_unlock(&pkvm_pgd_lock);
+
+	return addr;
+}
+
+int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
+{
+	unsigned long start = (unsigned long)from;
+	unsigned long end = (unsigned long)to;
+	unsigned long virt_addr;
+	phys_addr_t phys;
+
+	start = start & PAGE_MASK;
+	end = PAGE_ALIGN(end);
+
+	for (virt_addr = start; virt_addr < end; virt_addr += PAGE_SIZE) {
+		int err;
+
+		phys = hyp_virt_to_phys((void *)virt_addr);
+		err = __pkvm_create_mappings(virt_addr, PAGE_SIZE, phys, prot);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
+{
+	unsigned long start, end;
+
+	hyp_vmemmap_range(phys, size, &start, &end);
+
+	return __pkvm_create_mappings(start, end - start, back, PAGE_HYP);
+}
+
+static void *__hyp_bp_vect_base;
+int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot)
+{
+	void *vector;
+
+	switch (slot) {
+	case HYP_VECTOR_DIRECT: {
+		vector = __kvm_hyp_vector;
+		break;
+	}
+	case HYP_VECTOR_SPECTRE_DIRECT: {
+		vector = __bp_harden_hyp_vecs;
+		break;
+	}
+	case HYP_VECTOR_INDIRECT:
+	case HYP_VECTOR_SPECTRE_INDIRECT: {
+		vector = (void *)__hyp_bp_vect_base;
+		break;
+	}
+	default:
+		return -EINVAL;
+	}
+
+	vector = __kvm_vector_slot2addr(vector, slot);
+	*this_cpu_ptr(&kvm_hyp_vector) = (unsigned long)vector;
+
+	return 0;
+}
+
+int hyp_map_vectors(void)
+{
+	phys_addr_t phys;
+	void *bp_base;
+
+	if (!cpus_have_const_cap(ARM64_SPECTRE_V3A))
+		return 0;
+
+	phys = __hyp_pa(__bp_harden_hyp_vecs);
+	bp_base = (void *)__pkvm_create_private_mapping(phys,
+							__BP_HARDEN_HYP_VECS_SZ,
+							PAGE_HYP_EXEC);
+	if (IS_ERR_OR_NULL(bp_base))
+		return PTR_ERR(bp_base);
+
+	__hyp_bp_vect_base = bp_base;
+
+	return 0;
+}
+
+int hyp_create_idmap(u32 hyp_va_bits)
+{
+	unsigned long start, end;
+
+	start = hyp_virt_to_phys((void *)__hyp_idmap_text_start);
+	start = ALIGN_DOWN(start, PAGE_SIZE);
+
+	end = hyp_virt_to_phys((void *)__hyp_idmap_text_end);
+	end = ALIGN(end, PAGE_SIZE);
+
+	/*
+	 * One half of the VA space is reserved to linearly map portions of
+	 * memory -- see va_layout.c for more details. The other half of the VA
+	 * space contains the trampoline page, and needs some care. Split that
+	 * second half in two and find the quarter of VA space not conflicting
+	 * with the idmap to place the IOs and the vmemmap. IOs use the lower
+	 * half of the quarter and the vmemmap the upper half.
+	 */
+	__io_map_base = start & BIT(hyp_va_bits - 2);
+	__io_map_base ^= BIT(hyp_va_bits - 2);
+	__hyp_vmemmap = __io_map_base | BIT(hyp_va_bits - 3);
+
+	return __pkvm_create_mappings(start, end - start, start, PAGE_HYP_EXEC);
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
new file mode 100644
index 000000000000..178ec06f2b49
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <asm/kvm_pgtable.h>
+
+#include <nvhe/early_alloc.h>
+#include <nvhe/gfp.h>
+#include <nvhe/memory.h>
+#include <nvhe/mm.h>
+#include <nvhe/trap_handler.h>
+
+struct hyp_pool hpool;
+struct kvm_pgtable_mm_ops pkvm_pgtable_mm_ops;
+unsigned long hyp_nr_cpus;
+
+#define hyp_percpu_size ((unsigned long)__per_cpu_end - \
+			 (unsigned long)__per_cpu_start)
+
+static void *vmemmap_base;
+static void *hyp_pgt_base;
+
+static int divide_memory_pool(void *virt, unsigned long size)
+{
+	unsigned long vstart, vend, nr_pages;
+
+	hyp_early_alloc_init(virt, size);
+
+	hyp_vmemmap_range(__hyp_pa(virt), size, &vstart, &vend);
+	nr_pages = (vend - vstart) >> PAGE_SHIFT;
+	vmemmap_base = hyp_early_alloc_contig(nr_pages);
+	if (!vmemmap_base)
+		return -ENOMEM;
+
+	nr_pages = hyp_s1_pgtable_pages();
+	hyp_pgt_base = hyp_early_alloc_contig(nr_pages);
+	if (!hyp_pgt_base)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
+				 unsigned long *per_cpu_base,
+				 u32 hyp_va_bits)
+{
+	void *start, *end, *virt = hyp_phys_to_virt(phys);
+	unsigned long pgt_size = hyp_s1_pgtable_pages() << PAGE_SHIFT;
+	int ret, i;
+
+	/* Recreate the hyp page-table using the early page allocator */
+	hyp_early_alloc_init(hyp_pgt_base, pgt_size);
+	ret = kvm_pgtable_hyp_init(&pkvm_pgtable, hyp_va_bits,
+				   &hyp_early_alloc_mm_ops);
+	if (ret)
+		return ret;
+
+	ret = hyp_create_idmap(hyp_va_bits);
+	if (ret)
+		return ret;
+
+	ret = hyp_map_vectors();
+	if (ret)
+		return ret;
+
+	ret = hyp_back_vmemmap(phys, size, hyp_virt_to_phys(vmemmap_base));
+	if (ret)
+		return ret;
+
+	ret = pkvm_create_mappings(__hyp_text_start, __hyp_text_end, PAGE_HYP_EXEC);
+	if (ret)
+		return ret;
+
+	ret = pkvm_create_mappings(__start_rodata, __end_rodata, PAGE_HYP_RO);
+	if (ret)
+		return ret;
+
+	ret = pkvm_create_mappings(__hyp_rodata_start, __hyp_rodata_end, PAGE_HYP_RO);
+	if (ret)
+		return ret;
+
+	ret = pkvm_create_mappings(__hyp_bss_start, __hyp_bss_end, PAGE_HYP);
+	if (ret)
+		return ret;
+
+	ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, PAGE_HYP_RO);
+	if (ret)
+		return ret;
+
+	ret = pkvm_create_mappings(virt, virt + size, PAGE_HYP);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < hyp_nr_cpus; i++) {
+		start = (void *)kern_hyp_va(per_cpu_base[i]);
+		end = start + PAGE_ALIGN(hyp_percpu_size);
+		ret = pkvm_create_mappings(start, end, PAGE_HYP);
+		if (ret)
+			return ret;
+
+		end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va;
+		start = end - PAGE_SIZE;
+		ret = pkvm_create_mappings(start, end, PAGE_HYP);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static void update_nvhe_init_params(void)
+{
+	struct kvm_nvhe_init_params *params;
+	unsigned long i;
+
+	for (i = 0; i < hyp_nr_cpus; i++) {
+		params = per_cpu_ptr(&kvm_init_params, i);
+		params->pgd_pa = __hyp_pa(pkvm_pgtable.pgd);
+		__flush_dcache_area(params, sizeof(*params));
+	}
+}
+
+static void *hyp_zalloc_hyp_page(void *arg)
+{
+	return hyp_alloc_pages(&hpool, 0);
+}
+
+void __noreturn __pkvm_init_finalise(void)
+{
+	struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
+	struct kvm_cpu_context *host_ctxt = &host_data->host_ctxt;
+	unsigned long nr_pages, reserved_pages, pfn;
+	int ret;
+
+	/* Now that the vmemmap is backed, install the full-fledged allocator */
+	pfn = hyp_virt_to_pfn(hyp_pgt_base);
+	nr_pages = hyp_s1_pgtable_pages();
+	reserved_pages = hyp_early_alloc_nr_used_pages();
+	ret = hyp_pool_init(&hpool, pfn, nr_pages, reserved_pages);
+	if (ret)
+		goto out;
+
+	pkvm_pgtable_mm_ops.zalloc_page = hyp_zalloc_hyp_page;
+	pkvm_pgtable_mm_ops.phys_to_virt = hyp_phys_to_virt;
+	pkvm_pgtable_mm_ops.virt_to_phys = hyp_virt_to_phys;
+	pkvm_pgtable_mm_ops.get_page = hyp_get_page;
+	pkvm_pgtable_mm_ops.put_page = hyp_put_page;
+	pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;
+
+out:
+	/*
+	 * We tail-called to here from handle___pkvm_init() and will not return,
+	 * so make sure to propagate the return value to the host.
+	 */
+	cpu_reg(host_ctxt, 1) = ret;
+
+	__host_enter(host_ctxt);
+}
+
+int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
+		unsigned long *per_cpu_base, u32 hyp_va_bits)
+{
+	struct kvm_nvhe_init_params *params;
+	void *virt = hyp_phys_to_virt(phys);
+	void (*fn)(phys_addr_t params_pa, void *finalize_fn_va);
+	int ret;
+
+	if (phys % PAGE_SIZE || size % PAGE_SIZE)
+		return -EINVAL;
+
+	hyp_spin_lock_init(&pkvm_pgd_lock);
+	hyp_nr_cpus = nr_cpus;
+
+	ret = divide_memory_pool(virt, size);
+	if (ret)
+		return ret;
+
+	ret = recreate_hyp_mappings(phys, size, per_cpu_base, hyp_va_bits);
+	if (ret)
+		return ret;
+
+	update_nvhe_init_params();
+
+	/* Jump in the idmap page to switch to the new page-tables */
+	params = this_cpu_ptr(&kvm_init_params);
+	fn = (typeof(fn))__hyp_pa(__pkvm_init_switch_pgd);
+	fn(__hyp_pa(params), __pkvm_init_finalise);
+
+	unreachable();
+}
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b975a67d1f85..7ce0969203e8 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -10,8 +10,6 @@
 #include <linux/bitfield.h>
 #include <asm/kvm_pgtable.h>
 
-#define KVM_PGTABLE_MAX_LEVELS		4U
-
 #define KVM_PTE_VALID			BIT(0)
 
 #define KVM_PTE_TYPE			BIT(1)
diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c
new file mode 100644
index 000000000000..9bc6a6d27904
--- /dev/null
+++ b/arch/arm64/kvm/hyp/reserved_mem.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/memblock.h>
+
+#include <asm/kvm_host.h>
+
+#include <nvhe/memory.h>
+#include <nvhe/mm.h>
+
+static struct memblock_region *hyp_memory = kvm_nvhe_sym(hyp_memory);
+static unsigned int *hyp_memblock_nr_ptr = &kvm_nvhe_sym(hyp_memblock_nr);
+
+phys_addr_t hyp_mem_base;
+phys_addr_t hyp_mem_size;
+
+static int __init register_memblock_regions(void)
+{
+	struct memblock_region *reg;
+
+	for_each_mem_region(reg) {
+		if (*hyp_memblock_nr_ptr >= HYP_MEMBLOCK_REGIONS)
+			return -ENOMEM;
+
+		hyp_memory[*hyp_memblock_nr_ptr] = *reg;
+		(*hyp_memblock_nr_ptr)++;
+	}
+
+	return 0;
+}
+
+void __init kvm_hyp_reserve(void)
+{
+	u64 nr_pages, prev, hyp_mem_pages = 0;
+	int ret;
+
+	if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
+		return;
+
+	if (kvm_get_mode() != KVM_MODE_PROTECTED)
+		return;
+
+	ret = register_memblock_regions();
+	if (ret) {
+		*hyp_memblock_nr_ptr = 0;
+		kvm_err("Failed to register hyp memblocks: %d\n", ret);
+		return;
+	}
+
+	hyp_mem_pages += hyp_s1_pgtable_pages();
+
+	/*
+	 * The hyp_vmemmap needs to be backed by pages, but these pages
+	 * themselves need to be present in the vmemmap, so compute the number
+	 * of pages needed by looking for a fixed point.
+	 */
+	nr_pages = 0;
+	do {
+		prev = nr_pages;
+		nr_pages = hyp_mem_pages + prev;
+		nr_pages = DIV_ROUND_UP(nr_pages * sizeof(struct hyp_page), PAGE_SIZE);
+		nr_pages += __hyp_pgtable_max_pages(nr_pages);
+	} while (nr_pages != prev);
+	hyp_mem_pages += nr_pages;
+
+	/*
+	 * Try to allocate a PMD-aligned region to reduce TLB pressure once
+	 * this is unmapped from the host stage-2, and fallback to PAGE_SIZE.
+	 */
+	hyp_mem_size = hyp_mem_pages << PAGE_SHIFT;
+	hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(),
+					      ALIGN(hyp_mem_size, PMD_SIZE),
+					      PMD_SIZE);
+	if (!hyp_mem_base)
+		hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(),
+						      hyp_mem_size, PAGE_SIZE);
+	else
+		hyp_mem_size = ALIGN(hyp_mem_size, PMD_SIZE);
+
+	if (!hyp_mem_base) {
+		kvm_err("Failed to reserve hyp memory\n");
+		return;
+	}
+	memblock_reserve(hyp_mem_base, hyp_mem_size);
+
+	kvm_info("Reserved %lld MiB at 0x%llx\n", hyp_mem_size >> 20,
+		 hyp_mem_base);
+}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 0ace5e68efba..2b1b41b1ff9b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -35,6 +35,7 @@
 #include <asm/fixmap.h>
 #include <asm/kasan.h>
 #include <asm/kernel-pgtable.h>
+#include <asm/kvm_host.h>
 #include <asm/memory.h>
 #include <asm/numa.h>
 #include <asm/sections.h>
@@ -429,6 +430,8 @@ void __init bootmem_init(void)
 
 	dma_pernuma_cma_reserve();
 
+	kvm_hyp_reserve();
+
 	/*
 	 * sparse_init() tries to allocate memory from memblock, so must be
 	 * done after the fixed reservations
-- 
2.30.1.766.gb4fecdf3b7-goog
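
Two of the new pieces above are easier to follow with concrete numbers.
First, the VA carve-up in hyp_create_idmap(): it picks the quarter of the
hyp VA space that cannot overlap the idmap and splits it between private
(IO) mappings and the vmemmap. A standalone model of that computation, with
an assumed hyp_va_bits of 48 and an assumed idmap address whose bit 46 is
clear:

	#include <stdint.h>
	#include <stdio.h>

	#define BIT(n)	(1ULL << (n))

	int main(void)
	{
		uint64_t start = 0x80000000;	/* assumed idmap PA (bit 46 clear) */
		unsigned int hyp_va_bits = 48;	/* assumed hyp VA width */
		uint64_t io_map_base, hyp_vmemmap;

		/* keep only bit (hyp_va_bits - 2) of the idmap, then flip it:
		 * the result is the base of the quarter not containing the idmap */
		io_map_base = start & BIT(hyp_va_bits - 2);
		io_map_base ^= BIT(hyp_va_bits - 2);

		/* the vmemmap goes in the upper half of that quarter, IOs below it */
		hyp_vmemmap = io_map_base | BIT(hyp_va_bits - 3);

		printf("__io_map_base = 0x%llx\n", (unsigned long long)io_map_base);
		printf("__hyp_vmemmap = 0x%llx\n", (unsigned long long)hyp_vmemmap);
		return 0;
	}

With those inputs this prints __io_map_base = 0x400000000000 and
__hyp_vmemmap = 0x600000000000, which is also why
__pkvm_create_private_mapping() treats __io_map_base crossing __hyp_vmemmap
as -ENOMEM.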

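Second, the fixed-point loop in kvm_hyp_reserve(): the pages backing the
hyp vmemmap must themselves appear in the vmemmap and be covered by page
tables, so the estimate is iterated until it stops growing. A standalone
model, where the 4K page size, 4 levels, 512 PTEs per level and the 32-byte
struct hyp_page size are all assumptions for illustration:

	#include <stdio.h>

	#define PAGE_SIZE		4096UL
	#define PTRS_PER_PTE		512UL
	#define KVM_PGTABLE_MAX_LEVELS	4
	#define HYP_PAGE_SZ		32UL	/* assumed sizeof(struct hyp_page) */
	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

	/* worst-case stage 1 table pages needed to map nr_pages, as in
	 * __hyp_pgtable_max_pages() */
	static unsigned long pgtable_max_pages(unsigned long nr_pages)
	{
		unsigned long total = 0;
		int i;

		for (i = 0; i < KVM_PGTABLE_MAX_LEVELS; i++) {
			nr_pages = DIV_ROUND_UP(nr_pages, PTRS_PER_PTE);
			total += nr_pages;
		}
		return total;
	}

	int main(void)
	{
		unsigned long hyp_mem_pages = 100000;	/* assumed base cost */
		unsigned long nr_pages = 0, prev;

		do {
			prev = nr_pages;
			nr_pages = hyp_mem_pages + prev;
			nr_pages = DIV_ROUND_UP(nr_pages * HYP_PAGE_SZ, PAGE_SIZE);
			nr_pages += pgtable_max_pages(nr_pages);
		} while (nr_pages != prev);

		printf("extra pages for the vmemmap and its tables: %lu\n", nr_pages);
		return 0;
	}

With the assumptions above this converges after a few iterations (793 extra
pages for a 100000-page base cost), which is the nr_pages value that
kvm_hyp_reserve() adds to hyp_mem_pages before reserving the memblock.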

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 15/32] KVM: arm64: Prepare the creation of s1 mappings at EL2
@ 2021-03-02 14:59   ` Quentin Perret
  0 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, seanjc, mate.toth-pal, linux-kernel, robh+dt,
	linux-arm-kernel, kernel-team, kvmarm, tabba

When memory protection is enabled, the EL2 code needs the ability to
create and manage its own page-table. To do so, introduce a new set of
hypercalls to bootstrap a memory management system at EL2.

This leads to the following boot flow in nVHE Protected mode:

 1. the host allocates memory for the hypervisor very early on, using
    the memblock API;

 2. the host creates a set of stage 1 page-tables for EL2, installs the
    EL2 vectors, and issues the __pkvm_init hypercall;

 3. during __pkvm_init, the hypervisor re-creates its stage 1 page-table
    and stores it in the memory pool provided by the host;

 4. the hypervisor then extends its stage 1 mappings to include a
    vmemmap in the EL2 VA space, hence allowing it to use the buddy
    allocator introduced in a previous patch;

 5. the hypervisor jumps back into the idmap page, switches from the
    host-provided page-table to the new one, and wraps up its
    initialization by enabling the new allocator, before returning to
    the host.

 6. the host can free the now unused page-table created for EL2, and
    will now need to issue hypercalls to make changes to the EL2 stage 1
    mappings instead of modifying them directly.

Note that for the sake of simplifying the review, this patch focuses on
the hypervisor side of things. In other words, this only implements the
new hypercalls, but does not make use of them from the host yet. The
host-side changes will follow in a subsequent patch.

Credits to Will for __pkvm_init_switch_pgd.

Co-authored-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h     |   4 +
 arch/arm64/include/asm/kvm_host.h    |   7 +
 arch/arm64/include/asm/kvm_hyp.h     |   8 ++
 arch/arm64/include/asm/kvm_pgtable.h |   2 +
 arch/arm64/kernel/image-vars.h       |  16 +++
 arch/arm64/kvm/hyp/Makefile          |   2 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h |  71 ++++++++++
 arch/arm64/kvm/hyp/nvhe/Makefile     |   4 +-
 arch/arm64/kvm/hyp/nvhe/hyp-init.S   |  31 +++++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c   |  49 +++++++
 arch/arm64/kvm/hyp/nvhe/mm.c         | 173 ++++++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c      | 195 +++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         |   2 -
 arch/arm64/kvm/hyp/reserved_mem.c    |  92 +++++++++++++
 arch/arm64/mm/init.c                 |   3 +
 15 files changed, 654 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mm.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/mm.c
 create mode 100644 arch/arm64/kvm/hyp/nvhe/setup.c
 create mode 100644 arch/arm64/kvm/hyp/reserved_mem.c

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 22d933e9b59e..db20a9477870 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -57,6 +57,10 @@
 #define __KVM_HOST_SMCCC_FUNC___kvm_get_mdcr_el2		12
 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs		13
 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs		14
+#define __KVM_HOST_SMCCC_FUNC___pkvm_init			15
+#define __KVM_HOST_SMCCC_FUNC___pkvm_create_mappings		16
+#define __KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping	17
+#define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector		18
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 459ee557f87c..b9d45a1f8538 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -781,5 +781,12 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 	(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
 
 int kvm_trng_call(struct kvm_vcpu *vcpu);
+#ifdef CONFIG_KVM
+extern phys_addr_t hyp_mem_base;
+extern phys_addr_t hyp_mem_size;
+void __init kvm_hyp_reserve(void);
+#else
+static inline void kvm_hyp_reserve(void) { }
+#endif
 
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index c0450828378b..ae55351b99a4 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -100,4 +100,12 @@ void __noreturn hyp_panic(void);
 void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
 #endif
 
+#ifdef __KVM_NVHE_HYPERVISOR__
+void __pkvm_init_switch_pgd(phys_addr_t phys, unsigned long size,
+			    phys_addr_t pgd, void *sp, void *cont_fn);
+int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
+		unsigned long *per_cpu_base, u32 hyp_va_bits);
+void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
+#endif
+
 #endif /* __ARM64_KVM_HYP_H__ */
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3c306f90f7da..7898610fbeeb 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -11,6 +11,8 @@
 #include <linux/kvm_host.h>
 #include <linux/types.h>
 
+#define KVM_PGTABLE_MAX_LEVELS		4U
+
 typedef u64 kvm_pte_t;
 
 /**
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 4eb7a15c8b60..940c378fa837 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -115,6 +115,22 @@ KVM_NVHE_ALIAS_HYP(__memcpy, __pi_memcpy);
 KVM_NVHE_ALIAS_HYP(__memset, __pi_memset);
 #endif
 
+/* Kernel memory sections */
+KVM_NVHE_ALIAS(__start_rodata);
+KVM_NVHE_ALIAS(__end_rodata);
+KVM_NVHE_ALIAS(__bss_start);
+KVM_NVHE_ALIAS(__bss_stop);
+
+/* Hyp memory sections */
+KVM_NVHE_ALIAS(__hyp_idmap_text_start);
+KVM_NVHE_ALIAS(__hyp_idmap_text_end);
+KVM_NVHE_ALIAS(__hyp_text_start);
+KVM_NVHE_ALIAS(__hyp_text_end);
+KVM_NVHE_ALIAS(__hyp_bss_start);
+KVM_NVHE_ALIAS(__hyp_bss_end);
+KVM_NVHE_ALIAS(__hyp_rodata_start);
+KVM_NVHE_ALIAS(__hyp_rodata_end);
+
 #endif /* CONFIG_KVM */
 
 #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index 687598e41b21..b726332eec49 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -10,4 +10,4 @@ subdir-ccflags-y := -I$(incdir)				\
 		    -DDISABLE_BRANCH_PROFILING		\
 		    $(DISABLE_STACKLEAK_PLUGIN)
 
-obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o
+obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o reserved_mem.o
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
new file mode 100644
index 000000000000..ac0f7fcffd08
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_HYP_MM_H
+#define __KVM_HYP_MM_H
+
+#include <asm/kvm_pgtable.h>
+#include <asm/spectre.h>
+#include <linux/memblock.h>
+#include <linux/types.h>
+
+#include <nvhe/memory.h>
+#include <nvhe/spinlock.h>
+
+#define HYP_MEMBLOCK_REGIONS 128
+extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
+extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
+extern struct kvm_pgtable pkvm_pgtable;
+extern hyp_spinlock_t pkvm_pgd_lock;
+extern struct hyp_pool hpool;
+extern u64 __io_map_base;
+
+int hyp_create_idmap(u32 hyp_va_bits);
+int hyp_map_vectors(void);
+int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
+int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
+int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
+int __pkvm_create_mappings(unsigned long start, unsigned long size,
+			   unsigned long phys, enum kvm_pgtable_prot prot);
+unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
+					    enum kvm_pgtable_prot prot);
+
+static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
+				     unsigned long *start, unsigned long *end)
+{
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	struct hyp_page *p = hyp_phys_to_page(phys);
+
+	*start = (unsigned long)p;
+	*end = *start + nr_pages * sizeof(struct hyp_page);
+	*start = ALIGN_DOWN(*start, PAGE_SIZE);
+	*end = ALIGN(*end, PAGE_SIZE);
+}
+
+static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
+{
+	unsigned long total = 0, i;
+
+	/* Provision the worst case scenario */
+	for (i = 0; i < KVM_PGTABLE_MAX_LEVELS; i++) {
+		nr_pages = DIV_ROUND_UP(nr_pages, PTRS_PER_PTE);
+		total += nr_pages;
+	}
+
+	return total;
+}
+
+static inline unsigned long hyp_s1_pgtable_pages(void)
+{
+	unsigned long res = 0, i;
+
+	/* Cover all of memory with page-granularity */
+	for (i = 0; i < kvm_nvhe_sym(hyp_memblock_nr); i++) {
+		struct memblock_region *reg = &kvm_nvhe_sym(hyp_memory)[i];
+		res += __hyp_pgtable_max_pages(reg->size >> PAGE_SHIFT);
+	}
+
+	/* Allow 1 GiB for private mappings */
+	res += __hyp_pgtable_max_pages(SZ_1G >> PAGE_SHIFT);
+
+	return res;
+}
+#endif /* __KVM_HYP_MM_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 0033591553fc..e204ea77ab27 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -14,9 +14,9 @@ lib-objs := $(addprefix ../../../lib/, $(lib-objs))
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
 	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
-	 cache.o cpufeature.o
+	 cache.o cpufeature.o setup.o mm.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
-	 ../fpsimd.o ../hyp-entry.o ../exception.o
+	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
 obj-y += $(lib-objs)
 
 ##
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index c631e29fb001..bc56ea92b812 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -244,4 +244,35 @@ alternative_else_nop_endif
 
 SYM_CODE_END(__kvm_handle_stub_hvc)
 
+SYM_FUNC_START(__pkvm_init_switch_pgd)
+	/* Turn the MMU off */
+	pre_disable_mmu_workaround
+	mrs	x2, sctlr_el2
+	bic	x3, x2, #SCTLR_ELx_M
+	msr	sctlr_el2, x3
+	isb
+
+	tlbi	alle2
+
+	/* Install the new pgtables */
+	ldr	x3, [x0, #NVHE_INIT_PGD_PA]
+	phys_to_ttbr x4, x3
+alternative_if ARM64_HAS_CNP
+	orr	x4, x4, #TTBR_CNP_BIT
+alternative_else_nop_endif
+	msr	ttbr0_el2, x4
+
+	/* Set the new stack pointer */
+	ldr	x0, [x0, #NVHE_INIT_STACK_HYP_VA]
+	mov	sp, x0
+
+	/* And turn the MMU back on! */
+	dsb	nsh
+	isb
+	msr	sctlr_el2, x2
+	ic	iallu
+	isb
+	ret	x1
+SYM_FUNC_END(__pkvm_init_switch_pgd)
+
 	.popsection
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index f012f8665ecc..ae6503c9be15 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -6,12 +6,14 @@
 
 #include <hyp/switch.h>
 
+#include <asm/pgtable-types.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_host.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 
+#include <nvhe/mm.h>
 #include <nvhe/trap_handler.h>
 
 DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
@@ -106,6 +108,49 @@ static void handle___vgic_v3_restore_aprs(struct kvm_cpu_context *host_ctxt)
 	__vgic_v3_restore_aprs(kern_hyp_va(cpu_if));
 }
 
+static void handle___pkvm_init(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
+	DECLARE_REG(unsigned long, size, host_ctxt, 2);
+	DECLARE_REG(unsigned long, nr_cpus, host_ctxt, 3);
+	DECLARE_REG(unsigned long *, per_cpu_base, host_ctxt, 4);
+	DECLARE_REG(u32, hyp_va_bits, host_ctxt, 5);
+
+	/*
+	 * __pkvm_init() will return only if an error occurred, otherwise it
+	 * will tail-call in __pkvm_init_finalise() which will have to deal
+	 * with the host context directly.
+	 */
+	cpu_reg(host_ctxt, 1) = __pkvm_init(phys, size, nr_cpus, per_cpu_base,
+					    hyp_va_bits);
+}
+
+static void handle___pkvm_cpu_set_vector(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(enum arm64_hyp_spectre_vector, slot, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = pkvm_cpu_set_vector(slot);
+}
+
+static void handle___pkvm_create_mappings(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(unsigned long, start, host_ctxt, 1);
+	DECLARE_REG(unsigned long, size, host_ctxt, 2);
+	DECLARE_REG(unsigned long, phys, host_ctxt, 3);
+	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_create_mappings(start, size, phys, prot);
+}
+
+static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
+	DECLARE_REG(size_t, size, host_ctxt, 2);
+	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
+}
+
 typedef void (*hcall_t)(struct kvm_cpu_context *);
 
 #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
@@ -125,6 +170,10 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__kvm_get_mdcr_el2),
 	HANDLE_FUNC(__vgic_v3_save_aprs),
 	HANDLE_FUNC(__vgic_v3_restore_aprs),
+	HANDLE_FUNC(__pkvm_init),
+	HANDLE_FUNC(__pkvm_cpu_set_vector),
+	HANDLE_FUNC(__pkvm_create_mappings),
+	HANDLE_FUNC(__pkvm_create_private_mapping),
 };
 
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
new file mode 100644
index 000000000000..a8efdf0f9003
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <asm/kvm_pgtable.h>
+#include <asm/spectre.h>
+
+#include <nvhe/early_alloc.h>
+#include <nvhe/gfp.h>
+#include <nvhe/memory.h>
+#include <nvhe/mm.h>
+#include <nvhe/spinlock.h>
+
+struct kvm_pgtable pkvm_pgtable;
+hyp_spinlock_t pkvm_pgd_lock;
+u64 __io_map_base;
+
+struct memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
+unsigned int hyp_memblock_nr;
+
+int __pkvm_create_mappings(unsigned long start, unsigned long size,
+			  unsigned long phys, enum kvm_pgtable_prot prot)
+{
+	int err;
+
+	hyp_spin_lock(&pkvm_pgd_lock);
+	err = kvm_pgtable_hyp_map(&pkvm_pgtable, start, size, phys, prot);
+	hyp_spin_unlock(&pkvm_pgd_lock);
+
+	return err;
+}
+
+unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
+					    enum kvm_pgtable_prot prot)
+{
+	unsigned long addr;
+	int err;
+
+	hyp_spin_lock(&pkvm_pgd_lock);
+
+	size = PAGE_ALIGN(size + offset_in_page(phys));
+	addr = __io_map_base;
+	__io_map_base += size;
+
+	/* Are we overflowing on the vmemmap ? */
+	if (__io_map_base > __hyp_vmemmap) {
+		__io_map_base -= size;
+		addr = (unsigned long)ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	err = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, size, phys, prot);
+	if (err) {
+		addr = (unsigned long)ERR_PTR(err);
+		goto out;
+	}
+
+	addr = addr + offset_in_page(phys);
+out:
+	hyp_spin_unlock(&pkvm_pgd_lock);
+
+	return addr;
+}
+
+int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
+{
+	unsigned long start = (unsigned long)from;
+	unsigned long end = (unsigned long)to;
+	unsigned long virt_addr;
+	phys_addr_t phys;
+
+	start = start & PAGE_MASK;
+	end = PAGE_ALIGN(end);
+
+	for (virt_addr = start; virt_addr < end; virt_addr += PAGE_SIZE) {
+		int err;
+
+		phys = hyp_virt_to_phys((void *)virt_addr);
+		err = __pkvm_create_mappings(virt_addr, PAGE_SIZE, phys, prot);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
+{
+	unsigned long start, end;
+
+	hyp_vmemmap_range(phys, size, &start, &end);
+
+	return __pkvm_create_mappings(start, end - start, back, PAGE_HYP);
+}
+
+static void *__hyp_bp_vect_base;
+int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot)
+{
+	void *vector;
+
+	switch (slot) {
+	case HYP_VECTOR_DIRECT: {
+		vector = __kvm_hyp_vector;
+		break;
+	}
+	case HYP_VECTOR_SPECTRE_DIRECT: {
+		vector = __bp_harden_hyp_vecs;
+		break;
+	}
+	case HYP_VECTOR_INDIRECT:
+	case HYP_VECTOR_SPECTRE_INDIRECT: {
+		vector = (void *)__hyp_bp_vect_base;
+		break;
+	}
+	default:
+		return -EINVAL;
+	}
+
+	vector = __kvm_vector_slot2addr(vector, slot);
+	*this_cpu_ptr(&kvm_hyp_vector) = (unsigned long)vector;
+
+	return 0;
+}
+
+int hyp_map_vectors(void)
+{
+	phys_addr_t phys;
+	void *bp_base;
+
+	if (!cpus_have_const_cap(ARM64_SPECTRE_V3A))
+		return 0;
+
+	phys = __hyp_pa(__bp_harden_hyp_vecs);
+	bp_base = (void *)__pkvm_create_private_mapping(phys,
+							__BP_HARDEN_HYP_VECS_SZ,
+							PAGE_HYP_EXEC);
+	if (IS_ERR_OR_NULL(bp_base))
+		return PTR_ERR(bp_base);
+
+	__hyp_bp_vect_base = bp_base;
+
+	return 0;
+}
+
+int hyp_create_idmap(u32 hyp_va_bits)
+{
+	unsigned long start, end;
+
+	start = hyp_virt_to_phys((void *)__hyp_idmap_text_start);
+	start = ALIGN_DOWN(start, PAGE_SIZE);
+
+	end = hyp_virt_to_phys((void *)__hyp_idmap_text_end);
+	end = ALIGN(end, PAGE_SIZE);
+
+	/*
+	 * One half of the VA space is reserved to linearly map portions of
+	 * memory -- see va_layout.c for more details. The other half of the VA
+	 * space contains the trampoline page, and needs some care. Split that
+	 * second half in two and find the quarter of VA space not conflicting
+	 * with the idmap to place the IOs and the vmemmap. IOs use the lower
+	 * half of the quarter and the vmemmap the upper half.
+	 */
+	__io_map_base = start & BIT(hyp_va_bits - 2);
+	__io_map_base ^= BIT(hyp_va_bits - 2);
+	__hyp_vmemmap = __io_map_base | BIT(hyp_va_bits - 3);
+
+	return __pkvm_create_mappings(start, end - start, start, PAGE_HYP_EXEC);
+}
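
To make the quarter/half arithmetic in hyp_create_idmap() concrete, here is a
minimal worked example assuming hyp_va_bits == 48 and an idmap address whose
bit 46 happens to be clear (both values are illustrative only):

	/* start & BIT(46) == 0 here, so flip bit 46 to land in the other quarter */
	__io_map_base = (start & BIT(46)) ^ BIT(46);	/* 0x0000400000000000, i.e. 64 TiB */
	__hyp_vmemmap = __io_map_base | BIT(45);	/* 0x0000600000000000, i.e. 96 TiB */

so private I/O mappings grow upwards from the 64 TiB mark of the EL2 VA space
and the vmemmap starts at 96 TiB, neither of which can collide with the idmap
page in this configuration.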
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
new file mode 100644
index 000000000000..178ec06f2b49
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <asm/kvm_pgtable.h>
+
+#include <nvhe/early_alloc.h>
+#include <nvhe/gfp.h>
+#include <nvhe/memory.h>
+#include <nvhe/mm.h>
+#include <nvhe/trap_handler.h>
+
+struct hyp_pool hpool;
+struct kvm_pgtable_mm_ops pkvm_pgtable_mm_ops;
+unsigned long hyp_nr_cpus;
+
+#define hyp_percpu_size ((unsigned long)__per_cpu_end - \
+			 (unsigned long)__per_cpu_start)
+
+static void *vmemmap_base;
+static void *hyp_pgt_base;
+
+static int divide_memory_pool(void *virt, unsigned long size)
+{
+	unsigned long vstart, vend, nr_pages;
+
+	hyp_early_alloc_init(virt, size);
+
+	hyp_vmemmap_range(__hyp_pa(virt), size, &vstart, &vend);
+	nr_pages = (vend - vstart) >> PAGE_SHIFT;
+	vmemmap_base = hyp_early_alloc_contig(nr_pages);
+	if (!vmemmap_base)
+		return -ENOMEM;
+
+	nr_pages = hyp_s1_pgtable_pages();
+	hyp_pgt_base = hyp_early_alloc_contig(nr_pages);
+	if (!hyp_pgt_base)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
+				 unsigned long *per_cpu_base,
+				 u32 hyp_va_bits)
+{
+	void *start, *end, *virt = hyp_phys_to_virt(phys);
+	unsigned long pgt_size = hyp_s1_pgtable_pages() << PAGE_SHIFT;
+	int ret, i;
+
+	/* Recreate the hyp page-table using the early page allocator */
+	hyp_early_alloc_init(hyp_pgt_base, pgt_size);
+	ret = kvm_pgtable_hyp_init(&pkvm_pgtable, hyp_va_bits,
+				   &hyp_early_alloc_mm_ops);
+	if (ret)
+		return ret;
+
+	ret = hyp_create_idmap(hyp_va_bits);
+	if (ret)
+		return ret;
+
+	ret = hyp_map_vectors();
+	if (ret)
+		return ret;
+
+	ret = hyp_back_vmemmap(phys, size, hyp_virt_to_phys(vmemmap_base));
+	if (ret)
+		return ret;
+
+	ret = pkvm_create_mappings(__hyp_text_start, __hyp_text_end, PAGE_HYP_EXEC);
+	if (ret)
+		return ret;
+
+	ret = pkvm_create_mappings(__start_rodata, __end_rodata, PAGE_HYP_RO);
+	if (ret)
+		return ret;
+
+	ret = pkvm_create_mappings(__hyp_rodata_start, __hyp_rodata_end, PAGE_HYP_RO);
+	if (ret)
+		return ret;
+
+	ret = pkvm_create_mappings(__hyp_bss_start, __hyp_bss_end, PAGE_HYP);
+	if (ret)
+		return ret;
+
+	ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, PAGE_HYP_RO);
+	if (ret)
+		return ret;
+
+	ret = pkvm_create_mappings(virt, virt + size, PAGE_HYP);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < hyp_nr_cpus; i++) {
+		start = (void *)kern_hyp_va(per_cpu_base[i]);
+		end = start + PAGE_ALIGN(hyp_percpu_size);
+		ret = pkvm_create_mappings(start, end, PAGE_HYP);
+		if (ret)
+			return ret;
+
+		end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va;
+		start = end - PAGE_SIZE;
+		ret = pkvm_create_mappings(start, end, PAGE_HYP);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static void update_nvhe_init_params(void)
+{
+	struct kvm_nvhe_init_params *params;
+	unsigned long i;
+
+	for (i = 0; i < hyp_nr_cpus; i++) {
+		params = per_cpu_ptr(&kvm_init_params, i);
+		params->pgd_pa = __hyp_pa(pkvm_pgtable.pgd);
+		__flush_dcache_area(params, sizeof(*params));
+	}
+}
+
+static void *hyp_zalloc_hyp_page(void *arg)
+{
+	return hyp_alloc_pages(&hpool, 0);
+}
+
+void __noreturn __pkvm_init_finalise(void)
+{
+	struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
+	struct kvm_cpu_context *host_ctxt = &host_data->host_ctxt;
+	unsigned long nr_pages, reserved_pages, pfn;
+	int ret;
+
+	/* Now that the vmemmap is backed, install the full-fledged allocator */
+	pfn = hyp_virt_to_pfn(hyp_pgt_base);
+	nr_pages = hyp_s1_pgtable_pages();
+	reserved_pages = hyp_early_alloc_nr_used_pages();
+	ret = hyp_pool_init(&hpool, pfn, nr_pages, reserved_pages);
+	if (ret)
+		goto out;
+
+	pkvm_pgtable_mm_ops.zalloc_page = hyp_zalloc_hyp_page;
+	pkvm_pgtable_mm_ops.phys_to_virt = hyp_phys_to_virt;
+	pkvm_pgtable_mm_ops.virt_to_phys = hyp_virt_to_phys;
+	pkvm_pgtable_mm_ops.get_page = hyp_get_page;
+	pkvm_pgtable_mm_ops.put_page = hyp_put_page;
+	pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;
+
+out:
+	/*
+	 * We tail-called to here from handle___pkvm_init() and will not return,
+	 * so make sure to propagate the return value to the host.
+	 */
+	cpu_reg(host_ctxt, 1) = ret;
+
+	__host_enter(host_ctxt);
+}
+
+int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
+		unsigned long *per_cpu_base, u32 hyp_va_bits)
+{
+	struct kvm_nvhe_init_params *params;
+	void *virt = hyp_phys_to_virt(phys);
+	void (*fn)(phys_addr_t params_pa, void *finalize_fn_va);
+	int ret;
+
+	if (phys % PAGE_SIZE || size % PAGE_SIZE)
+		return -EINVAL;
+
+	hyp_spin_lock_init(&pkvm_pgd_lock);
+	hyp_nr_cpus = nr_cpus;
+
+	ret = divide_memory_pool(virt, size);
+	if (ret)
+		return ret;
+
+	ret = recreate_hyp_mappings(phys, size, per_cpu_base, hyp_va_bits);
+	if (ret)
+		return ret;
+
+	update_nvhe_init_params();
+
+	/* Jump in the idmap page to switch to the new page-tables */
+	params = this_cpu_ptr(&kvm_init_params);
+	fn = (typeof(fn))__hyp_pa(__pkvm_init_switch_pgd);
+	fn(__hyp_pa(params), __pkvm_init_finalise);
+
+	unreachable();
+}
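
Once __pkvm_init_finalise() has switched pkvm_pgtable.mm_ops over to the buddy
allocator, any page-table page needed by a later EL2 mapping request comes from
hpool rather than from the early pool. Roughly, for a post-init hypercall (all
functions as introduced in this series):

	__pkvm_create_mappings(start, size, phys, prot)
	  -> kvm_pgtable_hyp_map(&pkvm_pgtable, ...)
	       -> mm_ops->zalloc_page()		/* hyp_zalloc_hyp_page() */
	            -> hyp_alloc_pages(&hpool, 0)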
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b975a67d1f85..7ce0969203e8 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -10,8 +10,6 @@
 #include <linux/bitfield.h>
 #include <asm/kvm_pgtable.h>
 
-#define KVM_PGTABLE_MAX_LEVELS		4U
-
 #define KVM_PTE_VALID			BIT(0)
 
 #define KVM_PTE_TYPE			BIT(1)
diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c
new file mode 100644
index 000000000000..9bc6a6d27904
--- /dev/null
+++ b/arch/arm64/kvm/hyp/reserved_mem.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 - Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/memblock.h>
+
+#include <asm/kvm_host.h>
+
+#include <nvhe/memory.h>
+#include <nvhe/mm.h>
+
+static struct memblock_region *hyp_memory = kvm_nvhe_sym(hyp_memory);
+static unsigned int *hyp_memblock_nr_ptr = &kvm_nvhe_sym(hyp_memblock_nr);
+
+phys_addr_t hyp_mem_base;
+phys_addr_t hyp_mem_size;
+
+static int __init register_memblock_regions(void)
+{
+	struct memblock_region *reg;
+
+	for_each_mem_region(reg) {
+		if (*hyp_memblock_nr_ptr >= HYP_MEMBLOCK_REGIONS)
+			return -ENOMEM;
+
+		hyp_memory[*hyp_memblock_nr_ptr] = *reg;
+		(*hyp_memblock_nr_ptr)++;
+	}
+
+	return 0;
+}
+
+void __init kvm_hyp_reserve(void)
+{
+	u64 nr_pages, prev, hyp_mem_pages = 0;
+	int ret;
+
+	if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
+		return;
+
+	if (kvm_get_mode() != KVM_MODE_PROTECTED)
+		return;
+
+	ret = register_memblock_regions();
+	if (ret) {
+		*hyp_memblock_nr_ptr = 0;
+		kvm_err("Failed to register hyp memblocks: %d\n", ret);
+		return;
+	}
+
+	hyp_mem_pages += hyp_s1_pgtable_pages();
+
+	/*
+	 * The hyp_vmemmap needs to be backed by pages, but these pages
+	 * themselves need to be present in the vmemmap, so compute the number
+	 * of pages needed by looking for a fixed point.
+	 */
+	nr_pages = 0;
+	do {
+		prev = nr_pages;
+		nr_pages = hyp_mem_pages + prev;
+		nr_pages = DIV_ROUND_UP(nr_pages * sizeof(struct hyp_page), PAGE_SIZE);
+		nr_pages += __hyp_pgtable_max_pages(nr_pages);
+	} while (nr_pages != prev);
+	hyp_mem_pages += nr_pages;
+
+	/*
+	 * Try to allocate a PMD-aligned region to reduce TLB pressure once
+	 * this is unmapped from the host stage-2, and fallback to PAGE_SIZE.
+	 */
+	hyp_mem_size = hyp_mem_pages << PAGE_SHIFT;
+	hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(),
+					      ALIGN(hyp_mem_size, PMD_SIZE),
+					      PMD_SIZE);
+	if (!hyp_mem_base)
+		hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(),
+						      hyp_mem_size, PAGE_SIZE);
+	else
+		hyp_mem_size = ALIGN(hyp_mem_size, PMD_SIZE);
+
+	if (!hyp_mem_base) {
+		kvm_err("Failed to reserve hyp memory\n");
+		return;
+	}
+	memblock_reserve(hyp_mem_base, hyp_mem_size);
+
+	kvm_info("Reserved %lld MiB at 0x%llx\n", hyp_mem_size >> 20,
+		 hyp_mem_base);
+}
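
A worked instance of the fixed point above, assuming (purely for illustration)
4 KiB pages, PTRS_PER_PTE == 512, sizeof(struct hyp_page) == 32 and
hyp_mem_pages == 10000 when entering the loop:

	pass 1: nr_pages = 10000 + 0   -> DIV_ROUND_UP(10000 * 32, 4096) = 79
	                                  + __hyp_pgtable_max_pages(79)  =  4  -> 83
	pass 2: nr_pages = 10000 + 83  -> DIV_ROUND_UP(10083 * 32, 4096) = 79
	                                  +  4  -> 83 == prev, converged

so 83 extra pages would be reserved for the vmemmap and its backing page-tables
in that hypothetical configuration.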
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 0ace5e68efba..2b1b41b1ff9b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -35,6 +35,7 @@
 #include <asm/fixmap.h>
 #include <asm/kasan.h>
 #include <asm/kernel-pgtable.h>
+#include <asm/kvm_host.h>
 #include <asm/memory.h>
 #include <asm/numa.h>
 #include <asm/sections.h>
@@ -429,6 +430,8 @@ void __init bootmem_init(void)
 
 	dma_pernuma_cma_reserve();
 
+	kvm_hyp_reserve();
+
 	/*
 	 * sparse_init() tries to allocate memory from memblock, so must be
 	 * done after the fixed reservations
-- 
2.30.1.766.gb4fecdf3b7-goog


* [PATCH v3 16/32] KVM: arm64: Elevate hypervisor mappings creation at EL2
  2021-03-02 14:59 ` Quentin Perret
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Previous commits have introduced infrastructure to enable the EL2 code
to manage its own stage 1 mappings. However, this was preliminary work,
and none of it is currently in use.

Put all of this together by elevating the mapping creation at EL2 when
memory protection is enabled. In this case, the host kernel running
at EL1 still creates _temporary_ EL2 mappings, only used while
initializing the hypervisor, but frees them right after.

As such, all calls to create_hyp_mappings() after kvm init has finished
turn into hypercalls, as the host now has no 'legal' way to modify the
hypervisor page tables directly.

Signed-off-by: Quentin Perret <qperret@google.com>
---
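
As a sketch of what this means for a later caller, take the existing vcpu
mapping made from kvm_arch_vcpu_create() (that call site is untouched by this
patch):

	/* runs after finalize_hyp_mode(), so kvm_host_owns_hyp_mappings() is false */
	err = create_hyp_mappings(vcpu, vcpu + 1, PAGE_HYP);
	/*
	 * __create_hyp_mappings() no longer touches hyp_pgtable and instead issues
	 * kvm_call_hyp_nvhe(__pkvm_create_mappings, start, size, phys, prot),
	 * which lands in handle___pkvm_create_mappings() at EL2.
	 */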
 arch/arm64/include/asm/kvm_mmu.h |  2 +-
 arch/arm64/kvm/arm.c             | 87 +++++++++++++++++++++++++++++---
 arch/arm64/kvm/mmu.c             | 43 ++++++++++++++--
 3 files changed, 120 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 5c42ec023cc7..ce02a4052dcf 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -166,7 +166,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
 
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
-int kvm_mmu_init(void);
+int kvm_mmu_init(u32 *hyp_va_bits);
 
 static inline void *__kvm_vector_slot2addr(void *base,
 					   enum arm64_hyp_spectre_vector slot)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 26e573cdede3..5a97abdc3ba6 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1421,7 +1421,7 @@ static void cpu_prepare_hyp_mode(int cpu)
 	kvm_flush_dcache_to_poc(params, sizeof(*params));
 }
 
-static void cpu_init_hyp_mode(void)
+static void hyp_install_host_vector(void)
 {
 	struct kvm_nvhe_init_params *params;
 	struct arm_smccc_res res;
@@ -1439,6 +1439,11 @@ static void cpu_init_hyp_mode(void)
 	params = this_cpu_ptr_nvhe_sym(kvm_init_params);
 	arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init), virt_to_phys(params), &res);
 	WARN_ON(res.a0 != SMCCC_RET_SUCCESS);
+}
+
+static void cpu_init_hyp_mode(void)
+{
+	hyp_install_host_vector();
 
 	/*
 	 * Disabling SSBD on a non-VHE system requires us to enable SSBS
@@ -1481,7 +1486,10 @@ static void cpu_set_hyp_vector(void)
 	struct bp_hardening_data *data = this_cpu_ptr(&bp_hardening_data);
 	void *vector = hyp_spectre_vector_selector[data->slot];
 
-	*this_cpu_ptr_hyp_sym(kvm_hyp_vector) = (unsigned long)vector;
+	if (!is_protected_kvm_enabled())
+		*this_cpu_ptr_hyp_sym(kvm_hyp_vector) = (unsigned long)vector;
+	else
+		kvm_call_hyp_nvhe(__pkvm_cpu_set_vector, data->slot);
 }
 
 static void cpu_hyp_reinit(void)
@@ -1489,13 +1497,14 @@ static void cpu_hyp_reinit(void)
 	kvm_init_host_cpu_context(&this_cpu_ptr_hyp_sym(kvm_host_data)->host_ctxt);
 
 	cpu_hyp_reset();
-	cpu_set_hyp_vector();
 
 	if (is_kernel_in_hyp_mode())
 		kvm_timer_init_vhe();
 	else
 		cpu_init_hyp_mode();
 
+	cpu_set_hyp_vector();
+
 	kvm_arm_init_debug();
 
 	if (vgic_present)
@@ -1691,18 +1700,59 @@ static void teardown_hyp_mode(void)
 	}
 }
 
+static int do_pkvm_init(u32 hyp_va_bits)
+{
+	void *per_cpu_base = kvm_ksym_ref(kvm_arm_hyp_percpu_base);
+	int ret;
+
+	preempt_disable();
+	hyp_install_host_vector();
+	ret = kvm_call_hyp_nvhe(__pkvm_init, hyp_mem_base, hyp_mem_size,
+				num_possible_cpus(), kern_hyp_va(per_cpu_base),
+				hyp_va_bits);
+	preempt_enable();
+
+	return ret;
+}
+
+static int kvm_hyp_init_protection(u32 hyp_va_bits)
+{
+	void *addr = phys_to_virt(hyp_mem_base);
+	int ret;
+
+	ret = create_hyp_mappings(addr, addr + hyp_mem_size, PAGE_HYP);
+	if (ret)
+		return ret;
+
+	ret = do_pkvm_init(hyp_va_bits);
+	if (ret)
+		return ret;
+
+	free_hyp_pgds();
+
+	return 0;
+}
+
 /**
  * Inits Hyp-mode on all online CPUs
  */
 static int init_hyp_mode(void)
 {
+	u32 hyp_va_bits;
 	int cpu;
-	int err = 0;
+	int err = -ENOMEM;
+
+	/*
+	 * The protected Hyp-mode cannot be initialized if the memory pool
+	 * allocation has failed.
+	 */
+	if (is_protected_kvm_enabled() && !hyp_mem_base)
+		return err;
 
 	/*
 	 * Allocate Hyp PGD and setup Hyp identity mapping
 	 */
-	err = kvm_mmu_init();
+	err = kvm_mmu_init(&hyp_va_bits);
 	if (err)
 		goto out_err;
 
@@ -1818,6 +1868,14 @@ static int init_hyp_mode(void)
 			goto out_err;
 	}
 
+	if (is_protected_kvm_enabled()) {
+		err = kvm_hyp_init_protection(hyp_va_bits);
+		if (err) {
+			kvm_err("Failed to init hyp memory protection\n");
+			goto out_err;
+		}
+	}
+
 	return 0;
 
 out_err:
@@ -1826,6 +1884,16 @@ static int init_hyp_mode(void)
 	return err;
 }
 
+static int finalize_hyp_mode(void)
+{
+	if (!is_protected_kvm_enabled())
+		return 0;
+
+	static_branch_enable(&kvm_protected_mode_initialized);
+
+	return 0;
+}
+
 static void check_kvm_target_cpu(void *ret)
 {
 	*(int *)ret = kvm_target_cpu();
@@ -1942,8 +2010,15 @@ int kvm_arch_init(void *opaque)
 	if (err)
 		goto out_hyp;
 
+	if (!in_hyp_mode) {
+		err = finalize_hyp_mode();
+		if (err) {
+			kvm_err("Failed to finalize Hyp protection\n");
+			goto out_hyp;
+		}
+	}
+
 	if (is_protected_kvm_enabled()) {
-		static_branch_enable(&kvm_protected_mode_initialized);
 		kvm_info("Protected nVHE mode initialized successfully\n");
 	} else if (in_hyp_mode) {
 		kvm_info("VHE mode initialized successfully\n");
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4d41d7838d53..9d331bf262d2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -221,15 +221,39 @@ void free_hyp_pgds(void)
 	if (hyp_pgtable) {
 		kvm_pgtable_hyp_destroy(hyp_pgtable);
 		kfree(hyp_pgtable);
+		hyp_pgtable = NULL;
 	}
 	mutex_unlock(&kvm_hyp_pgd_mutex);
 }
 
+static bool kvm_host_owns_hyp_mappings(void)
+{
+	if (static_branch_likely(&kvm_protected_mode_initialized))
+		return false;
+
+	/*
+	 * This can happen at boot time when __create_hyp_mappings() is called
+	 * after the hyp protection has been enabled, but the static key has
+	 * not been flipped yet.
+	 */
+	if (!hyp_pgtable && is_protected_kvm_enabled())
+		return false;
+
+	WARN_ON(!hyp_pgtable);
+
+	return true;
+}
+
 static int __create_hyp_mappings(unsigned long start, unsigned long size,
 				 unsigned long phys, enum kvm_pgtable_prot prot)
 {
 	int err;
 
+	if (!kvm_host_owns_hyp_mappings()) {
+		return kvm_call_hyp_nvhe(__pkvm_create_mappings,
+					 start, size, phys, prot);
+	}
+
 	mutex_lock(&kvm_hyp_pgd_mutex);
 	err = kvm_pgtable_hyp_map(hyp_pgtable, start, size, phys, prot);
 	mutex_unlock(&kvm_hyp_pgd_mutex);
@@ -291,6 +315,16 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
 	unsigned long base;
 	int ret = 0;
 
+	if (!kvm_host_owns_hyp_mappings()) {
+		base = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
+					 phys_addr, size, prot);
+		if (IS_ERR_OR_NULL((void *)base))
+			return PTR_ERR((void *)base);
+		*haddr = base;
+
+		return 0;
+	}
+
 	mutex_lock(&kvm_hyp_pgd_mutex);
 
 	/*
@@ -1270,10 +1304,9 @@ static struct kvm_pgtable_mm_ops kvm_hyp_mm_ops = {
 	.virt_to_phys		= kvm_host_pa,
 };
 
-int kvm_mmu_init(void)
+int kvm_mmu_init(u32 *hyp_va_bits)
 {
 	int err;
-	u32 hyp_va_bits;
 
 	hyp_idmap_start = __pa_symbol(__hyp_idmap_text_start);
 	hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE);
@@ -1287,8 +1320,8 @@ int kvm_mmu_init(void)
 	 */
 	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
 
-	hyp_va_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);
-	kvm_debug("Using %u-bit virtual addresses at EL2\n", hyp_va_bits);
+	*hyp_va_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);
+	kvm_debug("Using %u-bit virtual addresses at EL2\n", *hyp_va_bits);
 	kvm_debug("IDMAP page: %lx\n", hyp_idmap_start);
 	kvm_debug("HYP VA range: %lx:%lx\n",
 		  kern_hyp_va(PAGE_OFFSET),
@@ -1313,7 +1346,7 @@ int kvm_mmu_init(void)
 		goto out;
 	}
 
-	err = kvm_pgtable_hyp_init(hyp_pgtable, hyp_va_bits, &kvm_hyp_mm_ops);
+	err = kvm_pgtable_hyp_init(hyp_pgtable, *hyp_va_bits, &kvm_hyp_mm_ops);
 	if (err)
 		goto out_free_pgtable;
 
-- 
2.30.1.766.gb4fecdf3b7-goog




* [PATCH v3 16/32] KVM: arm64: Elevate hypervisor mappings creation at EL2
@ 2021-03-02 14:59   ` Quentin Perret
  0 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Previous commits have introduced infrastructure to enable the EL2 code
to manage its own stage 1 mappings. However, this was preliminary work,
and none of it is currently in use.

Put all of this together by elevating the creation of hypervisor
mappings to EL2 when memory protection is enabled. In this case, the
host kernel running at EL1 still creates _temporary_ EL2 mappings, used
only while initializing the hypervisor, and frees them right after.

As such, all calls to create_hyp_mappings() after kvm init has finished
turn into hypercalls, as the host now has no 'legal' way to modify the
hypervisor page tables directly.
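
A condensed sketch of the init-time flow described above (mirroring
kvm_hyp_init_protection() in the arm.c hunk below; error handling
elided):

	void *addr = phys_to_virt(hyp_mem_base);

	/* Temporary, EL1-owned mapping so the hyp memory pool can be
	 * handed over to EL2. */
	create_hyp_mappings(addr, addr + hyp_mem_size, PAGE_HYP);

	/* EL2 takes over and builds its own stage 1 from that pool. */
	do_pkvm_init(hyp_va_bits);

	/* Drop the EL1 copy of the hyp page-table; from here on, hyp
	 * mappings are requested via the __pkvm_create_mappings and
	 * __pkvm_create_private_mapping hypercalls (see the mmu.c hunk
	 * below). */
	free_hyp_pgds();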

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_mmu.h |  2 +-
 arch/arm64/kvm/arm.c             | 87 +++++++++++++++++++++++++++++---
 arch/arm64/kvm/mmu.c             | 43 ++++++++++++++--
 3 files changed, 120 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 5c42ec023cc7..ce02a4052dcf 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -166,7 +166,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
 
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
-int kvm_mmu_init(void);
+int kvm_mmu_init(u32 *hyp_va_bits);
 
 static inline void *__kvm_vector_slot2addr(void *base,
 					   enum arm64_hyp_spectre_vector slot)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 26e573cdede3..5a97abdc3ba6 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1421,7 +1421,7 @@ static void cpu_prepare_hyp_mode(int cpu)
 	kvm_flush_dcache_to_poc(params, sizeof(*params));
 }
 
-static void cpu_init_hyp_mode(void)
+static void hyp_install_host_vector(void)
 {
 	struct kvm_nvhe_init_params *params;
 	struct arm_smccc_res res;
@@ -1439,6 +1439,11 @@ static void cpu_init_hyp_mode(void)
 	params = this_cpu_ptr_nvhe_sym(kvm_init_params);
 	arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init), virt_to_phys(params), &res);
 	WARN_ON(res.a0 != SMCCC_RET_SUCCESS);
+}
+
+static void cpu_init_hyp_mode(void)
+{
+	hyp_install_host_vector();
 
 	/*
 	 * Disabling SSBD on a non-VHE system requires us to enable SSBS
@@ -1481,7 +1486,10 @@ static void cpu_set_hyp_vector(void)
 	struct bp_hardening_data *data = this_cpu_ptr(&bp_hardening_data);
 	void *vector = hyp_spectre_vector_selector[data->slot];
 
-	*this_cpu_ptr_hyp_sym(kvm_hyp_vector) = (unsigned long)vector;
+	if (!is_protected_kvm_enabled())
+		*this_cpu_ptr_hyp_sym(kvm_hyp_vector) = (unsigned long)vector;
+	else
+		kvm_call_hyp_nvhe(__pkvm_cpu_set_vector, data->slot);
 }
 
 static void cpu_hyp_reinit(void)
@@ -1489,13 +1497,14 @@ static void cpu_hyp_reinit(void)
 	kvm_init_host_cpu_context(&this_cpu_ptr_hyp_sym(kvm_host_data)->host_ctxt);
 
 	cpu_hyp_reset();
-	cpu_set_hyp_vector();
 
 	if (is_kernel_in_hyp_mode())
 		kvm_timer_init_vhe();
 	else
 		cpu_init_hyp_mode();
 
+	cpu_set_hyp_vector();
+
 	kvm_arm_init_debug();
 
 	if (vgic_present)
@@ -1691,18 +1700,59 @@ static void teardown_hyp_mode(void)
 	}
 }
 
+static int do_pkvm_init(u32 hyp_va_bits)
+{
+	void *per_cpu_base = kvm_ksym_ref(kvm_arm_hyp_percpu_base);
+	int ret;
+
+	preempt_disable();
+	hyp_install_host_vector();
+	ret = kvm_call_hyp_nvhe(__pkvm_init, hyp_mem_base, hyp_mem_size,
+				num_possible_cpus(), kern_hyp_va(per_cpu_base),
+				hyp_va_bits);
+	preempt_enable();
+
+	return ret;
+}
+
+static int kvm_hyp_init_protection(u32 hyp_va_bits)
+{
+	void *addr = phys_to_virt(hyp_mem_base);
+	int ret;
+
+	ret = create_hyp_mappings(addr, addr + hyp_mem_size, PAGE_HYP);
+	if (ret)
+		return ret;
+
+	ret = do_pkvm_init(hyp_va_bits);
+	if (ret)
+		return ret;
+
+	free_hyp_pgds();
+
+	return 0;
+}
+
 /**
  * Inits Hyp-mode on all online CPUs
  */
 static int init_hyp_mode(void)
 {
+	u32 hyp_va_bits;
 	int cpu;
-	int err = 0;
+	int err = -ENOMEM;
+
+	/*
+	 * The protected Hyp-mode cannot be initialized if the memory pool
+	 * allocation has failed.
+	 */
+	if (is_protected_kvm_enabled() && !hyp_mem_base)
+		return err;
 
 	/*
 	 * Allocate Hyp PGD and setup Hyp identity mapping
 	 */
-	err = kvm_mmu_init();
+	err = kvm_mmu_init(&hyp_va_bits);
 	if (err)
 		goto out_err;
 
@@ -1818,6 +1868,14 @@ static int init_hyp_mode(void)
 			goto out_err;
 	}
 
+	if (is_protected_kvm_enabled()) {
+		err = kvm_hyp_init_protection(hyp_va_bits);
+		if (err) {
+			kvm_err("Failed to init hyp memory protection\n");
+			goto out_err;
+		}
+	}
+
 	return 0;
 
 out_err:
@@ -1826,6 +1884,16 @@ static int init_hyp_mode(void)
 	return err;
 }
 
+static int finalize_hyp_mode(void)
+{
+	if (!is_protected_kvm_enabled())
+		return 0;
+
+	static_branch_enable(&kvm_protected_mode_initialized);
+
+	return 0;
+}
+
 static void check_kvm_target_cpu(void *ret)
 {
 	*(int *)ret = kvm_target_cpu();
@@ -1942,8 +2010,15 @@ int kvm_arch_init(void *opaque)
 	if (err)
 		goto out_hyp;
 
+	if (!in_hyp_mode) {
+		err = finalize_hyp_mode();
+		if (err) {
+			kvm_err("Failed to finalize Hyp protection\n");
+			goto out_hyp;
+		}
+	}
+
 	if (is_protected_kvm_enabled()) {
-		static_branch_enable(&kvm_protected_mode_initialized);
 		kvm_info("Protected nVHE mode initialized successfully\n");
 	} else if (in_hyp_mode) {
 		kvm_info("VHE mode initialized successfully\n");
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4d41d7838d53..9d331bf262d2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -221,15 +221,39 @@ void free_hyp_pgds(void)
 	if (hyp_pgtable) {
 		kvm_pgtable_hyp_destroy(hyp_pgtable);
 		kfree(hyp_pgtable);
+		hyp_pgtable = NULL;
 	}
 	mutex_unlock(&kvm_hyp_pgd_mutex);
 }
 
+static bool kvm_host_owns_hyp_mappings(void)
+{
+	if (static_branch_likely(&kvm_protected_mode_initialized))
+		return false;
+
+	/*
+	 * This can happen at boot time when __create_hyp_mappings() is called
+	 * after the hyp protection has been enabled, but the static key has
+	 * not been flipped yet.
+	 */
+	if (!hyp_pgtable && is_protected_kvm_enabled())
+		return false;
+
+	WARN_ON(!hyp_pgtable);
+
+	return true;
+}
+
 static int __create_hyp_mappings(unsigned long start, unsigned long size,
 				 unsigned long phys, enum kvm_pgtable_prot prot)
 {
 	int err;
 
+	if (!kvm_host_owns_hyp_mappings()) {
+		return kvm_call_hyp_nvhe(__pkvm_create_mappings,
+					 start, size, phys, prot);
+	}
+
 	mutex_lock(&kvm_hyp_pgd_mutex);
 	err = kvm_pgtable_hyp_map(hyp_pgtable, start, size, phys, prot);
 	mutex_unlock(&kvm_hyp_pgd_mutex);
@@ -291,6 +315,16 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
 	unsigned long base;
 	int ret = 0;
 
+	if (!kvm_host_owns_hyp_mappings()) {
+		base = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
+					 phys_addr, size, prot);
+		if (IS_ERR_OR_NULL((void *)base))
+			return PTR_ERR((void *)base);
+		*haddr = base;
+
+		return 0;
+	}
+
 	mutex_lock(&kvm_hyp_pgd_mutex);
 
 	/*
@@ -1270,10 +1304,9 @@ static struct kvm_pgtable_mm_ops kvm_hyp_mm_ops = {
 	.virt_to_phys		= kvm_host_pa,
 };
 
-int kvm_mmu_init(void)
+int kvm_mmu_init(u32 *hyp_va_bits)
 {
 	int err;
-	u32 hyp_va_bits;
 
 	hyp_idmap_start = __pa_symbol(__hyp_idmap_text_start);
 	hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE);
@@ -1287,8 +1320,8 @@ int kvm_mmu_init(void)
 	 */
 	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
 
-	hyp_va_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);
-	kvm_debug("Using %u-bit virtual addresses at EL2\n", hyp_va_bits);
+	*hyp_va_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);
+	kvm_debug("Using %u-bit virtual addresses at EL2\n", *hyp_va_bits);
 	kvm_debug("IDMAP page: %lx\n", hyp_idmap_start);
 	kvm_debug("HYP VA range: %lx:%lx\n",
 		  kern_hyp_va(PAGE_OFFSET),
@@ -1313,7 +1346,7 @@ int kvm_mmu_init(void)
 		goto out;
 	}
 
-	err = kvm_pgtable_hyp_init(hyp_pgtable, hyp_va_bits, &kvm_hyp_mm_ops);
+	err = kvm_pgtable_hyp_init(hyp_pgtable, *hyp_va_bits, &kvm_hyp_mm_ops);
 	if (err)
 		goto out_free_pgtable;
 
-- 
2.30.1.766.gb4fecdf3b7-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 17/32] KVM: arm64: Use kvm_arch for stage 2 pgtable
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

In order to make use of the stage 2 pgtable code for the host stage 2,
use struct kvm_arch in lieu of struct kvm as the host will have the
former but not the latter.
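
For illustration (the names here are hypothetical and not part of this
patch), a host-side user only needs a bare kvm_arch, whereas guests
keep passing &kvm->arch as in the mmu.c hunk below:

	/* Hypothetical host-side owner of a stage-2 page-table. */
	static struct {
		struct kvm_arch arch;	/* provides ->vtcr and ->mmu */
		struct kvm_pgtable pgt;
	} host_s2;

	static int host_stage2_init(struct kvm_pgtable_mm_ops *mm_ops)
	{
		/* No enclosing struct kvm required any more. */
		return kvm_pgtable_stage2_init(&host_s2.pgt, &host_s2.arch,
					       mm_ops);
	}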

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 5 +++--
 arch/arm64/kvm/hyp/pgtable.c         | 6 +++---
 arch/arm64/kvm/mmu.c                 | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 7898610fbeeb..a8255d55c168 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -162,12 +162,13 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 /**
  * kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
  * @pgt:	Uninitialised page-table structure to initialise.
- * @kvm:	KVM structure representing the guest virtual machine.
+ * @arch:	Arch-specific KVM structure representing the guest virtual
+ *		machine.
  * @mm_ops:	Memory management callbacks.
  *
  * Return: 0 on success, negative error code on failure.
  */
-int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm,
+int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_arch *arch,
 			    struct kvm_pgtable_mm_ops *mm_ops);
 
 /**
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 7ce0969203e8..3d79c8094cdd 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -879,11 +879,11 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 	return kvm_pgtable_walk(pgt, addr, size, &walker);
 }
 
-int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm,
+int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_arch *arch,
 			    struct kvm_pgtable_mm_ops *mm_ops)
 {
 	size_t pgd_sz;
-	u64 vtcr = kvm->arch.vtcr;
+	u64 vtcr = arch->vtcr;
 	u32 ia_bits = VTCR_EL2_IPA(vtcr);
 	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
 	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
@@ -896,7 +896,7 @@ int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm,
 	pgt->ia_bits		= ia_bits;
 	pgt->start_level	= start_level;
 	pgt->mm_ops		= mm_ops;
-	pgt->mmu		= &kvm->arch.mmu;
+	pgt->mmu		= &arch->mmu;
 
 	/* Ensure zeroed PGD pages are visible to the hardware walker */
 	dsb(ishst);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 9d331bf262d2..41f9c03cbcc3 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -457,7 +457,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	if (!pgt)
 		return -ENOMEM;
 
-	err = kvm_pgtable_stage2_init(pgt, kvm, &kvm_s2_mm_ops);
+	err = kvm_pgtable_stage2_init(pgt, &kvm->arch, &kvm_s2_mm_ops);
 	if (err)
 		goto out_free_pgtable;
 
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 18/32] KVM: arm64: Use kvm_arch in kvm_s2_mmu
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

In order to make use of the stage 2 pgtable code for the host stage 2,
change kvm_s2_mmu to use a kvm_arch pointer in lieu of the kvm pointer,
as the host will have the former but not the latter.
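
The kvm_s2_mmu_to_kvm() helper below recovers the enclosing struct kvm
with container_of(), which only makes sense for guest MMUs whose
kvm_arch is embedded in a struct kvm. An illustrative guest-only caller
(modelled on __unmap_stage2_range() in the mmu.c hunk below):

	static void guest_stage2_op(struct kvm_s2_mmu *mmu)
	{
		/* mmu->arch points into a struct kvm for guest MMUs. */
		struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);

		assert_spin_locked(&kvm->mmu_lock);
		/* ... operate on the guest stage 2 as before ... */
	}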

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/include/asm/kvm_mmu.h  | 6 +++++-
 arch/arm64/kvm/mmu.c              | 8 ++++----
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index b9d45a1f8538..90565782ce3e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -94,7 +94,7 @@ struct kvm_s2_mmu {
 	/* The last vcpu id that ran on each physical CPU */
 	int __percpu *last_vcpu_ran;
 
-	struct kvm *kvm;
+	struct kvm_arch *arch;
 };
 
 struct kvm_arch_memory_slot {
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index ce02a4052dcf..6f743e20cb06 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -272,7 +272,7 @@ static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
  */
 static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
 {
-	write_sysreg(kern_hyp_va(mmu->kvm)->arch.vtcr, vtcr_el2);
+	write_sysreg(kern_hyp_va(mmu->arch)->vtcr, vtcr_el2);
 	write_sysreg(kvm_get_vttbr(mmu), vttbr_el2);
 
 	/*
@@ -283,5 +283,9 @@ static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
 	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
 }
 
+static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
+{
+	return container_of(mmu->arch, struct kvm, arch);
+}
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 41f9c03cbcc3..3257cadfab24 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -165,7 +165,7 @@ static void *kvm_host_va(phys_addr_t phys)
 static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size,
 				 bool may_block)
 {
-	struct kvm *kvm = mmu->kvm;
+	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
 	phys_addr_t end = start + size;
 
 	assert_spin_locked(&kvm->mmu_lock);
@@ -470,7 +470,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	for_each_possible_cpu(cpu)
 		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
 
-	mmu->kvm = kvm;
+	mmu->arch = &kvm->arch;
 	mmu->pgt = pgt;
 	mmu->pgd_phys = __pa(pgt->pgd);
 	mmu->vmid.vmid_gen = 0;
@@ -552,7 +552,7 @@ void stage2_unmap_vm(struct kvm *kvm)
 
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 {
-	struct kvm *kvm = mmu->kvm;
+	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
 	struct kvm_pgtable *pgt = NULL;
 
 	spin_lock(&kvm->mmu_lock);
@@ -621,7 +621,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
  */
 static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
 {
-	struct kvm *kvm = mmu->kvm;
+	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
 	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_wrprotect);
 }
 
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 19/32] KVM: arm64: Set host stage 2 using kvm_nvhe_init_params
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Move the registers relevant to host stage 2 enablement to
kvm_nvhe_init_params to prepare the ground for enabling it in later
patches.
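
Condensed from the arm.c hunk below, the per-CPU parameters consumed by
___kvm_hyp_init now look as follows; vttbr/vtcr are left at zero here
and are only meant to carry the host stage 2 configuration once later
patches enable it:

	params->hcr_el2 = is_protected_kvm_enabled() ?
				HCR_HOST_NVHE_PROTECTED_FLAGS :
				HCR_HOST_NVHE_FLAGS;
	params->vttbr = 0;	/* host stage 2 not enabled yet */
	params->vtcr = 0;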

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h   |  3 +++
 arch/arm64/kernel/asm-offsets.c    |  3 +++
 arch/arm64/kvm/arm.c               |  5 +++++
 arch/arm64/kvm/hyp/nvhe/hyp-init.S | 14 +++++++++-----
 arch/arm64/kvm/hyp/nvhe/switch.c   |  5 +----
 5 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index db20a9477870..6dce860f8bca 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -158,6 +158,9 @@ struct kvm_nvhe_init_params {
 	unsigned long tpidr_el2;
 	unsigned long stack_hyp_va;
 	phys_addr_t pgd_pa;
+	unsigned long hcr_el2;
+	unsigned long vttbr;
+	unsigned long vtcr;
 };
 
 /* Translate a kernel address @ptr into its equivalent linear mapping */
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index a36e2fc330d4..8930b42f6418 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -120,6 +120,9 @@ int main(void)
   DEFINE(NVHE_INIT_TPIDR_EL2,	offsetof(struct kvm_nvhe_init_params, tpidr_el2));
   DEFINE(NVHE_INIT_STACK_HYP_VA,	offsetof(struct kvm_nvhe_init_params, stack_hyp_va));
   DEFINE(NVHE_INIT_PGD_PA,	offsetof(struct kvm_nvhe_init_params, pgd_pa));
+  DEFINE(NVHE_INIT_HCR_EL2,	offsetof(struct kvm_nvhe_init_params, hcr_el2));
+  DEFINE(NVHE_INIT_VTTBR,	offsetof(struct kvm_nvhe_init_params, vttbr));
+  DEFINE(NVHE_INIT_VTCR,	offsetof(struct kvm_nvhe_init_params, vtcr));
 #endif
 #ifdef CONFIG_CPU_PM
   DEFINE(CPU_CTX_SP,		offsetof(struct cpu_suspend_ctx, sp));
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 5a97abdc3ba6..b6a818f88051 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1413,6 +1413,11 @@ static void cpu_prepare_hyp_mode(int cpu)
 
 	params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
 	params->pgd_pa = kvm_mmu_get_httbr();
+	if (is_protected_kvm_enabled())
+		params->hcr_el2 = HCR_HOST_NVHE_PROTECTED_FLAGS;
+	else
+		params->hcr_el2 = HCR_HOST_NVHE_FLAGS;
+	params->vttbr = params->vtcr = 0;
 
 	/*
 	 * Flush the init params from the data cache because the struct will
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index bc56ea92b812..f312672d895e 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -83,11 +83,6 @@ SYM_CODE_END(__kvm_hyp_init)
  * x0: struct kvm_nvhe_init_params PA
  */
 SYM_CODE_START_LOCAL(___kvm_hyp_init)
-alternative_if ARM64_KVM_PROTECTED_MODE
-	mov_q	x1, HCR_HOST_NVHE_PROTECTED_FLAGS
-	msr	hcr_el2, x1
-alternative_else_nop_endif
-
 	ldr	x1, [x0, #NVHE_INIT_TPIDR_EL2]
 	msr	tpidr_el2, x1
 
@@ -97,6 +92,15 @@ alternative_else_nop_endif
 	ldr	x1, [x0, #NVHE_INIT_MAIR_EL2]
 	msr	mair_el2, x1
 
+	ldr	x1, [x0, #NVHE_INIT_HCR_EL2]
+	msr	hcr_el2, x1
+
+	ldr	x1, [x0, #NVHE_INIT_VTTBR]
+	msr	vttbr_el2, x1
+
+	ldr	x1, [x0, #NVHE_INIT_VTCR]
+	msr	vtcr_el2, x1
+
 	ldr	x1, [x0, #NVHE_INIT_PGD_PA]
 	phys_to_ttbr x2, x1
 alternative_if ARM64_HAS_CNP
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index f3d0e9eca56c..979a76cdf9fb 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -97,10 +97,7 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
 	mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
 
 	write_sysreg(mdcr_el2, mdcr_el2);
-	if (is_protected_kvm_enabled())
-		write_sysreg(HCR_HOST_NVHE_PROTECTED_FLAGS, hcr_el2);
-	else
-		write_sysreg(HCR_HOST_NVHE_FLAGS, hcr_el2);
+	write_sysreg(this_cpu_ptr(&kvm_init_params)->hcr_el2, hcr_el2);
 	write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
 	write_sysreg(__kvm_hyp_host_vector, vbar_el2);
 }
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 20/32] KVM: arm64: Refactor kvm_arm_setup_stage2()
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

In order to re-use some of the stage 2 setup code at EL2, factor parts
of kvm_arm_setup_stage2() out into separate functions.

No functional change intended.
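
As an example of the factored-out helpers (mirroring the new
kvm_arm_setup_stage2() below; the 40-bit IPA size is only an
illustration), building a VTCR now boils down to:

	u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
	u64 mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);

	/* T0SZ = 64 - 40 = 24; PS comes from mmfr0, VS from mmfr1,
	 * SL0 from the IPA size, and HA is set unconditionally. */
	u64 vtcr = kvm_get_vtcr(mmfr0, mmfr1, 40);

Since kvm_get_vtcr() only depends on its arguments, EL2 code holding
its own copies of the sanitised ID registers can construct the host
VTCR the same way, which is the point of this refactoring.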

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 26 +++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 32 +++++++++++++++++++++
 arch/arm64/kvm/reset.c               | 42 +++-------------------------
 3 files changed, 62 insertions(+), 38 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index a8255d55c168..21e0985d2e00 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -13,6 +13,16 @@
 
 #define KVM_PGTABLE_MAX_LEVELS		4U
 
+static inline u64 kvm_get_parange(u64 mmfr0)
+{
+	u64 parange = cpuid_feature_extract_unsigned_field(mmfr0,
+				ID_AA64MMFR0_PARANGE_SHIFT);
+	if (parange > ID_AA64MMFR0_PARANGE_MAX)
+		parange = ID_AA64MMFR0_PARANGE_MAX;
+
+	return parange;
+}
+
 typedef u64 kvm_pte_t;
 
 /**
@@ -159,6 +169,22 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt);
 int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 			enum kvm_pgtable_prot prot);
 
+/**
+ * kvm_get_vtcr() - Helper to construct VTCR_EL2
+ * @mmfr0:	Sanitized value of SYS_ID_AA64MMFR0_EL1 register.
+ * @mmfr1:	Sanitized value of SYS_ID_AA64MMFR1_EL1 register.
+ * @phys_shift:	Value to set in VTCR_EL2.T0SZ.
+ *
+ * The VTCR value is common across all the physical CPUs on the system.
+ * We use system wide sanitised values to fill in different fields,
+ * except for Hardware Management of Access Flags. HA Flag is set
+ * unconditionally on all CPUs, as it is safe to run with or without
+ * the feature and the bit is RES0 on CPUs that don't support it.
+ *
+ * Return: VTCR_EL2 value
+ */
+u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift);
+
 /**
  * kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
  * @pgt:	Uninitialised page-table structure to initialise.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 3d79c8094cdd..296675e5600d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -9,6 +9,7 @@
 
 #include <linux/bitfield.h>
 #include <asm/kvm_pgtable.h>
+#include <asm/stage2_pgtable.h>
 
 #define KVM_PTE_VALID			BIT(0)
 
@@ -449,6 +450,37 @@ struct stage2_map_data {
 	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
+u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
+{
+	u64 vtcr = VTCR_EL2_FLAGS;
+	u8 lvls;
+
+	vtcr |= kvm_get_parange(mmfr0) << VTCR_EL2_PS_SHIFT;
+	vtcr |= VTCR_EL2_T0SZ(phys_shift);
+	/*
+	 * Use a minimum 2 level page table to prevent splitting
+	 * host PMD huge pages at stage2.
+	 */
+	lvls = stage2_pgtable_levels(phys_shift);
+	if (lvls < 2)
+		lvls = 2;
+	vtcr |= VTCR_EL2_LVLS_TO_SL0(lvls);
+
+	/*
+	 * Enable the Hardware Access Flag management, unconditionally
+	 * on all CPUs. The feature is RES0 on CPUs without the support
+	 * and must be ignored by the CPUs.
+	 */
+	vtcr |= VTCR_EL2_HA;
+
+	/* Set the vmid bits */
+	vtcr |= (get_vmid_bits(mmfr1) == 16) ?
+		VTCR_EL2_VS_16BIT :
+		VTCR_EL2_VS_8BIT;
+
+	return vtcr;
+}
+
 static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
 				    struct stage2_map_data *data)
 {
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 47f3f035f3ea..6aae118c960a 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -332,19 +332,10 @@ int kvm_set_ipa_limit(void)
 	return 0;
 }
 
-/*
- * Configure the VTCR_EL2 for this VM. The VTCR value is common
- * across all the physical CPUs on the system. We use system wide
- * sanitised values to fill in different fields, except for Hardware
- * Management of Access Flags. HA Flag is set unconditionally on
- * all CPUs, as it is safe to run with or without the feature and
- * the bit is RES0 on CPUs that don't support it.
- */
 int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type)
 {
-	u64 vtcr = VTCR_EL2_FLAGS, mmfr0;
-	u32 parange, phys_shift;
-	u8 lvls;
+	u64 mmfr0, mmfr1;
+	u32 phys_shift;
 
 	if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
 		return -EINVAL;
@@ -359,33 +350,8 @@ int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type)
 	}
 
 	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
-	parange = cpuid_feature_extract_unsigned_field(mmfr0,
-				ID_AA64MMFR0_PARANGE_SHIFT);
-	if (parange > ID_AA64MMFR0_PARANGE_MAX)
-		parange = ID_AA64MMFR0_PARANGE_MAX;
-	vtcr |= parange << VTCR_EL2_PS_SHIFT;
-
-	vtcr |= VTCR_EL2_T0SZ(phys_shift);
-	/*
-	 * Use a minimum 2 level page table to prevent splitting
-	 * host PMD huge pages at stage2.
-	 */
-	lvls = stage2_pgtable_levels(phys_shift);
-	if (lvls < 2)
-		lvls = 2;
-	vtcr |= VTCR_EL2_LVLS_TO_SL0(lvls);
-
-	/*
-	 * Enable the Hardware Access Flag management, unconditionally
-	 * on all CPUs. The features is RES0 on CPUs without the support
-	 * and must be ignored by the CPUs.
-	 */
-	vtcr |= VTCR_EL2_HA;
+	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
+	kvm->arch.vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
 
-	/* Set the vmid bits */
-	vtcr |= (kvm_get_vmid_bits() == 16) ?
-		VTCR_EL2_VS_16BIT :
-		VTCR_EL2_VS_8BIT;
-	kvm->arch.vtcr = vtcr;
 	return 0;
 }
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 21/32] KVM: arm64: Refactor __load_guest_stage2()
  2021-03-02 14:59 ` Quentin Perret
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Refactor __load_guest_stage2() to introduce __load_stage2(), which will
be re-used when loading the host stage 2.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_mmu.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 6f743e20cb06..9d64fa73ee67 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -270,9 +270,9 @@ static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
  * Must be called from hyp code running at EL2 with an updated VTTBR
  * and interrupts disabled.
  */
-static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
+static __always_inline void __load_stage2(struct kvm_s2_mmu *mmu, unsigned long vtcr)
 {
-	write_sysreg(kern_hyp_va(mmu->arch)->vtcr, vtcr_el2);
+	write_sysreg(vtcr, vtcr_el2);
 	write_sysreg(kvm_get_vttbr(mmu), vttbr_el2);
 
 	/*
@@ -283,6 +283,11 @@ static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
 	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
 }
 
+static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
+{
+	__load_stage2(mmu, kern_hyp_va(mmu->arch)->vtcr);
+}
+
 static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
 {
 	return container_of(mmu->arch, struct kvm, arch);
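
For context, a minimal sketch of the kind of host stage 2 loader this
refactoring enables; host_kvm is an assumed name for the hypervisor's
copy of the host's kvm structure, not something added by this patch:

/* Hypothetical EL2 helper re-using __load_stage2(). */
static __always_inline void __load_host_stage2(void)
{
	__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
}
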
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 22/32] KVM: arm64: Refactor __populate_fault_info()
  2021-03-02 14:59 ` Quentin Perret
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Refactor __populate_fault_info() to introduce __get_fault_info(), which
will be used once the host is wrapped in a stage 2.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/include/hyp/switch.h | 37 ++++++++++++++-----------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 6c1f51f25eb3..1f017c9851bb 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -160,19 +160,9 @@ static inline bool __translate_far_to_hpfar(u64 far, u64 *hpfar)
 	return true;
 }
 
-static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
+static inline bool __get_fault_info(u64 esr, struct kvm_vcpu_fault_info *fault)
 {
-	u8 ec;
-	u64 esr;
-	u64 hpfar, far;
-
-	esr = vcpu->arch.fault.esr_el2;
-	ec = ESR_ELx_EC(esr);
-
-	if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
-		return true;
-
-	far = read_sysreg_el2(SYS_FAR);
+	fault->far_el2 = read_sysreg_el2(SYS_FAR);
 
 	/*
 	 * The HPFAR can be invalid if the stage 2 fault did not
@@ -188,14 +178,29 @@ static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
 	if (!(esr & ESR_ELx_S1PTW) &&
 	    (cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
 	     (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
-		if (!__translate_far_to_hpfar(far, &hpfar))
+		if (!__translate_far_to_hpfar(fault->far_el2, &fault->hpfar_el2))
 			return false;
 	} else {
-		hpfar = read_sysreg(hpfar_el2);
+		fault->hpfar_el2 = read_sysreg(hpfar_el2);
 	}
 
-	vcpu->arch.fault.far_el2 = far;
-	vcpu->arch.fault.hpfar_el2 = hpfar;
+	return true;
+}
+
+static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
+{
+	u8 ec;
+	u64 esr;
+
+	esr = vcpu->arch.fault.esr_el2;
+	ec = ESR_ELx_EC(esr);
+
+	if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
+		return true;
+
+	if (!__get_fault_info(esr, &vcpu->arch.fault))
+		return false;
+
 	return true;
 }
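
For context, a rough sketch of an EL2 host-fault path re-using the new
helper; the handler and its error path below are illustrative only, not
something added by this patch:

static void handle_host_mem_abort(void)
{
	struct kvm_vcpu_fault_info fault;
	u64 esr = read_sysreg_el2(SYS_ESR);

	if (!__get_fault_info(esr, &fault))
		hyp_panic();	/* assumed error handling */

	/* fault.far_el2 and fault.hpfar_el2 now describe the access */
}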
 
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 23/32] KVM: arm64: Make memcache anonymous in pgtable allocator
  2021-03-02 14:59 ` Quentin Perret
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

The current stage2 page-table allocator uses a memcache to get
pre-allocated pages when it needs them. To allow re-using this code at
EL2, which relies on memory pools instead, make the memcache argument
of kvm_pgtable_stage2_map() anonymous, and let the mm_ops zalloc_page()
callbacks interpret it however they need to.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 6 +++---
 arch/arm64/kvm/hyp/pgtable.c         | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 21e0985d2e00..9935dbae2cc1 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -213,8 +213,8 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
  * @size:	Size of the mapping.
  * @phys:	Physical address of the memory to map.
  * @prot:	Permissions and attributes for the mapping.
- * @mc:		Cache of pre-allocated GFP_PGTABLE_USER memory from which to
- *		allocate page-table pages.
+ * @mc:		Cache of pre-allocated and zeroed memory from which to allocate
+ *		page-table pages.
  *
  * The offset of @addr within a page is ignored, @size is rounded-up to
  * the next page boundary and @phys is rounded-down to the previous page
@@ -236,7 +236,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
  */
 int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   u64 phys, enum kvm_pgtable_prot prot,
-			   struct kvm_mmu_memory_cache *mc);
+			   void *mc);
 
 /**
  * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 296675e5600d..bdd6e3d4eeb6 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -445,7 +445,7 @@ struct stage2_map_data {
 	kvm_pte_t			*anchor;
 
 	struct kvm_s2_mmu		*mmu;
-	struct kvm_mmu_memory_cache	*memcache;
+	void				*memcache;
 
 	struct kvm_pgtable_mm_ops	*mm_ops;
 };
@@ -669,7 +669,7 @@ static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 
 int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   u64 phys, enum kvm_pgtable_prot prot,
-			   struct kvm_mmu_memory_cache *mc)
+			   void *mc)
 {
 	int ret;
 	struct stage2_map_data map_data = {
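
To illustrate the point of the opaque pointer, here is a sketch of two
zalloc_page() callbacks interpreting the same cookie differently; the
function names and the exact hyp allocator signature are assumptions
made for the example, not part of this patch:

/* Kernel (EL1) side: the cookie is still a kvm_mmu_memory_cache. */
static void *stage2_memcache_zalloc_page(void *mc)
{
	return kvm_mmu_memory_cache_alloc(mc);
}

/* EL2 side: the same cookie can now carry a hyp memory pool instead. */
static void *host_s2_zalloc_page(void *mc)
{
	struct hyp_pool *pool = mc;

	return hyp_alloc_pages(pool, 0);	/* assumed (pool, order) signature */
}
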
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 24/32] KVM: arm64: Reserve memory for host stage 2
  2021-03-02 14:59 ` Quentin Perret
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Extend the memory pool allocated for the hypervisor to include enough
pages to map all of memory at page granularity for the host stage 2.
While at it, also reserve some memory for device mappings.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/include/nvhe/mm.h | 23 ++++++++++++++++++++++-
 arch/arm64/kvm/hyp/nvhe/setup.c      | 12 ++++++++++++
 arch/arm64/kvm/hyp/reserved_mem.c    |  2 ++
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index ac0f7fcffd08..411a35db949c 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -53,7 +53,7 @@ static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 	return total;
 }
 
-static inline unsigned long hyp_s1_pgtable_pages(void)
+static inline unsigned long __hyp_pgtable_total_pages(void)
 {
 	unsigned long res = 0, i;
 
@@ -63,9 +63,30 @@ static inline unsigned long hyp_s1_pgtable_pages(void)
 		res += __hyp_pgtable_max_pages(reg->size >> PAGE_SHIFT);
 	}
 
+	return res;
+}
+
+static inline unsigned long hyp_s1_pgtable_pages(void)
+{
+	unsigned long res;
+
+	res = __hyp_pgtable_total_pages();
+
 	/* Allow 1 GiB for private mappings */
 	res += __hyp_pgtable_max_pages(SZ_1G >> PAGE_SHIFT);
 
 	return res;
 }
+
+static inline unsigned long host_s2_mem_pgtable_pages(void)
+{
+	return __hyp_pgtable_total_pages() + 16;
+}
+
+static inline unsigned long host_s2_dev_pgtable_pages(void)
+{
+	/* Allow 1 GiB for private mappings */
+	return __hyp_pgtable_max_pages(SZ_1G >> PAGE_SHIFT);
+}
+
 #endif /* __KVM_HYP_MM_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 178ec06f2b49..7e923b25271c 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -24,6 +24,8 @@ unsigned long hyp_nr_cpus;
 
 static void *vmemmap_base;
 static void *hyp_pgt_base;
+static void *host_s2_mem_pgt_base;
+static void *host_s2_dev_pgt_base;
 
 static int divide_memory_pool(void *virt, unsigned long size)
 {
@@ -42,6 +44,16 @@ static int divide_memory_pool(void *virt, unsigned long size)
 	if (!hyp_pgt_base)
 		return -ENOMEM;
 
+	nr_pages = host_s2_mem_pgtable_pages();
+	host_s2_mem_pgt_base = hyp_early_alloc_contig(nr_pages);
+	if (!host_s2_mem_pgt_base)
+		return -ENOMEM;
+
+	nr_pages = host_s2_dev_pgtable_pages();
+	host_s2_dev_pgt_base = hyp_early_alloc_contig(nr_pages);
+	if (!host_s2_dev_pgt_base)
+		return -ENOMEM;
+
 	return 0;
 }
 
diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c
index 9bc6a6d27904..fd42705a3c26 100644
--- a/arch/arm64/kvm/hyp/reserved_mem.c
+++ b/arch/arm64/kvm/hyp/reserved_mem.c
@@ -52,6 +52,8 @@ void __init kvm_hyp_reserve(void)
 	}
 
 	hyp_mem_pages += hyp_s1_pgtable_pages();
+	hyp_mem_pages += host_s2_mem_pgtable_pages();
+	hyp_mem_pages += host_s2_dev_pgtable_pages();
 
 	/*
 	 * The hyp_vmemmap needs to be backed by pages, but these pages
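
As a rough, illustrative sizing exercise (assuming a 4KiB granule and a
single 4GiB memblock, numbers picked purely for the example), mapping
4GiB at page granularity needs:

	4GiB / 4KiB       = 1,048,576 stage-2 PTEs
	1,048,576 / 512   =     2,048 level-3 table pages
	2,048 / 512       =         4 level-2 table pages
	higher levels     =        ~2 further pages

so host_s2_mem_pgtable_pages() would reserve on the order of 2,054
pages plus the fixed 16-page margin it adds, while
host_s2_dev_pgtable_pages() budgets for another 1GiB worth of private
device mappings on top.
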
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 25/32] KVM: arm64: Sort the hypervisor memblocks
  2021-03-02 14:59 ` Quentin Perret
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

We will soon need to check whether a physical address belongs to a
memblock at EL2, so sort the hypervisor's copy of the memblock regions
to make that check efficient.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/reserved_mem.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c
index fd42705a3c26..83ca23ac259b 100644
--- a/arch/arm64/kvm/hyp/reserved_mem.c
+++ b/arch/arm64/kvm/hyp/reserved_mem.c
@@ -6,6 +6,7 @@
 
 #include <linux/kvm_host.h>
 #include <linux/memblock.h>
+#include <linux/sort.h>
 
 #include <asm/kvm_host.h>
 
@@ -18,6 +19,23 @@ static unsigned int *hyp_memblock_nr_ptr = &kvm_nvhe_sym(hyp_memblock_nr);
 phys_addr_t hyp_mem_base;
 phys_addr_t hyp_mem_size;
 
+static int cmp_hyp_memblock(const void *p1, const void *p2)
+{
+	const struct memblock_region *r1 = p1;
+	const struct memblock_region *r2 = p2;
+
+	return r1->base < r2->base ? -1 : (r1->base > r2->base);
+}
+
+static void __init sort_memblock_regions(void)
+{
+	sort(hyp_memory,
+	     *hyp_memblock_nr_ptr,
+	     sizeof(struct memblock_region),
+	     cmp_hyp_memblock,
+	     NULL);
+}
+
 static int __init register_memblock_regions(void)
 {
 	struct memblock_region *reg;
@@ -29,6 +47,7 @@ static int __init register_memblock_regions(void)
 		hyp_memory[*hyp_memblock_nr_ptr] = *reg;
 		(*hyp_memblock_nr_ptr)++;
 	}
+	sort_memblock_regions();
 
 	return 0;
 }
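
For context, the kind of EL2-side lookup that the sorting enables is a
simple binary search over the region array; the helper below is an
illustrative sketch only, not something added by this patch:

static bool addr_is_memory(phys_addr_t phys)
{
	int lo = 0, hi = hyp_memblock_nr - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		struct memblock_region *reg = &hyp_memory[mid];

		if (phys < reg->base)
			hi = mid - 1;
		else if (phys >= reg->base + reg->size)
			lo = mid + 1;
		else
			return true;	/* phys lies inside this region */
	}

	return false;
}
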
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 26/32] KVM: arm64: Introduce PROT_NONE mappings for stage 2
  2021-03-02 14:59 ` Quentin Perret
@ 2021-03-02 14:59   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Once we start unmapping portions of memory from the host stage 2 (such
as the hypervisor memory sections, or pages that belong to protected
guests), we will need a way to track page ownership. And
given that all mappings in the host stage 2 will be identity-mapped, we
can use the host stage 2 page-table itself as a simplistic rmap.

As a first step towards this, introduce a new protection attribute
in the stage 2 page table code, called KVM_PGTABLE_PROT_NONE, which
allows us to annotate portions of the IPA space as inaccessible. For
simplicity, PROT_NONE mappings are created as invalid mappings with a
software bit set.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h |  2 ++
 arch/arm64/kvm/hyp/pgtable.c         | 26 ++++++++++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 9935dbae2cc1..c9f6ed76e0ad 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -80,6 +80,7 @@ struct kvm_pgtable {
  * @KVM_PGTABLE_PROT_W:		Write permission.
  * @KVM_PGTABLE_PROT_R:		Read permission.
  * @KVM_PGTABLE_PROT_DEVICE:	Device attributes.
+ * @KVM_PGTABLE_PROT_NONE:	No permission.
  */
 enum kvm_pgtable_prot {
 	KVM_PGTABLE_PROT_X			= BIT(0),
@@ -87,6 +88,7 @@ enum kvm_pgtable_prot {
 	KVM_PGTABLE_PROT_R			= BIT(2),
 
 	KVM_PGTABLE_PROT_DEVICE			= BIT(3),
+	KVM_PGTABLE_PROT_NONE			= BIT(4),
 };
 
 #define PAGE_HYP		(KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index bdd6e3d4eeb6..8e7059fcfd40 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -48,6 +48,8 @@
 					 KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
 					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
 
+#define KVM_PTE_LEAF_SW_BIT_PROT_NONE	BIT(55)
+
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable		*pgt;
 	struct kvm_pgtable_walker	*walker;
@@ -120,6 +122,16 @@ static bool kvm_pte_valid(kvm_pte_t pte)
 	return pte & KVM_PTE_VALID;
 }
 
+static bool kvm_pte_prot_none(kvm_pte_t pte)
+{
+	return pte & KVM_PTE_LEAF_SW_BIT_PROT_NONE;
+}
+
+static inline bool stage2_is_permanent_mapping(kvm_pte_t pte)
+{
+	return kvm_pte_prot_none(pte);
+}
+
 static bool kvm_pte_table(kvm_pte_t pte, u32 level)
 {
 	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
@@ -182,7 +194,8 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
 
 	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
 	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
-	pte |= KVM_PTE_VALID;
+	if (!kvm_pte_prot_none(pte))
+		pte |= KVM_PTE_VALID;
 
 	return pte;
 }
@@ -317,7 +330,7 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
 	u32 ap = (prot & KVM_PGTABLE_PROT_W) ? KVM_PTE_LEAF_ATTR_LO_S1_AP_RW :
 					       KVM_PTE_LEAF_ATTR_LO_S1_AP_RO;
 
-	if (!(prot & KVM_PGTABLE_PROT_R))
+	if (!(prot & KVM_PGTABLE_PROT_R) || (prot & KVM_PGTABLE_PROT_NONE))
 		return -EINVAL;
 
 	if (prot & KVM_PGTABLE_PROT_X) {
@@ -489,6 +502,13 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
 			    PAGE_S2_MEMATTR(NORMAL);
 	u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
 
+	if (prot & KVM_PGTABLE_PROT_NONE) {
+		if (prot != KVM_PGTABLE_PROT_NONE)
+			return -EINVAL;
+		attr |= KVM_PTE_LEAF_SW_BIT_PROT_NONE;
+		goto out;
+	}
+
 	if (!(prot & KVM_PGTABLE_PROT_X))
 		attr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
 	else if (device)
@@ -502,6 +522,8 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
 
 	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
 	attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
+
+out:
 	data->attr = attr;
 	return 0;
 }
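
For context, a sketch of how a caller could use the new attribute to
make an identity-mapped range inaccessible to the host; the host_kvm
page-table handle and the memcache cookie are assumed names for the
sketch, not part of this patch:

static int host_stage2_protect_range(phys_addr_t start, phys_addr_t end, void *mc)
{
	/* The host stage 2 is identity-mapped, so IPA == PA here. */
	return kvm_pgtable_stage2_map(&host_kvm.pgt, start, end - start,
				      start, KVM_PGTABLE_PROT_NONE, mc);
}
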
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 27/32] KVM: arm64: Refactor stage2_map_set_prot_attr()
@ 2021-03-02 14:59   ` Quentin Perret
  0 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

In order to ease its re-use in other code paths, refactor
stage2_map_set_prot_attr() to not depend on a stage2_map_data struct.
No functional change intended.
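
As an illustration of the intended re-use, a later patch in the series
has a caller combine the returned attributes with its own mask and
treat a zero result as an invalid prot, roughly:

	/*
	 * Sketch of the new calling convention: stage2_get_prot_attr()
	 * returns the attributes directly, with 0 meaning the prot
	 * combination was rejected.
	 */
	data.attr = stage2_get_prot_attr(prot) & KVM_PTE_LEAF_S2_COMPAT_MASK;
	if (!data.attr)
		return -EINVAL;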

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 8e7059fcfd40..8aa01a9e2603 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -494,8 +494,7 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 	return vtcr;
 }
 
-static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
-				    struct stage2_map_data *data)
+static kvm_pte_t stage2_get_prot_attr(enum kvm_pgtable_prot prot)
 {
 	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
 	kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
@@ -504,15 +503,15 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
 
 	if (prot & KVM_PGTABLE_PROT_NONE) {
 		if (prot != KVM_PGTABLE_PROT_NONE)
-			return -EINVAL;
+			return 0;
 		attr |= KVM_PTE_LEAF_SW_BIT_PROT_NONE;
-		goto out;
+		return attr;
 	}
 
 	if (!(prot & KVM_PGTABLE_PROT_X))
 		attr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
 	else if (device)
-		return -EINVAL;
+		return 0;
 
 	if (prot & KVM_PGTABLE_PROT_R)
 		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
@@ -523,9 +522,7 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
 	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
 	attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
 
-out:
-	data->attr = attr;
-	return 0;
+	return attr;
 }
 
 static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
@@ -708,9 +705,9 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.arg		= &map_data,
 	};
 
-	ret = stage2_map_set_prot_attr(prot, &map_data);
-	if (ret)
-		return ret;
+	map_data.attr = stage2_get_prot_attr(prot);
+	if (!map_data.attr)
+		return -EINVAL;
 
 	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
 	dsb(ishst);
-- 
2.30.1.766.gb4fecdf3b7-goog


* [PATCH v3 28/32] KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy()
@ 2021-03-02 14:59   ` Quentin Perret
  0 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

Add a new map function to the KVM page-table library that greedily
creates block identity-mappings. This will be useful for creating the
host stage 2 page-table lazily, as it will own most of memory and will
always be identity mapped.

The new helper function creates the mapping in two steps: it first walks
the page-table to compute the largest possible granule that can be used
to idmap a given address without overriding existing incompatible
mappings, and then creates a mapping accordingly.
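
For illustration, here is a minimal sketch of how a caller is expected
to use the new helper, modelled on the host stage 2 fault path added
later in the series (pgt, addr, prot and pool stand for the caller's
own objects):

	struct kvm_mem_range range = { .start = 0, .end = ULONG_MAX };
	int ret;

	/*
	 * Sketch only: try to idmap 'addr' with the largest block that
	 * fits in 'range' without touching incompatible pre-existing
	 * mappings; on success, 'range' is narrowed to what got mapped.
	 */
	ret = kvm_pgtable_stage2_idmap_greedy(pgt, addr, prot, &range, pool);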

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h |  37 +++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 119 +++++++++++++++++++++++++++
 2 files changed, 156 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index c9f6ed76e0ad..e51dcce69a5e 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -96,6 +96,16 @@ enum kvm_pgtable_prot {
 #define PAGE_HYP_RO		(KVM_PGTABLE_PROT_R)
 #define PAGE_HYP_DEVICE		(PAGE_HYP | KVM_PGTABLE_PROT_DEVICE)
 
+/**
+ * struct kvm_mem_range - Range of Intermediate Physical Addresses
+ * @start:	Start of the range.
+ * @end:	End of the range.
+ */
+struct kvm_mem_range {
+	u64 start;
+	u64 end;
+};
+
 /**
  * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
  * @KVM_PGTABLE_WALK_LEAF:		Visit leaf entries, including invalid
@@ -379,4 +389,31 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size);
 int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		     struct kvm_pgtable_walker *walker);
 
+/**
+ * kvm_pgtable_stage2_idmap_greedy() - Identity-map an Intermediate Physical
+ *				       Address with a leaf entry at the highest
+ *				       possible level.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
+ * @addr:	Input address to identity-map.
+ * @prot:	Permissions and attributes for the mapping.
+ * @range:	Boundaries of the maximum memory region to map.
+ * @mc:		Cache of pre-allocated memory from which to allocate page-table
+ *		pages.
+ *
+ * This function attempts to install high-level identity-mappings covering @addr
+ * without overriding existing mappings with incompatible permissions or
+ * attributes. An existing table entry may be coalesced into a block mapping
+ * if and only if it covers @addr and all its leaves are either invalid or
+ * have permissions and attributes strictly matching @prot. The mapping is
+ * guaranteed to be contained within the boundaries specified by @range at call
+ * time. If only a subset of the memory specified by @range is mapped (because
+ * of e.g. alignment issues or existing incompatible mappings), @range will be
+ * updated accordingly.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_idmap_greedy(struct kvm_pgtable *pgt, u64 addr,
+				    enum kvm_pgtable_prot prot,
+				    struct kvm_mem_range *range,
+				    void *mc);
 #endif	/* __ARM64_KVM_PGTABLE_H__ */
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 8aa01a9e2603..6897d771e2b2 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -987,3 +987,122 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 	pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
 	pgt->pgd = NULL;
 }
+
+struct stage2_reduce_range_data {
+	kvm_pte_t attr;
+	u64 target_addr;
+	u32 start_level;
+	struct kvm_mem_range *range;
+};
+
+static int __stage2_reduce_range(struct stage2_reduce_range_data *data, u64 addr)
+{
+	u32 level = data->start_level;
+
+	for (; level < KVM_PGTABLE_MAX_LEVELS; level++) {
+		u64 granule = kvm_granule_size(level);
+		u64 start = ALIGN_DOWN(data->target_addr, granule);
+		u64 end = start + granule;
+
+		/*
+		 * The pinned address is in the current range, try one level
+		 * deeper.
+		 */
+		if (start == ALIGN_DOWN(addr, granule))
+			continue;
+
+		/*
+		 * Make sure the current range is a reduction of the existing
+		 * range before updating it.
+		 */
+		if (data->range->start <= start && end <= data->range->end) {
+			data->start_level = level;
+			data->range->start = start;
+			data->range->end = end;
+			return 0;
+		}
+	}
+
+	return -EINVAL;
+}
+
+#define KVM_PTE_LEAF_S2_COMPAT_MASK	(KVM_PTE_LEAF_ATTR_S2_PERMS | \
+					 KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR | \
+					 KVM_PTE_LEAF_SW_BIT_PROT_NONE)
+
+static int stage2_reduce_range_walker(u64 addr, u64 end, u32 level,
+				      kvm_pte_t *ptep,
+				      enum kvm_pgtable_walk_flags flag,
+				      void * const arg)
+{
+	struct stage2_reduce_range_data *data = arg;
+	kvm_pte_t attr;
+	int ret;
+
+	if (addr < data->range->start || addr >= data->range->end)
+		return 0;
+
+	attr = *ptep & KVM_PTE_LEAF_S2_COMPAT_MASK;
+	if (!attr || attr == data->attr)
+		return 0;
+
+	/*
+	 * An existing mapping with incompatible protection attributes is
+	 * 'pinned', so reduce the range if we hit one.
+	 */
+	ret = __stage2_reduce_range(data, addr);
+	if (ret)
+		return ret;
+
+	return -EAGAIN;
+}
+
+static int stage2_reduce_range(struct kvm_pgtable *pgt, u64 addr,
+			       enum kvm_pgtable_prot prot,
+			       struct kvm_mem_range *range)
+{
+	struct stage2_reduce_range_data data = {
+		.start_level	= pgt->start_level,
+		.range		= range,
+		.target_addr	= addr,
+	};
+	struct kvm_pgtable_walker walker = {
+		.cb		= stage2_reduce_range_walker,
+		.flags		= KVM_PGTABLE_WALK_LEAF,
+		.arg		= &data,
+	};
+	int ret;
+
+	data.attr = stage2_get_prot_attr(prot) & KVM_PTE_LEAF_S2_COMPAT_MASK;
+	if (!data.attr)
+		return -EINVAL;
+
+	/* Reduce the kvm_mem_range to a granule size */
+	ret = __stage2_reduce_range(&data, range->end);
+	if (ret)
+		return ret;
+
+	/* Walk the range to check permissions and reduce further if needed */
+	do {
+		ret = kvm_pgtable_walk(pgt, range->start, range->end, &walker);
+	} while (ret == -EAGAIN);
+
+	return ret;
+}
+
+int kvm_pgtable_stage2_idmap_greedy(struct kvm_pgtable *pgt, u64 addr,
+				    enum kvm_pgtable_prot prot,
+				    struct kvm_mem_range *range,
+				    void *mc)
+{
+	u64 size;
+	int ret;
+
+	ret = stage2_reduce_range(pgt, addr, prot, range);
+	if (ret)
+		return ret;
+
+	size = range->end - range->start;
+	return kvm_pgtable_stage2_map(pgt, range->start, size, range->start,
+				      prot, mc);
+}
-- 
2.30.1.766.gb4fecdf3b7-goog


* [PATCH v3 29/32] KVM: arm64: Wrap the host with a stage 2
@ 2021-03-02 14:59   ` Quentin Perret
  0 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 14:59 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, seanjc, mate.toth-pal, linux-kernel, robh+dt,
	linux-arm-kernel, kernel-team, kvmarm, tabba

When KVM runs in protected nVHE mode, make use of a stage 2 page-table
to give the hypervisor some control over the host's memory accesses. The
host stage 2 is created lazily using large block mappings whenever
possible, and falls back to page mappings in the absence of a better
solution.

From this point on, memory accesses from the host to protected memory
regions (e.g. regions marked PROT_NONE) are fatal and lead to hyp_panic().
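
For context, with the host stage 2 in place, revoking the host's access
to a range of memory is expected to boil down to replacing its mappings
with the PROT_NONE annotation introduced earlier in the series. A rough
sketch, not necessarily how the rest of the series implements it (start
and size are placeholders):

	int ret;

	/* Sketch only: make [start, start + size) inaccessible to the host */
	hyp_spin_lock(&host_kvm.lock);
	ret = kvm_pgtable_stage2_map(&host_kvm.pgt, start, size, start,
				     KVM_PGTABLE_PROT_NONE, &host_s2_mem);
	hyp_spin_unlock(&host_kvm.lock);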

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h              |   1 +
 arch/arm64/include/asm/kvm_cpufeature.h       |   2 +
 arch/arm64/kernel/image-vars.h                |   3 +
 arch/arm64/kvm/arm.c                          |  10 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  34 +++
 arch/arm64/kvm/hyp/nvhe/Makefile              |   2 +-
 arch/arm64/kvm/hyp/nvhe/hyp-init.S            |   1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  11 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 213 ++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c               |   5 +
 arch/arm64/kvm/hyp/nvhe/switch.c              |   7 +-
 arch/arm64/kvm/hyp/nvhe/tlb.c                 |   4 +-
 12 files changed, 286 insertions(+), 7 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/mem_protect.c

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 6dce860f8bca..b127af02bd45 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -61,6 +61,7 @@
 #define __KVM_HOST_SMCCC_FUNC___pkvm_create_mappings		16
 #define __KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping	17
 #define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector		18
+#define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize		19
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/include/asm/kvm_cpufeature.h b/arch/arm64/include/asm/kvm_cpufeature.h
index d34f85cba358..74043a149322 100644
--- a/arch/arm64/include/asm/kvm_cpufeature.h
+++ b/arch/arm64/include/asm/kvm_cpufeature.h
@@ -15,3 +15,5 @@
 #endif
 
 KVM_HYP_CPU_FTR_REG(SYS_CTR_EL0, arm64_ftr_reg_ctrel0)
+KVM_HYP_CPU_FTR_REG(SYS_ID_AA64MMFR0_EL1, arm64_ftr_reg_id_aa64mmfr0_el1)
+KVM_HYP_CPU_FTR_REG(SYS_ID_AA64MMFR1_EL1, arm64_ftr_reg_id_aa64mmfr1_el1)
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 940c378fa837..d5dc2b792651 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -131,6 +131,9 @@ KVM_NVHE_ALIAS(__hyp_bss_end);
 KVM_NVHE_ALIAS(__hyp_rodata_start);
 KVM_NVHE_ALIAS(__hyp_rodata_end);
 
+/* pKVM static key */
+KVM_NVHE_ALIAS(kvm_protected_mode_initialized);
+
 #endif /* CONFIG_KVM */
 
 #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b6a818f88051..a31c56bc55b3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1889,12 +1889,22 @@ static int init_hyp_mode(void)
 	return err;
 }
 
+void _kvm_host_prot_finalize(void *discard)
+{
+	WARN_ON(kvm_call_hyp_nvhe(__pkvm_prot_finalize));
+}
+
 static int finalize_hyp_mode(void)
 {
 	if (!is_protected_kvm_enabled())
 		return 0;
 
+	/*
+	 * Flip the static key upfront as that may no longer be possible
+	 * once the host stage 2 is installed.
+	 */
 	static_branch_enable(&kvm_protected_mode_initialized);
+	on_each_cpu(_kvm_host_prot_finalize, NULL, 1);
 
 	return 0;
 }
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
new file mode 100644
index 000000000000..d293cb328cc4
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#ifndef __KVM_NVHE_MEM_PROTECT__
+#define __KVM_NVHE_MEM_PROTECT__
+#include <linux/kvm_host.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_pgtable.h>
+#include <asm/virt.h>
+#include <nvhe/spinlock.h>
+
+struct host_kvm {
+	struct kvm_arch arch;
+	struct kvm_pgtable pgt;
+	struct kvm_pgtable_mm_ops mm_ops;
+	hyp_spinlock_t lock;
+};
+extern struct host_kvm host_kvm;
+
+int __pkvm_prot_finalize(void);
+int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool);
+void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
+
+static __always_inline void __load_host_stage2(void)
+{
+	if (static_branch_likely(&kvm_protected_mode_initialized))
+		__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
+	else
+		write_sysreg(0, vttbr_el2);
+}
+#endif /* __KVM_NVHE_MEM_PROTECT__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index e204ea77ab27..ce49795324a7 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -14,7 +14,7 @@ lib-objs := $(addprefix ../../../lib/, $(lib-objs))
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
 	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
-	 cache.o cpufeature.o setup.o mm.o
+	 cache.o cpufeature.o setup.o mm.o mem_protect.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
 obj-y += $(lib-objs)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index f312672d895e..6fa01b04954f 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -119,6 +119,7 @@ alternative_else_nop_endif
 
 	/* Invalidate the stale TLBs from Bootloader */
 	tlbi	alle2
+	tlbi	vmalls12e1
 	dsb	sy
 
 	/*
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index ae6503c9be15..f47028d3fd0a 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -13,6 +13,7 @@
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 
+#include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
 #include <nvhe/trap_handler.h>
 
@@ -151,6 +152,10 @@ static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ct
 	cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
 }
 
+static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
+{
+	cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
+}
 typedef void (*hcall_t)(struct kvm_cpu_context *);
 
 #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
@@ -174,6 +179,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_cpu_set_vector),
 	HANDLE_FUNC(__pkvm_create_mappings),
 	HANDLE_FUNC(__pkvm_create_private_mapping),
+	HANDLE_FUNC(__pkvm_prot_finalize),
 };
 
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
@@ -226,6 +232,11 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
 	case ESR_ELx_EC_SMC64:
 		handle_host_smc(host_ctxt);
 		break;
+	case ESR_ELx_EC_IABT_LOW:
+		fallthrough;
+	case ESR_ELx_EC_DABT_LOW:
+		handle_host_mem_abort(host_ctxt);
+		break;
 	default:
 		hyp_panic();
 	}
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
new file mode 100644
index 000000000000..2252ad1a8945
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -0,0 +1,213 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Quentin Perret <qperret@google.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_cpufeature.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <asm/kvm_pgtable.h>
+#include <asm/stage2_pgtable.h>
+
+#include <hyp/switch.h>
+
+#include <nvhe/gfp.h>
+#include <nvhe/memory.h>
+#include <nvhe/mem_protect.h>
+#include <nvhe/mm.h>
+
+extern unsigned long hyp_nr_cpus;
+struct host_kvm host_kvm;
+
+struct hyp_pool host_s2_mem;
+struct hyp_pool host_s2_dev;
+
+static void *host_s2_zalloc_pages_exact(size_t size)
+{
+	return hyp_alloc_pages(&host_s2_mem, get_order(size));
+}
+
+static void *host_s2_zalloc_page(void *pool)
+{
+	return hyp_alloc_pages(pool, 0);
+}
+
+static int prepare_s2_pools(void *mem_pgt_pool, void *dev_pgt_pool)
+{
+	unsigned long nr_pages, pfn;
+	int ret;
+
+	pfn = hyp_virt_to_pfn(mem_pgt_pool);
+	nr_pages = host_s2_mem_pgtable_pages();
+	ret = hyp_pool_init(&host_s2_mem, pfn, nr_pages, 0);
+	if (ret)
+		return ret;
+
+	pfn = hyp_virt_to_pfn(dev_pgt_pool);
+	nr_pages = host_s2_dev_pgtable_pages();
+	ret = hyp_pool_init(&host_s2_dev, pfn, nr_pages, 0);
+	if (ret)
+		return ret;
+
+	host_kvm.mm_ops.zalloc_pages_exact = host_s2_zalloc_pages_exact;
+	host_kvm.mm_ops.zalloc_page = host_s2_zalloc_page;
+	host_kvm.mm_ops.phys_to_virt = hyp_phys_to_virt;
+	host_kvm.mm_ops.virt_to_phys = hyp_virt_to_phys;
+	host_kvm.mm_ops.page_count = hyp_page_count;
+	host_kvm.mm_ops.get_page = hyp_get_page;
+	host_kvm.mm_ops.put_page = hyp_put_page;
+
+	return 0;
+}
+
+static void prepare_host_vtcr(void)
+{
+	u32 parange, phys_shift;
+	u64 mmfr0, mmfr1;
+
+	mmfr0 = arm64_ftr_reg_id_aa64mmfr0_el1.sys_val;
+	mmfr1 = arm64_ftr_reg_id_aa64mmfr1_el1.sys_val;
+
+	/* The host stage 2 is id-mapped, so use parange for T0SZ */
+	parange = kvm_get_parange(mmfr0);
+	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
+
+	host_kvm.arch.vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
+}
+
+int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool)
+{
+	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
+	int ret;
+
+	prepare_host_vtcr();
+	hyp_spin_lock_init(&host_kvm.lock);
+
+	ret = prepare_s2_pools(mem_pgt_pool, dev_pgt_pool);
+	if (ret)
+		return ret;
+
+	ret = kvm_pgtable_stage2_init(&host_kvm.pgt, &host_kvm.arch,
+				      &host_kvm.mm_ops);
+	if (ret)
+		return ret;
+
+	mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
+	mmu->arch = &host_kvm.arch;
+	mmu->pgt = &host_kvm.pgt;
+	mmu->vmid.vmid_gen = 0;
+	mmu->vmid.vmid = 0;
+
+	return 0;
+}
+
+int __pkvm_prot_finalize(void)
+{
+	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
+	struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
+
+	params->vttbr = kvm_get_vttbr(mmu);
+	params->vtcr = host_kvm.arch.vtcr;
+	params->hcr_el2 |= HCR_VM;
+	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
+		params->hcr_el2 |= HCR_FWB;
+	kvm_flush_dcache_to_poc(params, sizeof(*params));
+
+	write_sysreg(params->hcr_el2, hcr_el2);
+	__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
+
+	__tlbi(vmalls12e1is);
+	dsb(ish);
+	isb();
+
+	return 0;
+}
+
+static void host_stage2_unmap_dev_all(void)
+{
+	struct kvm_pgtable *pgt = &host_kvm.pgt;
+	struct memblock_region *reg;
+	u64 addr = 0;
+	int i;
+
+	/* Unmap all non-memory regions to recycle the pages */
+	for (i = 0; i < hyp_memblock_nr; i++, addr = reg->base + reg->size) {
+		reg = &hyp_memory[i];
+		kvm_pgtable_stage2_unmap(pgt, addr, reg->base - addr);
+	}
+	kvm_pgtable_stage2_unmap(pgt, addr, ULONG_MAX);
+}
+
+static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
+{
+	int cur, left = 0, right = hyp_memblock_nr;
+	struct memblock_region *reg;
+	phys_addr_t end;
+
+	range->start = 0;
+	range->end = ULONG_MAX;
+
+	/* The list of memblock regions is sorted, binary search it */
+	while (left < right) {
+		cur = (left + right) >> 1;
+		reg = &hyp_memory[cur];
+		end = reg->base + reg->size;
+		if (addr < reg->base) {
+			right = cur;
+			range->end = reg->base;
+		} else if (addr >= end) {
+			left = cur + 1;
+			range->start = end;
+		} else {
+			range->start = reg->base;
+			range->end = end;
+			return true;
+		}
+	}
+
+	return false;
+}
+
+static int host_stage2_idmap(u64 addr)
+{
+	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
+	struct kvm_mem_range range;
+	bool is_memory = find_mem_range(addr, &range);
+	struct hyp_pool *pool = is_memory ? &host_s2_mem : &host_s2_dev;
+	int ret;
+
+	if (is_memory)
+		prot |= KVM_PGTABLE_PROT_X;
+
+	hyp_spin_lock(&host_kvm.lock);
+	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
+					      &range, pool);
+	if (is_memory || ret != -ENOMEM)
+		goto unlock;
+	host_stage2_unmap_dev_all();
+	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
+					      &range, pool);
+unlock:
+	hyp_spin_unlock(&host_kvm.lock);
+
+	return ret;
+}
+
+void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
+{
+	struct kvm_vcpu_fault_info fault;
+	u64 esr, addr;
+	int ret = 0;
+
+	esr = read_sysreg_el2(SYS_ESR);
+	if (!__get_fault_info(esr, &fault))
+		hyp_panic();
+
+	addr = (fault.hpfar_el2 & HPFAR_MASK) << 8;
+	ret = host_stage2_idmap(addr);
+	if (ret && ret != -EAGAIN)
+		hyp_panic();
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 7e923b25271c..94b9f14491f9 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -12,6 +12,7 @@
 #include <nvhe/early_alloc.h>
 #include <nvhe/gfp.h>
 #include <nvhe/memory.h>
+#include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
 #include <nvhe/trap_handler.h>
 
@@ -157,6 +158,10 @@ void __noreturn __pkvm_init_finalise(void)
 	if (ret)
 		goto out;
 
+	ret = kvm_host_prepare_stage2(host_s2_mem_pgt_base, host_s2_dev_pgt_base);
+	if (ret)
+		goto out;
+
 	pkvm_pgtable_mm_ops.zalloc_page = hyp_zalloc_hyp_page;
 	pkvm_pgtable_mm_ops.phys_to_virt = hyp_phys_to_virt;
 	pkvm_pgtable_mm_ops.virt_to_phys = hyp_virt_to_phys;
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 979a76cdf9fb..31bc1a843bf8 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -28,6 +28,8 @@
 #include <asm/processor.h>
 #include <asm/thread_info.h>
 
+#include <nvhe/mem_protect.h>
+
 /* Non-VHE specific context */
 DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
 DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
@@ -102,11 +104,6 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
 	write_sysreg(__kvm_hyp_host_vector, vbar_el2);
 }
 
-static void __load_host_stage2(void)
-{
-	write_sysreg(0, vttbr_el2);
-}
-
 /* Save VGICv3 state on non-VHE systems */
 static void __hyp_vgic_save_state(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
index fbde89a2c6e8..255a23a1b2db 100644
--- a/arch/arm64/kvm/hyp/nvhe/tlb.c
+++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
@@ -8,6 +8,8 @@
 #include <asm/kvm_mmu.h>
 #include <asm/tlbflush.h>
 
+#include <nvhe/mem_protect.h>
+
 struct tlb_inv_context {
 	u64		tcr;
 };
@@ -43,7 +45,7 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
 
 static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
 {
-	write_sysreg(0, vttbr_el2);
+	__load_host_stage2();
 
 	if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
 		/* Ensure write of the host VMID */
-- 
2.30.1.766.gb4fecdf3b7-goog
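
For readers following the mem_protect.c hunk above: the lookup done on every
host fault is a binary search over the sorted, non-overlapping hyp_memory
regions. A minimal, self-contained sketch of that logic (userspace types and
made-up region values, not the kernel code) looks like this:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct region { uint64_t base, size; };
struct range  { uint64_t start, end; };

/* Made-up, sorted, non-overlapping "memory" regions. */
static const struct region mem[] = {
	{ 0x40000000, 0x20000000 },
	{ 0x80000000, 0x40000000 },
};

/* Mirrors find_mem_range(): returns true if addr falls in a memory region,
 * and in all cases reports the widest surrounding range sharing the same
 * memory/non-memory attribute. */
static bool find_range(uint64_t addr, struct range *r)
{
	int left = 0, right = sizeof(mem) / sizeof(mem[0]);

	r->start = 0;
	r->end = UINT64_MAX;

	while (left < right) {
		int cur = (left + right) / 2;
		uint64_t end = mem[cur].base + mem[cur].size;

		if (addr < mem[cur].base) {
			right = cur;
			r->end = mem[cur].base;
		} else if (addr >= end) {
			left = cur + 1;
			r->start = end;
		} else {
			r->start = mem[cur].base;
			r->end = end;
			return true;
		}
	}

	return false;
}

int main(void)
{
	struct range r;
	bool is_mem = find_range(0x50001234, &r);

	printf("%s [%#llx, %#llx)\n", is_mem ? "memory" : "hole",
	       (unsigned long long)r.start, (unsigned long long)r.end);
	return 0;
}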

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 30/32] KVM: arm64: Page-align the .hyp sections
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 15:00   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 15:00 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

We will soon unmap the .hyp sections from the host stage 2 in Protected
nVHE mode, which obviously works with at least page granularity, so make
sure to align them correctly.
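
As a rough illustration (not part of the patch, and assuming 4K pages; arm64
also supports 16K and 64K), page-granular unmapping can only cover a section
exactly when both boundary symbols are page-aligned, otherwise the rounded
range either leaves part of the section mapped or takes out neighbouring data
sharing a boundary page:

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE	4096UL	/* assumption: 4K pages */

/* True when [start, end) can be unmapped at page granularity without
 * touching anything outside the section itself. */
static inline bool unmaps_exactly(uint64_t start, uint64_t end)
{
	return !(start & (PAGE_SIZE - 1)) && !(end & (PAGE_SIZE - 1));
}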

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kernel/vmlinux.lds.S | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index e96173ce211b..709d2c433c5e 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -15,9 +15,11 @@
 
 #define HYPERVISOR_DATA_SECTIONS				\
 	HYP_SECTION_NAME(.rodata) : {				\
+		. = ALIGN(PAGE_SIZE);				\
 		__hyp_rodata_start = .;				\
 		*(HYP_SECTION_NAME(.data..ro_after_init))	\
 		*(HYP_SECTION_NAME(.rodata))			\
+		. = ALIGN(PAGE_SIZE);				\
 		__hyp_rodata_end = .;				\
 	}
 
@@ -72,21 +74,14 @@ ENTRY(_text)
 jiffies = jiffies_64;
 
 #define HYPERVISOR_TEXT					\
-	/*						\
-	 * Align to 4 KB so that			\
-	 * a) the HYP vector table is at its minimum	\
-	 *    alignment of 2048 bytes			\
-	 * b) the HYP init code will not cross a page	\
-	 *    boundary if its size does not exceed	\
-	 *    4 KB (see related ASSERT() below)		\
-	 */						\
-	. = ALIGN(SZ_4K);				\
+	. = ALIGN(PAGE_SIZE);				\
 	__hyp_idmap_text_start = .;			\
 	*(.hyp.idmap.text)				\
 	__hyp_idmap_text_end = .;			\
 	__hyp_text_start = .;				\
 	*(.hyp.text)					\
 	HYPERVISOR_EXTABLE				\
+	. = ALIGN(PAGE_SIZE);				\
 	__hyp_text_end = .;
 
 #define IDMAP_TEXT					\
@@ -322,11 +317,12 @@ SECTIONS
 #include "image-vars.h"
 
 /*
- * The HYP init code and ID map text can't be longer than a page each,
- * and should not cross a page boundary.
+ * The HYP init code and ID map text can't be longer than a page each. The
+ * former is page-aligned, but the latter may not be with 16K or 64K pages, so
+ * it should also not cross a page boundary.
  */
-ASSERT(__hyp_idmap_text_end - (__hyp_idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
-	"HYP init code too big or misaligned")
+ASSERT(__hyp_idmap_text_end - __hyp_idmap_text_start <= PAGE_SIZE,
+	"HYP init code too big")
 ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
 	"ID map text too big or misaligned")
 #ifdef CONFIG_HIBERNATION
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 31/32] KVM: arm64: Disable PMU support in protected mode
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 15:00   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 15:00 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

The host currently writes directly to the EL2 per-CPU data sections from
the PMU code when running in nVHE. In preparation for unmapping the EL2
sections from the host stage 2, disable PMU support in protected mode as
we currently do not have a use-case for it.
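
As a hedged sketch of the guard shape used below (simplified stand-ins, not
the kernel code), each PMU entry point now bails out early unless PMU support
was enabled at init, so the EL2 per-CPU bookkeeping is never touched in
protected mode:

#include <stdbool.h>

static bool pmu_available;	/* stands in for the kvm_arm_pmu_available key */

void set_pmu_events(unsigned int set, unsigned int *events_host)
{
	if (!pmu_available)	/* cf. the kvm_arm_support_pmu_v3() checks below */
		return;

	*events_host |= set;	/* only reached when the PMU is usable */
}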

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/perf.c | 3 ++-
 arch/arm64/kvm/pmu.c  | 8 ++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/perf.c b/arch/arm64/kvm/perf.c
index 739164324afe..8f860ae56bb7 100644
--- a/arch/arm64/kvm/perf.c
+++ b/arch/arm64/kvm/perf.c
@@ -55,7 +55,8 @@ int kvm_perf_init(void)
 	 * hardware performance counters. This could ensure the presence of
 	 * a physical PMU and CONFIG_PERF_EVENT is selected.
 	 */
-	if (IS_ENABLED(CONFIG_ARM_PMU) && perf_num_counters() > 0)
+	if (IS_ENABLED(CONFIG_ARM_PMU) && perf_num_counters() > 0
+				       && !is_protected_kvm_enabled())
 		static_branch_enable(&kvm_arm_pmu_available);
 
 	return perf_register_guest_info_callbacks(&kvm_guest_cbs);
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index faf32a44ba04..03a6c1f4a09a 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -33,7 +33,7 @@ void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
 {
 	struct kvm_host_data *ctx = this_cpu_ptr_hyp_sym(kvm_host_data);
 
-	if (!ctx || !kvm_pmu_switch_needed(attr))
+	if (!kvm_arm_support_pmu_v3() || !ctx || !kvm_pmu_switch_needed(attr))
 		return;
 
 	if (!attr->exclude_host)
@@ -49,7 +49,7 @@ void kvm_clr_pmu_events(u32 clr)
 {
 	struct kvm_host_data *ctx = this_cpu_ptr_hyp_sym(kvm_host_data);
 
-	if (!ctx)
+	if (!kvm_arm_support_pmu_v3() || !ctx)
 		return;
 
 	ctx->pmu_events.events_host &= ~clr;
@@ -172,7 +172,7 @@ void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu)
 	struct kvm_host_data *host;
 	u32 events_guest, events_host;
 
-	if (!has_vhe())
+	if (!kvm_arm_support_pmu_v3() || !has_vhe())
 		return;
 
 	preempt_disable();
@@ -193,7 +193,7 @@ void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
 	struct kvm_host_data *host;
 	u32 events_guest, events_host;
 
-	if (!has_vhe())
+	if (!kvm_arm_support_pmu_v3() || !has_vhe())
 		return;
 
 	host = this_cpu_ptr_hyp_sym(kvm_host_data);
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* [PATCH v3 32/32] KVM: arm64: Protect the .hyp sections from the host
  2021-03-02 14:59 ` Quentin Perret
  (?)
@ 2021-03-02 15:00   ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-02 15:00 UTC (permalink / raw)
  To: catalin.marinas, will, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose
  Cc: android-kvm, linux-kernel, kernel-team, kvmarm, linux-arm-kernel,
	tabba, mark.rutland, dbrazdil, mate.toth-pal, seanjc, qperret,
	robh+dt

When KVM runs in nVHE protected mode, use the host stage 2 to unmap the
hypervisor sections. The long-term goal is to ensure the EL2 code can
remain robust regardless of the host's state, so this starts by making
sure the host cannot e.g. write to the .hyp sections directly.
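
One detail worth noting in __pkvm_host_unmap() below: both ends of the range
must resolve to the same memory region, so PROT_NONE is never applied to MMIO
space, which host_stage2_unmap_dev_all() may drop at any time. A hedged sketch
of that check (the find_mem_range() prototype is a stand-in for the helper
introduced earlier in the series):

#include <stdbool.h>
#include <stdint.h>

struct range { uint64_t start, end; };

/* Stand-in for find_mem_range(); returns false for MMIO and holes. */
bool find_mem_range(uint64_t addr, struct range *r);

bool host_unmap_range_is_valid(uint64_t start, uint64_t end)
{
	struct range r1, r2;

	if (!find_mem_range(start, &r1) || !find_mem_range(end, &r2))
		return false;			/* not backed by memory */

	return r1.start == r2.start;		/* same memblock region */
}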

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h              |  1 +
 arch/arm64/kvm/arm.c                          | 46 +++++++++++++++++++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  9 ++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 22 +++++++++
 5 files changed, 80 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index b127af02bd45..9accf5350858 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -62,6 +62,7 @@
 #define __KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping	17
 #define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector		18
 #define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize		19
+#define __KVM_HOST_SMCCC_FUNC___pkvm_host_unmap			20
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a31c56bc55b3..73c26d206542 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1894,11 +1894,57 @@ void _kvm_host_prot_finalize(void *discard)
 	WARN_ON(kvm_call_hyp_nvhe(__pkvm_prot_finalize));
 }
 
+static inline int pkvm_host_unmap(phys_addr_t start, phys_addr_t end)
+{
+	return kvm_call_hyp_nvhe(__pkvm_host_unmap, start, end);
+}
+
+#define pkvm_host_unmap_section(__section)		\
+	pkvm_host_unmap(__pa_symbol(__section##_start),	\
+			__pa_symbol(__section##_end))
+
 static int finalize_hyp_mode(void)
 {
+	int cpu, ret;
+
 	if (!is_protected_kvm_enabled())
 		return 0;
 
+	ret = pkvm_host_unmap_section(__hyp_idmap_text);
+	if (ret)
+		return ret;
+
+	ret = pkvm_host_unmap_section(__hyp_text);
+	if (ret)
+		return ret;
+
+	ret = pkvm_host_unmap_section(__hyp_rodata);
+	if (ret)
+		return ret;
+
+	ret = pkvm_host_unmap_section(__hyp_bss);
+	if (ret)
+		return ret;
+
+	ret = pkvm_host_unmap(hyp_mem_base, hyp_mem_base + hyp_mem_size);
+	if (ret)
+		return ret;
+
+	for_each_possible_cpu(cpu) {
+		phys_addr_t start = virt_to_phys((void *)kvm_arm_hyp_percpu_base[cpu]);
+		phys_addr_t end = start + (PAGE_SIZE << nvhe_percpu_order());
+
+		ret = pkvm_host_unmap(start, end);
+		if (ret)
+			return ret;
+
+		start = virt_to_phys((void *)per_cpu(kvm_arm_hyp_stack_page, cpu));
+		end = start + PAGE_SIZE;
+		ret = pkvm_host_unmap(start, end);
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * Flip the static key upfront as that may no longer be possible
 	 * once the host stage 2 is installed.
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index d293cb328cc4..39890d4f1dc8 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -21,6 +21,8 @@ struct host_kvm {
 extern struct host_kvm host_kvm;
 
 int __pkvm_prot_finalize(void);
+int __pkvm_host_unmap(phys_addr_t start, phys_addr_t end);
+
 int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool);
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index f47028d3fd0a..2069136fdaec 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -156,6 +156,14 @@ static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
 {
 	cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
 }
+
+static void handle___pkvm_host_unmap(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(phys_addr_t, start, host_ctxt, 1);
+	DECLARE_REG(phys_addr_t, end, host_ctxt, 2);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_host_unmap(start, end);
+}
 typedef void (*hcall_t)(struct kvm_cpu_context *);
 
 #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
@@ -180,6 +188,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_create_mappings),
 	HANDLE_FUNC(__pkvm_create_private_mapping),
 	HANDLE_FUNC(__pkvm_prot_finalize),
+	HANDLE_FUNC(__pkvm_host_unmap),
 };
 
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 2252ad1a8945..ed480facdc88 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -196,6 +196,28 @@ static int host_stage2_idmap(u64 addr)
 	return ret;
 }
 
+int __pkvm_host_unmap(phys_addr_t start, phys_addr_t end)
+{
+	struct kvm_mem_range r1, r2;
+	int ret;
+
+	/*
+	 * host_stage2_unmap_dev_all() currently relies on MMIO mappings being
+	 * non-persistent, so don't allow PROT_NONE in MMIO range.
+	 */
+	if (!find_mem_range(start, &r1) || !find_mem_range(end, &r2))
+		return -EINVAL;
+	if (r1.start != r2.start)
+		return -EINVAL;
+
+	hyp_spin_lock(&host_kvm.lock);
+	ret = kvm_pgtable_stage2_map(&host_kvm.pgt, start, end - start, start,
+				     KVM_PGTABLE_PROT_NONE, &host_s2_mem);
+	hyp_spin_unlock(&host_kvm.lock);
+
+	return ret;
+}
+
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
 {
 	struct kvm_vcpu_fault_info fault;
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 04/32] KVM: arm64: Initialize kvm_nvhe_init_params early
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 13:39     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 13:39 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:34PM +0000, Quentin Perret wrote:
> Move the initialization of kvm_nvhe_init_params in a dedicated function
> that is run early, and only once during KVM init, rather than every time
> the KVM vectors are set and reset.
> 
> This also opens the opportunity for the hypervisor to change the init
> structs during boot, hence simplifying the replacement of host-provided
> page-table by the one the hypervisor will create for itself.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/arm.c | 30 ++++++++++++++++++------------
>  1 file changed, 18 insertions(+), 12 deletions(-)

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 06/32] KVM: arm64: Factor memory allocation out of pgtable.c
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 14:06     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 14:06 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:36PM +0000, Quentin Perret wrote:
> In preparation for enabling the creation of page-tables at EL2, factor
> all memory allocation out of the page-table code, hence making it
> re-usable with any compatible memory allocator.
> 
> No functional changes intended.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 41 +++++++++++-
>  arch/arm64/kvm/hyp/pgtable.c         | 98 +++++++++++++++++-----------
>  arch/arm64/kvm/mmu.c                 | 66 ++++++++++++++++++-
>  3 files changed, 163 insertions(+), 42 deletions(-)

Just a few nits:

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 8886d43cfb11..3c306f90f7da 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -13,17 +13,50 @@
>  
>  typedef u64 kvm_pte_t;
>  
> +/**
> + * struct kvm_pgtable_mm_ops - Memory management callbacks.
> + * @zalloc_page:	Allocate a single zeroed memory page. The @arg parameter
> + *			can be used by the walker to pass a memcache. The
> + *			initial refcount of the page is 1.
> + * @zalloc_pages_exact:	Allocate an exact number of zeroed memory pages. The
> + *			@size parameter is in bytes, it is automatically rounded
> + *			to PAGE_SIZE and the resulting allocation is physically
> + *			contiguous.

It's not clear to me whether "it is automatically rounded to PAGE_SIZE"
means that the caller or the callee does the rounding. If it's the caller,
maybe it would be better to pass the number of pages as an 'npages' argument
instead of the size in bytes?
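
For illustration, the alternative would be something along these lines
(hypothetical signature, not what the patch currently has):

	/* Allocate @npages zeroed, physically contiguous pages. */
	void *(*zalloc_pages_exact)(size_t npages);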

> + * @free_pages_exact:	Free an exact number of memory pages, to free memory
> + *			allocated with zalloc_pages_exact.

"Free an exact number of memory pages previously allocated by
 zalloc_pages_exact()"


> + * @get_page:		Increment the refcount on a page.
> + * @put_page:		Decrement the refcount on a page. When the refcount
> + *			reaches 0 the page is automatically freed.
> + * @page_count:		Return the refcount of a page.
> + * @phys_to_virt:	Convert a physical address into a virtual address as
> + *			accessible in the current context.

s/as accessible/mapped/

With those changes:

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 07/32] KVM: arm64: Introduce a BSS section for use at Hyp
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 14:09     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 14:09 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:37PM +0000, Quentin Perret wrote:
> Currently, the hyp code cannot make full use of a bss, as the kernel
> section is mapped read-only.
> 
> While this mapping could simply be changed to read-write, it would
> intermingle the hyp and kernel state even more than they currently are.
> Instead, introduce a __hyp_bss section that uses reserved pages, and
> create the appropriate RW hyp mappings during KVM init.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/sections.h |  1 +
>  arch/arm64/kernel/vmlinux.lds.S   | 52 ++++++++++++++++++++-----------
>  arch/arm64/kvm/arm.c              | 14 ++++++++-
>  arch/arm64/kvm/hyp/nvhe/hyp.lds.S |  1 +
>  4 files changed, 49 insertions(+), 19 deletions(-)

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 10/32] KVM: arm64: Introduce an early Hyp page allocator
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 14:38     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 14:38 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:40PM +0000, Quentin Perret wrote:
> With nVHE, the host currently creates all stage 1 hypervisor mappings at
> EL1 during boot, installs them at EL2, and extends them as required
> (e.g. when creating a new VM). But in a world where the host is no
> longer trusted, it cannot have full control over the code mapped in the
> hypervisor.
> 
> In preparation for enabling the hypervisor to create its own stage 1
> mappings during boot, introduce an early page allocator, with minimal
> functionality. This allocator is designed to be used only during early
> bootstrap of the hyp code when memory protection is enabled; the hyp code
> will then switch to using a full-fledged page allocator after init.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/include/nvhe/early_alloc.h | 14 +++++
>  arch/arm64/kvm/hyp/include/nvhe/memory.h      | 24 +++++++++
>  arch/arm64/kvm/hyp/nvhe/Makefile              |  2 +-
>  arch/arm64/kvm/hyp/nvhe/early_alloc.c         | 54 +++++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/psci-relay.c          |  4 +-
>  5 files changed, 94 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm64/kvm/hyp/include/nvhe/early_alloc.h
>  create mode 100644 arch/arm64/kvm/hyp/include/nvhe/memory.h
>  create mode 100644 arch/arm64/kvm/hyp/nvhe/early_alloc.c
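
The core of such an early allocator is essentially a bump allocator over the
memory donated by the host, roughly along these lines (simplified sketch, not
the exact code from the patch):

	/* Hand out zeroed pages one after the other; never free them. */
	static unsigned long cur, end;

	void *hyp_early_alloc_page(void *arg)
	{
		unsigned long ret = cur;

		cur += PAGE_SIZE;
		if (cur > end) {
			cur = ret;	/* out of pages: roll back and fail */
			return NULL;
		}
		memset((void *)ret, 0, PAGE_SIZE);

		return (void *)ret;
	}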

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 12/32] KVM: arm64: Introduce a Hyp buddy page allocator
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 15:30     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 15:30 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:42PM +0000, Quentin Perret wrote:
> When memory protection is enabled, the hyp code will require a basic
> form of memory management in order to allocate and free memory pages at
> EL2. This is needed for various use-cases, including the creation of hyp
> mappings or the allocation of stage 2 page tables.
> 
> To address these use-cases, introduce a simple memory allocator in the
> hyp code. The allocator is designed as a conventional 'buddy allocator',
> working at page granularity. It allows allocating and freeing
> physically contiguous pages from memory 'pools', with a guaranteed order
> alignment in the PA space. Each page in a memory pool is associated
> with a struct hyp_page which holds the page's metadata, including its
> refcount, as well as its current order, hence mimicking the kernel's
> buddy system in the GFP infrastructure. The hyp_page metadata are made
> accessible through a hyp_vmemmap, following the concept of
> SPARSE_VMEMMAP in the kernel.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/include/nvhe/gfp.h    |  55 +++++++
>  arch/arm64/kvm/hyp/include/nvhe/memory.h |  28 ++++
>  arch/arm64/kvm/hyp/nvhe/Makefile         |   2 +-
>  arch/arm64/kvm/hyp/nvhe/page_alloc.c     | 195 +++++++++++++++++++++++
>  4 files changed, 279 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/kvm/hyp/include/nvhe/gfp.h
>  create mode 100644 arch/arm64/kvm/hyp/nvhe/page_alloc.c
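
As a quick refresher on the buddy scheme used here: two blocks of size
(PAGE_SIZE << order) are buddies iff their addresses differ only in bit
(PAGE_SHIFT + order), so locating a buddy is a single XOR. A sketch, not the
patch's exact helper:

	static phys_addr_t buddy_of(phys_addr_t addr, unsigned int order)
	{
		return addr ^ (PAGE_SIZE << order);
	}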

[...]

> +static void __hyp_attach_page(struct hyp_pool *pool,
> +			      struct hyp_page *p)
> +{
> +	unsigned int order = p->order;
> +	struct hyp_page *buddy;
> +
> +	memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
> +
> +	/*
> +	 * Only the first struct hyp_page of a high-order page (otherwise known
> +	 * as the 'head') should have p->order set. The non-head pages should
> +	 * have p->order = HYP_NO_ORDER. Here @p may no longer be the head
> +	 * after coalescing, so make sure to mark it HYP_NO_ORDER proactively.
> +	 */
> +	p->order = HYP_NO_ORDER;
> +	for (; (order + 1) < pool->max_order; order++) {
> +		buddy = __find_buddy_avail(pool, p, order);
> +		if (!buddy)
> +			break;
> +
> +		/* Take the buddy out of its list, and coalesce with @p */
> +		list_del_init(&buddy->node);
> +		buddy->order = HYP_NO_ORDER;
> +		p = (p < buddy) ? p : buddy;

nit: this is min()
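
i.e., assuming min() is happy with the pointer types here:

	p = min(p, buddy);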

> +	}
> +
> +	/* Mark the new head, and insert it */
> +	p->order = order;
> +	list_add_tail(&p->node, &pool->free_area[order]);
> +}
> +
> +static void hyp_attach_page(struct hyp_page *p)
> +{
> +	struct hyp_pool *pool = hyp_page_to_pool(p);
> +
> +	hyp_spin_lock(&pool->lock);
> +	__hyp_attach_page(pool, p);
> +	hyp_spin_unlock(&pool->lock);
> +}
> +
> +static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
> +					   struct hyp_page *p,
> +					   unsigned int order)
> +{
> +	struct hyp_page *buddy;
> +
> +	list_del_init(&p->node);
> +	while (p->order > order) {
> +		/*
> +		 * The buddy of order n - 1 currently has HYP_NO_ORDER as it
> +		 * is covered by a higher-level page (whose head is @p). Use
> +		 * __find_buddy_nocheck() to find it and inject it in the
> +		 * free_list[n - 1], effectively splitting @p in half.
> +		 */
> +		p->order--;
> +		buddy = __find_buddy_nocheck(pool, p, p->order);
> +		buddy->order = p->order;
> +		list_add_tail(&buddy->node, &pool->free_area[buddy->order]);
> +	}
> +
> +	return p;
> +}
> +
> +void hyp_put_page(void *addr)
> +{
> +	struct hyp_page *p = hyp_virt_to_page(addr);
> +
> +	if (hyp_page_ref_dec_and_test(p))
> +		hyp_attach_page(p);
> +}
> +
> +void hyp_get_page(void *addr)
> +{
> +	struct hyp_page *p = hyp_virt_to_page(addr);
> +
> +	hyp_page_ref_inc(p);
> +}
> +
> +void *hyp_alloc_pages(struct hyp_pool *pool, unsigned int order)
> +{
> +	unsigned int i = order;
> +	struct hyp_page *p;
> +
> +	hyp_spin_lock(&pool->lock);
> +
> +	/* Look for a high-enough-order page */
> +	while (i < pool->max_order && list_empty(&pool->free_area[i]))
> +		i++;
> +	if (i >= pool->max_order) {
> +		hyp_spin_unlock(&pool->lock);
> +		return NULL;
> +	}
> +
> +	/* Extract it from the tree at the right order */
> +	p = list_first_entry(&pool->free_area[i], struct hyp_page, node);
> +	p = __hyp_extract_page(pool, p, order);
> +
> +	hyp_spin_unlock(&pool->lock);
> +	hyp_page_ref_inc(p);

I find this a little scary, as we momentarily drop the lock. I think
it's ok because the reference count on the page must be 0 at this point,
but actually then I think it would be clearer to have a
hyp_page_ref_init() function which could take the lock, check that the
refcount is indeed 0 and then set it to 1.

What do you think?
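
Something along these lines, as a sketch of the idea (name and the exact
failure behaviour obviously up for discussion):

	static void hyp_page_ref_init(struct hyp_pool *pool, struct hyp_page *p)
	{
		hyp_spin_lock(&pool->lock);
		if (p->refcount)
			hyp_panic();	/* must not be visible to anyone yet */
		p->refcount = 1;
		hyp_spin_unlock(&pool->lock);
	}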

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 12/32] KVM: arm64: Introduce a Hyp buddy page allocator
  2021-03-04 15:30     ` Will Deacon
  (?)
@ 2021-03-04 15:49       ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-04 15:49 UTC (permalink / raw)
  To: Will Deacon
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Thursday 04 Mar 2021 at 15:30:36 (+0000), Will Deacon wrote:
> On Tue, Mar 02, 2021 at 02:59:42PM +0000, Quentin Perret wrote:
> > +void *hyp_alloc_pages(struct hyp_pool *pool, unsigned int order)
> > +{
> > +	unsigned int i = order;
> > +	struct hyp_page *p;
> > +
> > +	hyp_spin_lock(&pool->lock);
> > +
> > +	/* Look for a high-enough-order page */
> > +	while (i < pool->max_order && list_empty(&pool->free_area[i]))
> > +		i++;
> > +	if (i >= pool->max_order) {
> > +		hyp_spin_unlock(&pool->lock);
> > +		return NULL;
> > +	}
> > +
> > +	/* Extract it from the tree at the right order */
> > +	p = list_first_entry(&pool->free_area[i], struct hyp_page, node);
> > +	p = __hyp_extract_page(pool, p, order);
> > +
> > +	hyp_spin_unlock(&pool->lock);
> > +	hyp_page_ref_inc(p);
> 
> I find this a little scary, as we momentarily drop the lock. It think
> it's ok because the reference count on the page must be 0 at this point,

Yep, @p shouldn't be visible to the caller yet so this should be fine.

> but actually then I think it would be clearer to have a
> hyp_page_ref_init() function which could take the lock, check that the
> refcount is indeed 0 and then set it to 1.

Works for me. Maybe I'll use another name for the API to stay consistent
with the kernel gfp code (hyp_page_ref_inc() and friends are inspired
by their kernel counterparts). And I guess I can hyp_panic() if the
refcount is not 0 at this point to match the VM_BUG_ON_PAGE() in
set_page_refcounted().

Thanks!
Quentin

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 15/32] KVM: arm64: Prepare the creation of s1 mappings at EL2
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 18:47     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 18:47 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

Hi Quentin,

On Tue, Mar 02, 2021 at 02:59:45PM +0000, Quentin Perret wrote:
> When memory protection is enabled, the EL2 code needs the ability to
> create and manage its own page-table. To do so, introduce a new set of
> hypercalls to bootstrap a memory management system at EL2.
> 
> This leads to the following boot flow in nVHE Protected mode:
> 
>  1. the host allocates memory for the hypervisor very early on, using
>     the memblock API;
> 
>  2. the host creates a set of stage 1 page-tables for EL2, installs the
>     EL2 vectors, and issues the __pkvm_init hypercall;
> 
>  3. during __pkvm_init, the hypervisor re-creates its stage 1 page-table
>     and stores it in the memory pool provided by the host;
> 
>  4. the hypervisor then extends its stage 1 mappings to include a
>     vmemmap in the EL2 VA space, hence allowing it to use the buddy
>     allocator introduced in a previous patch;
> 
>  5. the hypervisor jumps back into the idmap page, switches from the
>     host-provided page-table to the new one, and wraps up its
>     initialization by enabling the new allocator, before returning to
>     the host.
> 
>  6. the host can free the now unused page-table created for EL2, and
>     will now need to issue hypercalls to make changes to the EL2 stage 1
>     mappings instead of modifying them directly.
> 
> Note that for the sake of simplifying the review, this patch focuses on
> the hypervisor side of things. In other words, this only implements the
> new hypercalls, but does not make use of them from the host yet. The
> host-side changes will follow in a subsequent patch.
> 
> Credits to Will for __pkvm_init_switch_pgd.
> 
> Co-authored-by: Will Deacon <will@kernel.org>
> Signed-off-by: Will Deacon <will@kernel.org>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h     |   4 +
>  arch/arm64/include/asm/kvm_host.h    |   7 +
>  arch/arm64/include/asm/kvm_hyp.h     |   8 ++
>  arch/arm64/include/asm/kvm_pgtable.h |   2 +
>  arch/arm64/kernel/image-vars.h       |  16 +++
>  arch/arm64/kvm/hyp/Makefile          |   2 +-
>  arch/arm64/kvm/hyp/include/nvhe/mm.h |  71 ++++++++++
>  arch/arm64/kvm/hyp/nvhe/Makefile     |   4 +-
>  arch/arm64/kvm/hyp/nvhe/hyp-init.S   |  31 +++++
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c   |  49 +++++++
>  arch/arm64/kvm/hyp/nvhe/mm.c         | 173 ++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/setup.c      | 195 +++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         |   2 -
>  arch/arm64/kvm/hyp/reserved_mem.c    |  92 +++++++++++++
>  arch/arm64/mm/init.c                 |   3 +
>  15 files changed, 654 insertions(+), 5 deletions(-)

This mostly looks good to me, but in a patch this size I was bound to spot
a few niggles. It is _huge_!

> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> index c631e29fb001..bc56ea92b812 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> @@ -244,4 +244,35 @@ alternative_else_nop_endif
>  
>  SYM_CODE_END(__kvm_handle_stub_hvc)
>  
> +SYM_FUNC_START(__pkvm_init_switch_pgd)
> +	/* Turn the MMU off */
> +	pre_disable_mmu_workaround
> +	mrs	x2, sctlr_el2
> +	bic	x3, x2, #SCTLR_ELx_M
> +	msr	sctlr_el2, x3
> +	isb
> +
> +	tlbi	alle2
> +
> +	/* Install the new pgtables */
> +	ldr	x3, [x0, #NVHE_INIT_PGD_PA]
> +	phys_to_ttbr x4, x3
> +alternative_if ARM64_HAS_CNP
> +	orr	x4, x4, #TTBR_CNP_BIT
> +alternative_else_nop_endif
> +	msr	ttbr0_el2, x4
> +
> +	/* Set the new stack pointer */
> +	ldr	x0, [x0, #NVHE_INIT_STACK_HYP_VA]
> +	mov	sp, x0
> +
> +	/* And turn the MMU back on! */
> +	dsb	nsh
> +	isb
> +	msr	sctlr_el2, x2
> +	ic	iallu
> +	isb

Comparing with the new-fangled set_sctlr_el1 macro we have, this sequence
isn't quite right. Probably best to introduce set_sctlr_el2, and implement
that and the existing macro in terms of set_sctlr_elX or something like
that.

> +void __noreturn __pkvm_init_finalise(void)
> +{
> +	struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
> +	struct kvm_cpu_context *host_ctxt = &host_data->host_ctxt;
> +	unsigned long nr_pages, reserved_pages, pfn;
> +	int ret;
> +
> +	/* Now that the vmemmap is backed, install the full-fledged allocator */
> +	pfn = hyp_virt_to_pfn(hyp_pgt_base);
> +	nr_pages = hyp_s1_pgtable_pages();
> +	reserved_pages = hyp_early_alloc_nr_used_pages();
> +	ret = hyp_pool_init(&hpool, pfn, nr_pages, reserved_pages);
> +	if (ret)
> +		goto out;
> +
> +	pkvm_pgtable_mm_ops.zalloc_page = hyp_zalloc_hyp_page;
> +	pkvm_pgtable_mm_ops.phys_to_virt = hyp_phys_to_virt;
> +	pkvm_pgtable_mm_ops.virt_to_phys = hyp_virt_to_phys;
> +	pkvm_pgtable_mm_ops.get_page = hyp_get_page;
> +	pkvm_pgtable_mm_ops.put_page = hyp_put_page;
> +	pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;

Can you do:

	pkvm_pgtable_mm_ops = (struct kvm_pgtable_mm_ops) {
		.zalloc_page	= hyp_zalloc_hyp_page,
		.phys_to_virt	= ...,
		...
	};

here?

> +
> +out:
> +	/*
> +	 * We tail-called to here from handle___pkvm_init() and will not return,
> +	 * so make sure to propagate the return value to the host.
> +	 */
> +	cpu_reg(host_ctxt, 1) = ret;
> +
> +	__host_enter(host_ctxt);
> +}
> +
> +int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
> +		unsigned long *per_cpu_base, u32 hyp_va_bits)
> +{
> +	struct kvm_nvhe_init_params *params;
> +	void *virt = hyp_phys_to_virt(phys);
> +	void (*fn)(phys_addr_t params_pa, void *finalize_fn_va);
> +	int ret;
> +
> +	if (phys % PAGE_SIZE || size % PAGE_SIZE)
> +		return -EINVAL;

Either PAGE_ALIGNED or '& ~PAGE_MASK' would be better than spelling this
with '%', I reckon.
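
i.e., something like this (assuming PAGE_ALIGNED() is usable from the hyp
object):

	if (!PAGE_ALIGNED(phys) || !PAGE_ALIGNED(size))
		return -EINVAL;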

Anyway, other than these nits:

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 16/32] KVM: arm64: Elevate hypervisor mappings creation at EL2
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 19:25     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 19:25 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:46PM +0000, Quentin Perret wrote:
> Previous commits have introduced infrastructure to enable the EL2 code
> to manage its own stage 1 mappings. However, this was preliminary work,
> and none of it is currently in use.
> 
> Put all of this together by elevating the mapping creation at EL2 when
> memory protection is enabled. In this case, the host kernel running
> at EL1 still creates _temporary_ EL2 mappings, only used while
> initializing the hypervisor, but frees them right after.
> 
> As such, all calls to create_hyp_mappings() after kvm init has finished
> turn into hypercalls, as the host now has no 'legal' way to modify the
> hypervisor page tables directly.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_mmu.h |  2 +-
>  arch/arm64/kvm/arm.c             | 87 +++++++++++++++++++++++++++++---
>  arch/arm64/kvm/mmu.c             | 43 ++++++++++++++--
>  3 files changed, 120 insertions(+), 12 deletions(-)

[...]

> @@ -1489,13 +1497,14 @@ static void cpu_hyp_reinit(void)
>  	kvm_init_host_cpu_context(&this_cpu_ptr_hyp_sym(kvm_host_data)->host_ctxt);
>  
>  	cpu_hyp_reset();
> -	cpu_set_hyp_vector();
>  
>  	if (is_kernel_in_hyp_mode())
>  		kvm_timer_init_vhe();
>  	else
>  		cpu_init_hyp_mode();
>  
> +	cpu_set_hyp_vector();
> +
>  	kvm_arm_init_debug();
>  
>  	if (vgic_present)
> @@ -1691,18 +1700,59 @@ static void teardown_hyp_mode(void)
>  	}
>  }
>  
> +static int do_pkvm_init(u32 hyp_va_bits)
> +{
> +	void *per_cpu_base = kvm_ksym_ref(kvm_arm_hyp_percpu_base);
> +	int ret;
> +
> +	preempt_disable();
> +	hyp_install_host_vector();

It's a shame we need this both here _and_ on the reinit path, but it looks
like it's necessary.

> +	ret = kvm_call_hyp_nvhe(__pkvm_init, hyp_mem_base, hyp_mem_size,
> +				num_possible_cpus(), kern_hyp_va(per_cpu_base),
> +				hyp_va_bits);
> +	preempt_enable();
> +
> +	return ret;
> +}

[...]

>  /**
>   * Inits Hyp-mode on all online CPUs
>   */
>  static int init_hyp_mode(void)
>  {
> +	u32 hyp_va_bits;
>  	int cpu;
> -	int err = 0;
> +	int err = -ENOMEM;
> +
> +	/*
> +	 * The protected Hyp-mode cannot be initialized if the memory pool
> +	 * allocation has failed.
> +	 */
> +	if (is_protected_kvm_enabled() && !hyp_mem_base)
> +		return err;

This skips the error message you get on the out_err path.
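
Assuming it's safe to hit teardown_hyp_mode() that early, something like
the below (untested) would keep the diagnostic:

	if (is_protected_kvm_enabled() && !hyp_mem_base)
		goto out_err;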

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 4d41d7838d53..9d331bf262d2 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -221,15 +221,39 @@ void free_hyp_pgds(void)
>  	if (hyp_pgtable) {
>  		kvm_pgtable_hyp_destroy(hyp_pgtable);
>  		kfree(hyp_pgtable);
> +		hyp_pgtable = NULL;
>  	}
>  	mutex_unlock(&kvm_hyp_pgd_mutex);
>  }
>  
> +static bool kvm_host_owns_hyp_mappings(void)
> +{
> +	if (static_branch_likely(&kvm_protected_mode_initialized))
> +		return false;
> +
> +	/*
> +	 * This can happen at boot time when __create_hyp_mappings() is called
> +	 * after the hyp protection has been enabled, but the static key has
> +	 * not been flipped yet.
> +	 */
> +	if (!hyp_pgtable && is_protected_kvm_enabled())
> +		return false;
> +
> +	WARN_ON(!hyp_pgtable);
> +
> +	return true;

	return !(WARN_ON(!hyp_pgtable) && is_protected_kvm_enabled());

?

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 20/32] KVM: arm64: Refactor kvm_arm_setup_stage2()
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 19:35     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 19:35 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:50PM +0000, Quentin Perret wrote:
> In order to re-use some of the stage 2 setup code at EL2, factor parts
> of kvm_arm_setup_stage2() out into separate functions.
> 
> No functional change intended.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 26 +++++++++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 32 +++++++++++++++++++++
>  arch/arm64/kvm/reset.c               | 42 +++-------------------------
>  3 files changed, 62 insertions(+), 38 deletions(-)

Looks much better than the big header in v2, thanks!

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 22/32] KVM: arm64: Refactor __populate_fault_info()
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 19:39     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 19:39 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:52PM +0000, Quentin Perret wrote:
> Refactor __populate_fault_info() to introduce __get_fault_info() which
> will be used once the host is wrapped in a stage 2.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/include/hyp/switch.h | 37 ++++++++++++++-----------
>  1 file changed, 21 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index 6c1f51f25eb3..1f017c9851bb 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -160,19 +160,9 @@ static inline bool __translate_far_to_hpfar(u64 far, u64 *hpfar)
>  	return true;
>  }
>  
> -static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
> +static inline bool __get_fault_info(u64 esr, struct kvm_vcpu_fault_info *fault)
>  {
> -	u8 ec;
> -	u64 esr;
> -	u64 hpfar, far;
> -
> -	esr = vcpu->arch.fault.esr_el2;
> -	ec = ESR_ELx_EC(esr);
> -
> -	if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
> -		return true;
> -
> -	far = read_sysreg_el2(SYS_FAR);
> +	fault->far_el2 = read_sysreg_el2(SYS_FAR);
>  
>  	/*
>  	 * The HPFAR can be invalid if the stage 2 fault did not
> @@ -188,14 +178,29 @@ static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
>  	if (!(esr & ESR_ELx_S1PTW) &&
>  	    (cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
>  	     (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
> -		if (!__translate_far_to_hpfar(far, &hpfar))
> +		if (!__translate_far_to_hpfar(fault->far_el2, &fault->hpfar_el2))
>  			return false;
>  	} else {
> -		hpfar = read_sysreg(hpfar_el2);
> +		fault->hpfar_el2 = read_sysreg(hpfar_el2);
>  	}
>  
> -	vcpu->arch.fault.far_el2 = far;
> -	vcpu->arch.fault.hpfar_el2 = hpfar;
> +	return true;
> +}
> +
> +static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
> +{
> +	u8 ec;
> +	u64 esr;
> +
> +	esr = vcpu->arch.fault.esr_el2;
> +	ec = ESR_ELx_EC(esr);
> +
> +	if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
> +		return true;
> +
> +	if (!__get_fault_info(esr, &vcpu->arch.fault))
> +		return false;
> +
>  	return true;

Just return __get_fault_info(esr, &vcpu->arch.fault); here.
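
i.e. the tail of the function becomes:

	if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
		return true;

	return __get_fault_info(esr, &vcpu->arch.fault);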

With that:

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 23/32] KVM: arm64: Make memcache anonymous in pgtable allocator
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 19:44     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 19:44 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:53PM +0000, Quentin Perret wrote:
> The current stage2 page-table allocator uses a memcache to get
> pre-allocated pages when it needs any. To allow re-using this code at
> EL2 which uses a concept of memory pools, make the memcache argument of
> kvm_pgtable_stage2_map() anonymous, and let the mm_ops zalloc_page()
> callbacks use it the way they need to.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 6 +++---
>  arch/arm64/kvm/hyp/pgtable.c         | 4 ++--
>  2 files changed, 5 insertions(+), 5 deletions(-)

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 24/32] KVM: arm64: Reserve memory for host stage 2
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 19:49     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 19:49 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:54PM +0000, Quentin Perret wrote:
> Extend the memory pool allocated for the hypervisor to include enough
> pages to map all of memory at page granularity for the host stage 2.
> While at it, also reserve some memory for device mappings.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/include/nvhe/mm.h | 23 ++++++++++++++++++++++-
>  arch/arm64/kvm/hyp/nvhe/setup.c      | 12 ++++++++++++
>  arch/arm64/kvm/hyp/reserved_mem.c    |  2 ++
>  3 files changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> index ac0f7fcffd08..411a35db949c 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> @@ -53,7 +53,7 @@ static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
>  	return total;
>  }
>  
> -static inline unsigned long hyp_s1_pgtable_pages(void)
> +static inline unsigned long __hyp_pgtable_total_pages(void)
>  {
>  	unsigned long res = 0, i;
>  
> @@ -63,9 +63,30 @@ static inline unsigned long hyp_s1_pgtable_pages(void)
>  		res += __hyp_pgtable_max_pages(reg->size >> PAGE_SHIFT);
>  	}
>  
> +	return res;
> +}
> +
> +static inline unsigned long hyp_s1_pgtable_pages(void)
> +{
> +	unsigned long res;
> +
> +	res = __hyp_pgtable_total_pages();
> +
>  	/* Allow 1 GiB for private mappings */
>  	res += __hyp_pgtable_max_pages(SZ_1G >> PAGE_SHIFT);
>  
>  	return res;
>  }
> +
> +static inline unsigned long host_s2_mem_pgtable_pages(void)
> +{
> +	return __hyp_pgtable_total_pages() + 16;

Is this 16 due to the possibility of a concatenated pgd? If so, please add
a comment to that effect.
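
If that is indeed the reason, something along these lines would do:

	/*
	 * The host stage-2 pgd may be made of up to 16 concatenated
	 * pages at the initial lookup level, so account for them on
	 * top of the per-region page-table estimate.
	 */
	return __hyp_pgtable_total_pages() + 16;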

With that:

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 25/32] KVM: arm64: Sort the hypervisor memblocks
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 19:51     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 19:51 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:55PM +0000, Quentin Perret wrote:
> We will soon need to check if a Physical Address belongs to a memblock
> at EL2, so make sure to sort them so this can be done efficiently.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/reserved_mem.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 26/32] KVM: arm64: Introduce PROT_NONE mappings for stage 2
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 20:00     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 20:00 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:56PM +0000, Quentin Perret wrote:
> Once we start unmapping portions of memory from the host stage 2 (such
> as e.g. the hypervisor memory sections, or pages that belong to
> protected guests), we will need a way to track page ownership. And
> given that all mappings in the host stage 2 will be identity-mapped, we
> can use the host stage 2 page-table itself as a simplistic rmap.
> 
> As a first step towards this, introduce a new protection attribute
> in the stage 2 page table code, called KVM_PGTABLE_PROT_NONE, which
> allows us to annotate portions of the IPA space as inaccessible. For
> simplicity, PROT_NONE mappings are created as invalid mappings with a
> software bit set.

Just as an observation, but given that they're invalid we can use any bit
from [63:2] to indicate that it's a PROT_NONE mapping, and that way we
can keep the real "software bits" for live mappings.

But we can of course change that later when we need the bit for something
else.

> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h |  2 ++
>  arch/arm64/kvm/hyp/pgtable.c         | 26 ++++++++++++++++++++++++--
>  2 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 9935dbae2cc1..c9f6ed76e0ad 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -80,6 +80,7 @@ struct kvm_pgtable {
>   * @KVM_PGTABLE_PROT_W:		Write permission.
>   * @KVM_PGTABLE_PROT_R:		Read permission.
>   * @KVM_PGTABLE_PROT_DEVICE:	Device attributes.
> + * @KVM_PGTABLE_PROT_NONE:	No permission.
>   */
>  enum kvm_pgtable_prot {
>  	KVM_PGTABLE_PROT_X			= BIT(0),
> @@ -87,6 +88,7 @@ enum kvm_pgtable_prot {
>  	KVM_PGTABLE_PROT_R			= BIT(2),
>  
>  	KVM_PGTABLE_PROT_DEVICE			= BIT(3),
> +	KVM_PGTABLE_PROT_NONE			= BIT(4),

Why do we need an extra entry here? Couldn't we just create PROT_NONE
entries when none of R,W or X are set?
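
i.e. derive it from the absence of permissions, something like:

	bool prot_none = !(prot & (KVM_PGTABLE_PROT_R |
				   KVM_PGTABLE_PROT_W |
				   KVM_PGTABLE_PROT_X));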

>  };
>  
>  #define PAGE_HYP		(KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index bdd6e3d4eeb6..8e7059fcfd40 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -48,6 +48,8 @@
>  					 KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
>  					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
>  
> +#define KVM_PTE_LEAF_SW_BIT_PROT_NONE	BIT(55)
> +
>  struct kvm_pgtable_walk_data {
>  	struct kvm_pgtable		*pgt;
>  	struct kvm_pgtable_walker	*walker;
> @@ -120,6 +122,16 @@ static bool kvm_pte_valid(kvm_pte_t pte)
>  	return pte & KVM_PTE_VALID;
>  }
>  
> +static bool kvm_pte_prot_none(kvm_pte_t pte)
> +{
> +	return pte & KVM_PTE_LEAF_SW_BIT_PROT_NONE;
> +}

I think it would be a good idea to check !kvm_pte_valid() in here too,
since it doesn't make sense to report true for valid (or table) entries.
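
i.e. (untested):

	static bool kvm_pte_prot_none(kvm_pte_t pte)
	{
		return !kvm_pte_valid(pte) &&
		       (pte & KVM_PTE_LEAF_SW_BIT_PROT_NONE);
	}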

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 27/32] KVM: arm64: Refactor stage2_map_set_prot_attr()
  2021-03-02 14:59   ` Quentin Perret
  (?)
@ 2021-03-04 20:03     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 20:03 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:57PM +0000, Quentin Perret wrote:
> In order to ease its re-use in other code paths, refactor
> stage2_map_set_prot_attr() to not depend on a stage2_map_data struct.
> No functional change intended.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 19 ++++++++-----------
>  1 file changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 8e7059fcfd40..8aa01a9e2603 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -494,8 +494,7 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
>  	return vtcr;
>  }
>  
> -static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
> -				    struct stage2_map_data *data)
> +static kvm_pte_t stage2_get_prot_attr(enum kvm_pgtable_prot prot)
>  {
>  	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
>  	kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
> @@ -504,15 +503,15 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
>  
>  	if (prot & KVM_PGTABLE_PROT_NONE) {
>  		if (prot != KVM_PGTABLE_PROT_NONE)
> -			return -EINVAL;
> +			return 0;

Hmm, does the architecture actually say that having all these attributes
as 0 is illegal? If not, I think it would be better to keep the int return
code and replace the 'data' parameter with a pointer to a kvm_pte_t.

Does that work?
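
Roughly (the exact name is up to you):

	static int stage2_get_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)

so the -EINVAL can stay, and the caller checks the return value before using
the attributes written back through ptep.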

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 30/32] KVM: arm64: Page-align the .hyp sections
  2021-03-02 15:00   ` Quentin Perret
@ 2021-03-04 20:05     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-04 20:05 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 03:00:00PM +0000, Quentin Perret wrote:
> We will soon unmap the .hyp sections from the host stage 2 in Protected
> nVHE mode, which obvisouly works with at least page granularity, so make
> sure to align them correctly.

s/obvisouly/obviously/

> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kernel/vmlinux.lds.S | 22 +++++++++-------------
>  1 file changed, 9 insertions(+), 13 deletions(-)

With the typo fixed:

Acked-by: Will Deacon <will@kernel.org>

Will

* Re: [PATCH v3 16/32] KVM: arm64: Elevate hypervisor mappings creation at EL2
  2021-03-04 19:25     ` Will Deacon
@ 2021-03-05  9:14       ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-05  9:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Thursday 04 Mar 2021 at 19:25:41 (+0000), Will Deacon wrote:
> > +static int do_pkvm_init(u32 hyp_va_bits)
> > +{
> > +	void *per_cpu_base = kvm_ksym_ref(kvm_arm_hyp_percpu_base);
> > +	int ret;
> > +
> > +	preempt_disable();
> > +	hyp_install_host_vector();
> 
> It's a shame we need this both here _and_ on the reinit path, but it looks
> like it's necessary.

Right, and I want this before the KVM vectors are installed on secondary
CPUs, to make sure they get the new pgtable from the start. Otherwise
I'd need to do the same dance on all of them to go and switch TTBR0_EL2
and such.

> > +	ret = kvm_call_hyp_nvhe(__pkvm_init, hyp_mem_base, hyp_mem_size,
> > +				num_possible_cpus(), kern_hyp_va(per_cpu_base),
> > +				hyp_va_bits);
> > +	preempt_enable();
> > +
> > +	return ret;
> > +}
> 
> [...]
> 
> >  /**
> >   * Inits Hyp-mode on all online CPUs
> >   */
> >  static int init_hyp_mode(void)
> >  {
> > +	u32 hyp_va_bits;
> >  	int cpu;
> > -	int err = 0;
> > +	int err = -ENOMEM;
> > +
> > +	/*
> > +	 * The protected Hyp-mode cannot be initialized if the memory pool
> > +	 * allocation has failed.
> > +	 */
> > +	if (is_protected_kvm_enabled() && !hyp_mem_base)
> > +		return err;
> 
> This skips the error message you get on the out_err path.

Ack, I'll fix.

> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 4d41d7838d53..9d331bf262d2 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -221,15 +221,39 @@ void free_hyp_pgds(void)
> >  	if (hyp_pgtable) {
> >  		kvm_pgtable_hyp_destroy(hyp_pgtable);
> >  		kfree(hyp_pgtable);
> > +		hyp_pgtable = NULL;
> >  	}
> >  	mutex_unlock(&kvm_hyp_pgd_mutex);
> >  }
> >  
> > +static bool kvm_host_owns_hyp_mappings(void)
> > +{
> > +	if (static_branch_likely(&kvm_protected_mode_initialized))
> > +		return false;
> > +
> > +	/*
> > +	 * This can happen at boot time when __create_hyp_mappings() is called
> > +	 * after the hyp protection has been enabled, but the static key has
> > +	 * not been flipped yet.
> > +	 */
> > +	if (!hyp_pgtable && is_protected_kvm_enabled())
> > +		return false;
> > +
> > +	WARN_ON(!hyp_pgtable);
> > +
> > +	return true;
> 
> 	return !(WARN_ON(!hyp_pgtable) && is_protected_kvm_enabled());

Wouldn't this WARN when I have !hyp_pgtable && is_protected_kvm_enabled()
but the static key is still off (which can happen, sadly, as per the
comment above)?
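
To spell out what I mean, the shape I'd need to keep for that case to stay
silent is roughly the following (sketch; note it also returns false on the
unexpected !hyp_pgtable case at the end, which is a small behaviour change
compared to the patch):

static bool kvm_host_owns_hyp_mappings(void)
{
	if (static_branch_likely(&kvm_protected_mode_initialized))
		return false;

	/* Boot-time window: protection enabled, static key not flipped yet. */
	if (!hyp_pgtable && is_protected_kvm_enabled())
		return false;

	/* Past this point, a missing hyp_pgtable really is unexpected. */
	return !WARN_ON(!hyp_pgtable);
}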

Thanks,
Quentin

* Re: [PATCH v3 24/32] KVM: arm64: Reserve memory for host stage 2
  2021-03-04 19:49     ` Will Deacon
@ 2021-03-05  9:17       ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-05  9:17 UTC (permalink / raw)
  To: Will Deacon
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Thursday 04 Mar 2021 at 19:49:53 (+0000), Will Deacon wrote:
> On Tue, Mar 02, 2021 at 02:59:54PM +0000, Quentin Perret wrote:
> > Extend the memory pool allocated for the hypervisor to include enough
> > pages to map all of memory at page granularity for the host stage 2.
> > While at it, also reserve some memory for device mappings.
> > 
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/kvm/hyp/include/nvhe/mm.h | 23 ++++++++++++++++++++++-
> >  arch/arm64/kvm/hyp/nvhe/setup.c      | 12 ++++++++++++
> >  arch/arm64/kvm/hyp/reserved_mem.c    |  2 ++
> >  3 files changed, 36 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > index ac0f7fcffd08..411a35db949c 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > @@ -53,7 +53,7 @@ static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
> >  	return total;
> >  }
> >  
> > -static inline unsigned long hyp_s1_pgtable_pages(void)
> > +static inline unsigned long __hyp_pgtable_total_pages(void)
> >  {
> >  	unsigned long res = 0, i;
> >  
> > @@ -63,9 +63,30 @@ static inline unsigned long hyp_s1_pgtable_pages(void)
> >  		res += __hyp_pgtable_max_pages(reg->size >> PAGE_SHIFT);
> >  	}
> >  
> > +	return res;
> > +}
> > +
> > +static inline unsigned long hyp_s1_pgtable_pages(void)
> > +{
> > +	unsigned long res;
> > +
> > +	res = __hyp_pgtable_total_pages();
> > +
> >  	/* Allow 1 GiB for private mappings */
> >  	res += __hyp_pgtable_max_pages(SZ_1G >> PAGE_SHIFT);
> >  
> >  	return res;
> >  }
> > +
> > +static inline unsigned long host_s2_mem_pgtable_pages(void)
> > +{
> > +	return __hyp_pgtable_total_pages() + 16;
> 
> Is this 16 due to the possibility of a concatenated pgd?

Yes it is, to be sure we have a safe upper-bound.

> If so, please add a comment to that effect.

Will do, thanks.
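
Something along these lines, I assume (exact wording to be polished; this is
just the quoted helper with the comment added):

static inline unsigned long host_s2_mem_pgtable_pages(void)
{
	/*
	 * Include an extra 16 pages to safely upper-bound the worst case of
	 * concatenated pgds at the initial level of the host stage 2.
	 */
	return __hyp_pgtable_total_pages() + 16;
}
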
Quentin

* Re: [PATCH v3 27/32] KVM: arm64: Refactor stage2_map_set_prot_attr()
  2021-03-04 20:03     ` Will Deacon
@ 2021-03-05  9:18       ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-05  9:18 UTC (permalink / raw)
  To: Will Deacon
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Thursday 04 Mar 2021 at 20:03:36 (+0000), Will Deacon wrote:
> On Tue, Mar 02, 2021 at 02:59:57PM +0000, Quentin Perret wrote:
> > In order to ease its re-use in other code paths, refactor
> > stage2_map_set_prot_attr() to not depend on a stage2_map_data struct.
> > No functional change intended.
> > 
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 19 ++++++++-----------
> >  1 file changed, 8 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 8e7059fcfd40..8aa01a9e2603 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -494,8 +494,7 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
> >  	return vtcr;
> >  }
> >  
> > -static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
> > -				    struct stage2_map_data *data)
> > +static kvm_pte_t stage2_get_prot_attr(enum kvm_pgtable_prot prot)
> >  {
> >  	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
> >  	kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
> > @@ -504,15 +503,15 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
> >  
> >  	if (prot & KVM_PGTABLE_PROT_NONE) {
> >  		if (prot != KVM_PGTABLE_PROT_NONE)
> > -			return -EINVAL;
> > +			return 0;
> 
> Hmm, does the architecture actually say that having all these attributes
> as 0 is illegal?

Hmm, that's a good point, that might not be the case. I assumed we would
have no use for this, but we can easily avoid the restriction there,
so...

> If not, I think it would be better to keep the int return
> code and replace the 'data' parameter with a pointer to a kvm_pte_t.
> 
> Does that work?

I think so yes, I'll fix it up.

Cheers,
Quentin

* Re: [PATCH v3 26/32] KVM: arm64: Introduce PROT_NONE mappings for stage 2
  2021-03-04 20:00     ` Will Deacon
@ 2021-03-05  9:52       ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-05  9:52 UTC (permalink / raw)
  To: Will Deacon
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Thursday 04 Mar 2021 at 20:00:45 (+0000), Will Deacon wrote:
> On Tue, Mar 02, 2021 at 02:59:56PM +0000, Quentin Perret wrote:
> > Once we start unmapping portions of memory from the host stage 2 (such
> > as e.g. the hypervisor memory sections, or pages that belong to
> > protected guests), we will need a way to track page ownership. And
> > given that all mappings in the host stage 2 will be identity-mapped, we
> > can use the host stage 2 page-table itself as a simplistic rmap.
> > 
> > As a first step towards this, introduce a new protection attribute
> > in the stage 2 page table code, called KVM_PGTABLE_PROT_NONE, which
> > allows to annotate portions of the IPA space as inaccessible. For
> > simplicity, PROT_NONE mappings are created as invalid mappings with a
> > software bit set.
> 
> Just as an observation, but given that they're invalid we can use any bit
> from [63:2] to indicate that it's a PROT_NONE mapping, and that way we
> can keep the real "software bits" for live mappings.
> 
> But we can of course change that later when we need the bit for something
> else.

Right, so I used this approach for consistency with the kernel's
PROT_NONE mappings:

	#define PTE_PROT_NONE	(_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */

And in fact now that I think about it, it might be worth re-using the
same bit in stage 2.

But yes it would be pretty neat to use the other bits of invalid
mappings to add metadata about the pages. I could even drop the
PROT_NONE stuff straight away in favor of a more extensive mechanism for
tracking page ownership...

Thinking about it, it should be relatively straightforward to construct
the host stage 2 with the following invariants:

  1. everything is identity-mapped in the host stage 2;

  2. all valid mappings imply the underlying PA range belongs to the
     host;

  3. bits [63:32] (say) of all invalid mappings are used to store a
     unique identifier for the owner of the underlying PA range;

  4. the host identifier is 0, such that it owns all of memory by
     default at boot as its pgd is zeroed;

And then I could replace my PROT_NONE permission stuff by an ownership
change. E.g. the hypervisor would have its own identifier, and I could
use it to mark the .hyp memory sections as owned by the hyp (which
implies inaccessible by the host). And that should scale quite easily
when we start running protected guests as we'll assign them their own
identifiers. Sharing pages between guests (or even worse, between host
and guests) is a bit trickier, but maybe that is for later.

Thoughts?
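
Concretely, the kind of helpers I have in mind would be something like this
(very rough sketch, names and exact bit layout completely made up at this
point, following the invariants above):

/* Owner identifier stored in bits [63:32] of invalid PTEs. */
#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(63, 32)

#define PKVM_ID_HOST	0	/* default owner: the pgd is zeroed at boot */
#define PKVM_ID_HYP	1

static kvm_pte_t kvm_init_invalid_leaf_owner(u32 owner_id)
{
	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
}

static u32 kvm_invalid_pte_owner(kvm_pte_t pte)
{
	return FIELD_GET(KVM_INVALID_PTE_OWNER_MASK, pte);
}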

> > 
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h |  2 ++
> >  arch/arm64/kvm/hyp/pgtable.c         | 26 ++++++++++++++++++++++++--
> >  2 files changed, 26 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index 9935dbae2cc1..c9f6ed76e0ad 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -80,6 +80,7 @@ struct kvm_pgtable {
> >   * @KVM_PGTABLE_PROT_W:		Write permission.
> >   * @KVM_PGTABLE_PROT_R:		Read permission.
> >   * @KVM_PGTABLE_PROT_DEVICE:	Device attributes.
> > + * @KVM_PGTABLE_PROT_NONE:	No permission.
> >   */
> >  enum kvm_pgtable_prot {
> >  	KVM_PGTABLE_PROT_X			= BIT(0),
> > @@ -87,6 +88,7 @@ enum kvm_pgtable_prot {
> >  	KVM_PGTABLE_PROT_R			= BIT(2),
> >  
> >  	KVM_PGTABLE_PROT_DEVICE			= BIT(3),
> > +	KVM_PGTABLE_PROT_NONE			= BIT(4),
> 
> Why do we need an extra entry here? Couldn't we just create PROT_NONE
> entries when none of R,W or X are set?

The kernel has an explicit PAGE_NONE permission, so I followed the same
idea, but that could work as well. Now, as per the above discussion that
may not be relevant if we implement the page ownership thing.

> >  };
> >  
> >  #define PAGE_HYP		(KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index bdd6e3d4eeb6..8e7059fcfd40 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -48,6 +48,8 @@
> >  					 KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
> >  					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
> >  
> > +#define KVM_PTE_LEAF_SW_BIT_PROT_NONE	BIT(55)
> > +
> >  struct kvm_pgtable_walk_data {
> >  	struct kvm_pgtable		*pgt;
> >  	struct kvm_pgtable_walker	*walker;
> > @@ -120,6 +122,16 @@ static bool kvm_pte_valid(kvm_pte_t pte)
> >  	return pte & KVM_PTE_VALID;
> >  }
> >  
> > +static bool kvm_pte_prot_none(kvm_pte_t pte)
> > +{
> > +	return pte & KVM_PTE_LEAF_SW_BIT_PROT_NONE;
> > +}
> 
> I think it would be a good idea to check !kvm_pte_valid() in here too,
> since it doesn't make sense to report true for valid (or table) entries.

Ack.

Thanks,
Quentin

* Re: [PATCH v3 28/32] KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy()
  2021-03-02 14:59   ` Quentin Perret
@ 2021-03-05 14:39     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-05 14:39 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:58PM +0000, Quentin Perret wrote:
> Add a new map function to the KVM page-table library that allows to
> greedily create block identity-mappings. This will be useful to create
> lazily the host stage 2 page-table as it will own most of memory and
> will always be identity mapped.
> 
> The new helper function creates the mapping in 2 steps: it first walks
> the page-table to compute the largest possible granule that can be used
> to idmap a given address without overriding existing incompatible
> mappings; and then creates a mapping accordingly.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h |  37 +++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 119 +++++++++++++++++++++++++++
>  2 files changed, 156 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index c9f6ed76e0ad..e51dcce69a5e 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -96,6 +96,16 @@ enum kvm_pgtable_prot {
>  #define PAGE_HYP_RO		(KVM_PGTABLE_PROT_R)
>  #define PAGE_HYP_DEVICE		(PAGE_HYP | KVM_PGTABLE_PROT_DEVICE)
>  
> +/**
> + * struct kvm_mem_range - Range of Intermediate Physical Addresses
> + * @start:	Start of the range.
> + * @end:	End of the range.
> + */
> +struct kvm_mem_range {
> +	u64 start;
> +	u64 end;
> +};
> +
>  /**
>   * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
>   * @KVM_PGTABLE_WALK_LEAF:		Visit leaf entries, including invalid
> @@ -379,4 +389,31 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size);
>  int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>  		     struct kvm_pgtable_walker *walker);
>  
> +/**
> + * kvm_pgtable_stage2_idmap_greedy() - Identity-map an Intermediate Physical
> + *				       Address with a leaf entry at the highest
> + *				       possible level.

Not sure it's worth mentioning "highest possible level" here, as
realistically the caller still has to provide a memcache to deal with the
worst case and the structure of the page-table shouldn't matter.

> + * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> + * @addr:	Input address to identity-map.
> + * @prot:	Permissions and attributes for the mapping.
> + * @range:	Boundaries of the maximum memory region to map.
> + * @mc:		Cache of pre-allocated memory from which to allocate page-table
> + *		pages.
> + *
> + * This function attempts to install high-level identity-mappings covering @addr

"high-level"? (again, I think I'd just drop this)

> + * without overriding existing mappings with incompatible permissions or
> + * attributes. An existing table entry may be coalesced into a block mapping
> + * if and only if it covers @addr and all its leafs are either invalid and/or

s/leafs/leaf entries/

> + * have permissions and attributes strictly matching @prot. The mapping is
> + * guaranteed to be contained within the boundaries specified by @range at call
> + * time. If only a subset of the memory specified by @range is mapped (because
> + * of e.g. alignment issues or existing incompatible mappings), @range will be
> + * updated accordingly.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_stage2_idmap_greedy(struct kvm_pgtable *pgt, u64 addr,
> +				    enum kvm_pgtable_prot prot,
> +				    struct kvm_mem_range *range,
> +				    void *mc);
>  #endif	/* __ARM64_KVM_PGTABLE_H__ */
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 8aa01a9e2603..6897d771e2b2 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -987,3 +987,122 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>  	pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
>  	pgt->pgd = NULL;
>  }
> +
> +struct stage2_reduce_range_data {
> +	kvm_pte_t attr;
> +	u64 target_addr;
> +	u32 start_level;
> +	struct kvm_mem_range *range;
> +};
> +
> +static int __stage2_reduce_range(struct stage2_reduce_range_data *data, u64 addr)
> +{
> +	u32 level = data->start_level;
> +
> +	for (; level < KVM_PGTABLE_MAX_LEVELS; level++) {
> +		u64 granule = kvm_granule_size(level);
> +		u64 start = ALIGN_DOWN(data->target_addr, granule);
> +		u64 end = start + granule;
> +
> +		/*
> +		 * The pinned address is in the current range, try one level
> +		 * deeper.
> +		 */
> +		if (start == ALIGN_DOWN(addr, granule))
> +			continue;
> +
> +		/*
> +		 * Make sure the current range is a reduction of the existing
> +		 * range before updating it.
> +		 */
> +		if (data->range->start <= start && end <= data->range->end) {
> +			data->start_level = level;
> +			data->range->start = start;
> +			data->range->end = end;
> +			return 0;
> +		}
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +#define KVM_PTE_LEAF_S2_COMPAT_MASK	(KVM_PTE_LEAF_ATTR_S2_PERMS | \
> +					 KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR | \
> +					 KVM_PTE_LEAF_SW_BIT_PROT_NONE)
> +
> +static int stage2_reduce_range_walker(u64 addr, u64 end, u32 level,
> +				      kvm_pte_t *ptep,
> +				      enum kvm_pgtable_walk_flags flag,
> +				      void * const arg)
> +{
> +	struct stage2_reduce_range_data *data = arg;
> +	kvm_pte_t attr;
> +	int ret;
> +
> +	if (addr < data->range->start || addr >= data->range->end)
> +		return 0;
> +
> +	attr = *ptep & KVM_PTE_LEAF_S2_COMPAT_MASK;
> +	if (!attr || attr == data->attr)
> +		return 0;
> +
> +	/*
> +	 * An existing mapping with incompatible protection attributes is
> +	 * 'pinned', so reduce the range if we hit one.
> +	 */
> +	ret = __stage2_reduce_range(data, addr);
> +	if (ret)
> +		return ret;
> +
> +	return -EAGAIN;
> +}
> +
> +static int stage2_reduce_range(struct kvm_pgtable *pgt, u64 addr,
> +			       enum kvm_pgtable_prot prot,
> +			       struct kvm_mem_range *range)
> +{
> +	struct stage2_reduce_range_data data = {
> +		.start_level	= pgt->start_level,
> +		.range		= range,
> +		.target_addr	= addr,
> +	};
> +	struct kvm_pgtable_walker walker = {
> +		.cb		= stage2_reduce_range_walker,
> +		.flags		= KVM_PGTABLE_WALK_LEAF,
> +		.arg		= &data,
> +	};
> +	int ret;
> +
> +	data.attr = stage2_get_prot_attr(prot) & KVM_PTE_LEAF_S2_COMPAT_MASK;
> +	if (!data.attr)
> +		return -EINVAL;

(this will need updating based on the other discussion we had)

> +	/* Reduce the kvm_mem_range to a granule size */
> +	ret = __stage2_reduce_range(&data, range->end);
> +	if (ret)
> +		return ret;
> +
> +	/* Walk the range to check permissions and reduce further if needed */
> +	do {
> +		ret = kvm_pgtable_walk(pgt, range->start, range->end, &walker);

(we spent some time debugging an issue here and you spotted that you're
passing range->end instead of the size ;)

> +	} while (ret == -EAGAIN);

I'm a bit nervous about this loop -- what guarantees forward progress here?
Can we return to the host after a few tries instead?
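
e.g. something like the below (fragment only, to illustrate; the retry bound is
arbitrary), together with the size fix above:

	/*
	 * Pass a size rather than the end address, and bail out after a few
	 * -EAGAIN range reductions instead of potentially looping forever.
	 */
	int retries = KVM_PGTABLE_MAX_LEVELS;

	do {
		ret = kvm_pgtable_walk(pgt, range->start,
				       range->end - range->start, &walker);
	} while (ret == -EAGAIN && retries--);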

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 28/32] KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy()
  2021-03-05 14:39     ` Will Deacon
  (?)
@ 2021-03-05 15:03       ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-05 15:03 UTC (permalink / raw)
  To: Will Deacon
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Friday 05 Mar 2021 at 14:39:42 (+0000), Will Deacon wrote:
> On Tue, Mar 02, 2021 at 02:59:58PM +0000, Quentin Perret wrote:
> > +/**
> > + * kvm_pgtable_stage2_idmap_greedy() - Identity-map an Intermediate Physical
> > + *				       Address with a leaf entry at the highest
> > + *				       possible level.
> 
> Not sure it's worth mentioning "highest possible level" here, as
> realistically the caller still has to provide a memcache to deal with the
> worst case and the structure of the page-table shouldn't matter.

Right, we need to pass a range so I suppose that should be enough to
say 'this tries to cover large portions of memory'.

> > + * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> > + * @addr:	Input address to identity-map.
> > + * @prot:	Permissions and attributes for the mapping.
> > + * @range:	Boundaries of the maximum memory region to map.
> > + * @mc:		Cache of pre-allocated memory from which to allocate page-table
> > + *		pages.
> > + *
> > + * This function attempts to install high-level identity-mappings covering @addr
> 
> "high-level"? (again, I think I'd just drop this)
> 
> > + * without overriding existing mappings with incompatible permissions or
> > + * attributes. An existing table entry may be coalesced into a block mapping
> > + * if and only if it covers @addr and all its leafs are either invalid and/or
> 
> s/leafs/leaf entries/

Ack for both.

> > + * have permissions and attributes strictly matching @prot. The mapping is
> > + * guaranteed to be contained within the boundaries specified by @range at call
> > + * time. If only a subset of the memory specified by @range is mapped (because
> > + * of e.g. alignment issues or existing incompatible mappings), @range will be
> > + * updated accordingly.
> > + *
> > + * Return: 0 on success, negative error code on failure.
> > + */
> > +int kvm_pgtable_stage2_idmap_greedy(struct kvm_pgtable *pgt, u64 addr,
> > +				    enum kvm_pgtable_prot prot,
> > +				    struct kvm_mem_range *range,
> > +				    void *mc);
> >  #endif	/* __ARM64_KVM_PGTABLE_H__ */
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 8aa01a9e2603..6897d771e2b2 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -987,3 +987,122 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
> >  	pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
> >  	pgt->pgd = NULL;
> >  }
> > +
> > +struct stage2_reduce_range_data {
> > +	kvm_pte_t attr;
> > +	u64 target_addr;
> > +	u32 start_level;
> > +	struct kvm_mem_range *range;
> > +};
> > +
> > +static int __stage2_reduce_range(struct stage2_reduce_range_data *data, u64 addr)
> > +{
> > +	u32 level = data->start_level;
> > +
> > +	for (; level < KVM_PGTABLE_MAX_LEVELS; level++) {
> > +		u64 granule = kvm_granule_size(level);
> > +		u64 start = ALIGN_DOWN(data->target_addr, granule);
> > +		u64 end = start + granule;
> > +
> > +		/*
> > +		 * The pinned address is in the current range, try one level
> > +		 * deeper.
> > +		 */
> > +		if (start == ALIGN_DOWN(addr, granule))
> > +			continue;
> > +
> > +		/*
> > +		 * Make sure the current range is a reduction of the existing
> > +		 * range before updating it.
> > +		 */
> > +		if (data->range->start <= start && end <= data->range->end) {
> > +			data->start_level = level;
> > +			data->range->start = start;
> > +			data->range->end = end;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	return -EINVAL;
> > +}
> > +
> > +#define KVM_PTE_LEAF_S2_COMPAT_MASK	(KVM_PTE_LEAF_ATTR_S2_PERMS | \
> > +					 KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR | \
> > +					 KVM_PTE_LEAF_SW_BIT_PROT_NONE)
> > +
> > +static int stage2_reduce_range_walker(u64 addr, u64 end, u32 level,
> > +				      kvm_pte_t *ptep,
> > +				      enum kvm_pgtable_walk_flags flag,
> > +				      void * const arg)
> > +{
> > +	struct stage2_reduce_range_data *data = arg;
> > +	kvm_pte_t attr;
> > +	int ret;
> > +
> > +	if (addr < data->range->start || addr >= data->range->end)
> > +		return 0;
> > +
> > +	attr = *ptep & KVM_PTE_LEAF_S2_COMPAT_MASK;
> > +	if (!attr || attr == data->attr)
> > +		return 0;
> > +
> > +	/*
> > +	 * An existing mapping with incompatible protection attributes is
> > +	 * 'pinned', so reduce the range if we hit one.
> > +	 */
> > +	ret = __stage2_reduce_range(data, addr);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return -EAGAIN;
> > +}
> > +
> > +static int stage2_reduce_range(struct kvm_pgtable *pgt, u64 addr,
> > +			       enum kvm_pgtable_prot prot,
> > +			       struct kvm_mem_range *range)
> > +{
> > +	struct stage2_reduce_range_data data = {
> > +		.start_level	= pgt->start_level,
> > +		.range		= range,
> > +		.target_addr	= addr,
> > +	};
> > +	struct kvm_pgtable_walker walker = {
> > +		.cb		= stage2_reduce_range_walker,
> > +		.flags		= KVM_PGTABLE_WALK_LEAF,
> > +		.arg		= &data,
> > +	};
> > +	int ret;
> > +
> > +	data.attr = stage2_get_prot_attr(prot) & KVM_PTE_LEAF_S2_COMPAT_MASK;
> > +	if (!data.attr)
> > +		return -EINVAL;
> 
> (this will need updating based on the other discussion we had)

Ack.

> > +	/* Reduce the kvm_mem_range to a granule size */
> > +	ret = __stage2_reduce_range(&data, range->end);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* Walk the range to check permissions and reduce further if needed */
> > +	do {
> > +		ret = kvm_pgtable_walk(pgt, range->start, range->end, &walker);
> 
> (we spent some time debugging an issue here and you spotted that you're
> passing range->end instead of the size ;)

Yep, I have the fix applied locally, and ready to fly in v4 :)
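For reference, a minimal sketch of the corrected loop, assuming the fix
simply converts the end address into a size as required by the
kvm_pgtable_walk() prototype quoted above:

	/* Walk the range to check permissions and reduce further if needed */
	do {
		ret = kvm_pgtable_walk(pgt, range->start,
				       range->end - range->start, &walker);
	} while (ret == -EAGAIN);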

> > +	} while (ret == -EAGAIN);
> 
> I'm a bit nervous about this loop -- what guarantees forward progress here?
> Can we return to the host after a few tries instead?

-EAGAIN only happens when we've been able to successfully reduce the
range to a potentially valid granule size. That can't happen infinitely.

We're guaranteed to fail when trying to reduce the range to a
granularity smaller than PAGE_SIZE (the -EINVAL case of
__stage2_reduce_range), which is indicative of a host memory abort in a
page it should not access (because marked PROT_NONE for instance).

Cheers,
Quentin

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 28/32] KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy()
  2021-03-05 15:03       ` Quentin Perret
  (?)
@ 2021-03-05 16:59         ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-05 16:59 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Fri, Mar 05, 2021 at 03:03:36PM +0000, Quentin Perret wrote:
> On Friday 05 Mar 2021 at 14:39:42 (+0000), Will Deacon wrote:
> > On Tue, Mar 02, 2021 at 02:59:58PM +0000, Quentin Perret wrote:
> > > +	/* Reduce the kvm_mem_range to a granule size */
> > > +	ret = __stage2_reduce_range(&data, range->end);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	/* Walk the range to check permissions and reduce further if needed */
> > > +	do {
> > > +		ret = kvm_pgtable_walk(pgt, range->start, range->end, &walker);
> > 
> > (we spent some time debugging an issue here and you spotted that you're
> > passing range->end instead of the size ;)
> 
> Yep, I have the fix applied locally, and ready to fly in v4 :)
> 
> > > +	} while (ret == -EAGAIN);
> > 
> > I'm a bit nervous about this loop -- what guarantees forward progress here?
> > Can we return to the host after a few tries instead?
> 
> -EAGAIN only happens when we've been able to successfully reduce the
> range to a potentially valid granule size. That can't happen infinitely.
> 
> We're guaranteed to fail when trying to reduce the range to a
> granularity smaller than PAGE_SIZE (the -EINVAL case of
> __stage2_reduce_range), which is indicative of a host memory abort in a
> page it should not access (because marked PROT_NONE for instance).

Can you compute an upper bound on the number of iterations based on the
number of page-table levels then? I'm just conscious that all of this is
effectively running with irqs disabled, and so being able to reason about
the amount of work we're going to do makes me much more comfortable.
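Judging by the code quoted above, each -EAGAIN is only returned after
__stage2_reduce_range() has moved data.start_level at least one level
deeper, so the number of retries is already bounded by the remaining
page-table levels. A sketch of what an explicit bound could look like
(illustrative only, not code from the series):

	int max_tries = KVM_PGTABLE_MAX_LEVELS - data.start_level;

	do {
		ret = kvm_pgtable_walk(pgt, range->start,
				       range->end - range->start, &walker);
	} while (ret == -EAGAIN && max_tries-- > 0);

If the bound were ever exhausted, the caller could return to the host and
let it take the fault again, as suggested earlier in the thread.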

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 31/32] KVM: arm64: Disable PMU support in protected mode
  2021-03-02 15:00   ` Quentin Perret
  (?)
@ 2021-03-05 19:02     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-05 19:02 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 03:00:01PM +0000, Quentin Perret wrote:
> The host currently writes directly in EL2 per-CPU data sections from
> the PMU code when running in nVHE. In preparation for unmapping the EL2
> sections from the host stage 2, disable PMU support in protected mode as
> we currently do not have a use-case for it.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/perf.c | 3 ++-
>  arch/arm64/kvm/pmu.c  | 8 ++++----
>  2 files changed, 6 insertions(+), 5 deletions(-)

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 26/32] KVM: arm64: Introduce PROT_NONE mappings for stage 2
  2021-03-05  9:52       ` Quentin Perret
  (?)
@ 2021-03-05 19:03         ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-05 19:03 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Fri, Mar 05, 2021 at 09:52:12AM +0000, Quentin Perret wrote:
> On Thursday 04 Mar 2021 at 20:00:45 (+0000), Will Deacon wrote:
> > On Tue, Mar 02, 2021 at 02:59:56PM +0000, Quentin Perret wrote:
> > > Once we start unmapping portions of memory from the host stage 2 (such
> > > as e.g. the hypervisor memory sections, or pages that belong to
> > > protected guests), we will need a way to track page ownership. And
> > > given that all mappings in the host stage 2 will be identity-mapped, we
> > > can use the host stage 2 page-table itself as a simplistic rmap.
> > > 
> > > As a first step towards this, introduce a new protection attribute
> > > in the stage 2 page table code, called KVM_PGTABLE_PROT_NONE, which
> > > allows to annotate portions of the IPA space as inaccessible. For
> > > simplicity, PROT_NONE mappings are created as invalid mappings with a
> > > software bit set.
> > 
> > Just as an observation, but given that they're invalid we can use any bit
> > from [63:2] to indicate that it's a PROT_NONE mapping, and that way we
> > can keep the real "software bits" for live mappings.
> > 
> > But we can of course change that later when we need the bit for something
> > else.
> 
> Right, so I used this approach for consistency with the kernel's
> PROT_NONE mappings:
> 
> 	#define PTE_PROT_NONE	(_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
> 
> And in fact now that I think about it, it might be worth re-using the
> same bit in stage 2.
> 
> But yes it would be pretty neat to use the other bits of invalid
> mappings to add metadata about the pages. I could even drop the
> PROT_NONE stuff straight away in favor of a more extensive mechanism for
> tracking page ownership...
> 
> Thinking about it, it should be relatively straightforward to construct
> the host stage 2 with the following invariants:
> 
>   1. everything is identity-mapped in the host stage 2;
> 
>   2. all valid mappings imply the underlying PA range belongs to the
>      host;
> 
>   3. bits [63:32] (say) of all invalid mappings are used to store a
>      unique identifier for the owner of the underlying PA range;
> 
>   4. the host identifier is 0, such that it owns all of memory by
>      default at boot as its pgd is zeroed;
> 
> And then I could replace my PROT_NONE permission stuff by an ownership
> change. E.g. the hypervisor would have its own identifier, and I could
> use it to mark the .hyp memory sections as owned by the hyp (which
> implies inaccessible by the host). And that should scale quite easily
> when we start running protected guests as we'll assign them their own
> identifiers. Sharing pages between guests (or even worse, between host
> and guests) is a bit trickier, but maybe that is for later.
> 
> Thoughts?

I think this sounds like a worthwhile generalisation to me, although virtio
brings an immediate need for shared pages, so we'll still need a software
bit for those so that we can e.g. prevent the host from donating such a
shared page to the hypervisor.
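To make that concrete, here is a minimal sketch of how an owner identifier
and a shared-page software bit could be encoded in invalid stage-2 PTEs.
All names and bit positions below are assumptions for illustration, not
code from the series:

	#include <linux/bitfield.h>
	#include <linux/bits.h>
	#include <asm/kvm_pgtable.h>

	/* Invariant 3: owner ID lives in bits [63:32] of invalid PTEs */
	#define KVM_INVALID_PTE_OWNER_MASK	GENMASK_ULL(63, 32)
	/* Arbitrary software bit reserved for marking shared pages */
	#define KVM_INVALID_PTE_SHARED		BIT_ULL(59)

	/* Invariant 4: a zeroed pgd means the host owns everything */
	#define PKVM_ID_HOST	0
	#define PKVM_ID_HYP	1

	static inline kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
	{
		/* The valid bit stays clear, so this remains an invalid mapping */
		return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
	}

	static inline u8 kvm_invalid_pte_owner(kvm_pte_t pte)
	{
		return FIELD_GET(KVM_INVALID_PTE_OWNER_MASK, pte);
	}

	static inline bool kvm_invalid_pte_shared(kvm_pte_t pte)
	{
		return pte & KVM_INVALID_PTE_SHARED;
	}

A donation path could then refuse any page whose invalid PTE has the
shared bit set, whatever its owner.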

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 32/32] KVM: arm64: Protect the .hyp sections from the host
  2021-03-02 15:00   ` Quentin Perret
  (?)
@ 2021-03-05 19:13     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-05 19:13 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 03:00:02PM +0000, Quentin Perret wrote:
> When KVM runs in nVHE protected mode, use the host stage 2 to unmap the
> hypervisor sections. The long-term goal is to ensure the EL2 code can
> remain robust regardless of the host's state, so this starts by making
> sure the host cannot e.g. write to the .hyp sections directly.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h              |  1 +
>  arch/arm64/kvm/arm.c                          | 46 +++++++++++++++++++
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  9 ++++
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 22 +++++++++
>  5 files changed, 80 insertions(+)

[...]

>  static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 2252ad1a8945..ed480facdc88 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -196,6 +196,28 @@ static int host_stage2_idmap(u64 addr)
>  	return ret;
>  }
>  
> +int __pkvm_host_unmap(phys_addr_t start, phys_addr_t end)
> +{
> +	struct kvm_mem_range r1, r2;
> +	int ret;
> +
> +	/*
> +	 * host_stage2_unmap_dev_all() currently relies on MMIO mappings being
> +	 * non-persistent, so don't allow PROT_NONE in MMIO range.
> +	 */
> +	if (!find_mem_range(start, &r1) || !find_mem_range(end, &r2))
> +		return -EINVAL;
> +	if (r1.start != r2.start)
> +		return -EINVAL;


Feels like this should be in a helper to determine whether or not a range is
solely covered by memory.
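Something along those lines, perhaps -- the name and exact placement are
assumptions, it simply factors out the checks from the hunk above:

	/* Check whether a physical address range is solely covered by memory */
	static bool range_is_memory(phys_addr_t start, phys_addr_t end)
	{
		struct kvm_mem_range r1, r2;

		if (!find_mem_range(start, &r1) || !find_mem_range(end, &r2))
			return false;

		return r1.start == r2.start;
	}

__pkvm_host_unmap() would then just return -EINVAL when
!range_is_memory(start, end).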

Either way:

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 32/32] KVM: arm64: Protect the .hyp sections from the host
@ 2021-03-05 19:13     ` Will Deacon
  0 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-05 19:13 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 03:00:02PM +0000, Quentin Perret wrote:
> When KVM runs in nVHE protected mode, use the host stage 2 to unmap the
> hypervisor sections. The long-term goal is to ensure the EL2 code can
> remain robust regardless of the host's state, so this starts by making
> sure the host cannot e.g. write to the .hyp sections directly.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h              |  1 +
>  arch/arm64/kvm/arm.c                          | 46 +++++++++++++++++++
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  9 ++++
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 22 +++++++++
>  5 files changed, 80 insertions(+)

[...]

>  static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 2252ad1a8945..ed480facdc88 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -196,6 +196,28 @@ static int host_stage2_idmap(u64 addr)
>  	return ret;
>  }
>  
> +int __pkvm_host_unmap(phys_addr_t start, phys_addr_t end)
> +{
> +	struct kvm_mem_range r1, r2;
> +	int ret;
> +
> +	/*
> +	 * host_stage2_unmap_dev_all() currently relies on MMIO mappings being
> +	 * non-persistent, so don't allow PROT_NONE in MMIO range.
> +	 */
> +	if (!find_mem_range(start, &r1) || !find_mem_range(end, &r2))
> +		return -EINVAL;
> +	if (r1.start != r2.start)
> +		return -EINVAL;


Feels like this should be in a helper to determine whether or not a range is
solely covered by memory.

Either way:

Acked-by: Will Deacon <will@kernel.org>

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 29/32] KVM: arm64: Wrap the host with a stage 2
  2021-03-02 14:59   ` Quentin Perret
@ 2021-03-05 19:29     ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-05 19:29 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Tue, Mar 02, 2021 at 02:59:59PM +0000, Quentin Perret wrote:
> When KVM runs in protected nVHE mode, make use of a stage 2 page-table
> to give the hypervisor some control over the host memory accesses. The
> host stage 2 is created lazily using large block mappings if possible,
> and will default to page mappings in absence of a better solution.
> 
> From this point on, memory accesses from the host to protected memory
> regions (e.g. marked PROT_NONE) are fatal and lead to hyp_panic().
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h              |   1 +
>  arch/arm64/include/asm/kvm_cpufeature.h       |   2 +
>  arch/arm64/kernel/image-vars.h                |   3 +
>  arch/arm64/kvm/arm.c                          |  10 +
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  34 +++
>  arch/arm64/kvm/hyp/nvhe/Makefile              |   2 +-
>  arch/arm64/kvm/hyp/nvhe/hyp-init.S            |   1 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  11 +
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 213 ++++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/setup.c               |   5 +
>  arch/arm64/kvm/hyp/nvhe/switch.c              |   7 +-
>  arch/arm64/kvm/hyp/nvhe/tlb.c                 |   4 +-
>  12 files changed, 286 insertions(+), 7 deletions(-)
>  create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
>  create mode 100644 arch/arm64/kvm/hyp/nvhe/mem_protect.c

[...]

> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> new file mode 100644
> index 000000000000..d293cb328cc4
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2020 Google LLC
> + * Author: Quentin Perret <qperret@google.com>
> + */
> +
> +#ifndef __KVM_NVHE_MEM_PROTECT__
> +#define __KVM_NVHE_MEM_PROTECT__
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_hyp.h>
> +#include <asm/kvm_pgtable.h>
> +#include <asm/virt.h>
> +#include <nvhe/spinlock.h>
> +
> +struct host_kvm {
> +	struct kvm_arch arch;
> +	struct kvm_pgtable pgt;
> +	struct kvm_pgtable_mm_ops mm_ops;
> +	hyp_spinlock_t lock;
> +};
> +extern struct host_kvm host_kvm;
> +
> +int __pkvm_prot_finalize(void);
> +int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool);
> +void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
> +
> +static __always_inline void __load_host_stage2(void)
> +{
> +	if (static_branch_likely(&kvm_protected_mode_initialized))
> +		__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
> +	else
> +		write_sysreg(0, vttbr_el2);

I realise you've just moved the code, but why is this needed? All we've
done is point the stage-2 ttb at 0x0 (i.e. we haven't disabled anything).

> +#endif /* __KVM_NVHE_MEM_PROTECT__ */
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index e204ea77ab27..ce49795324a7 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -14,7 +14,7 @@ lib-objs := $(addprefix ../../../lib/, $(lib-objs))
>  
>  obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
>  	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
> -	 cache.o cpufeature.o setup.o mm.o
> +	 cache.o cpufeature.o setup.o mm.o mem_protect.o
>  obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
>  	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
>  obj-y += $(lib-objs)
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> index f312672d895e..6fa01b04954f 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> @@ -119,6 +119,7 @@ alternative_else_nop_endif
>  
>  	/* Invalidate the stale TLBs from Bootloader */
>  	tlbi	alle2
> +	tlbi	vmalls12e1
>  	dsb	sy
>  
>  	/*
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index ae6503c9be15..f47028d3fd0a 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -13,6 +13,7 @@
>  #include <asm/kvm_hyp.h>
>  #include <asm/kvm_mmu.h>
>  
> +#include <nvhe/mem_protect.h>
>  #include <nvhe/mm.h>
>  #include <nvhe/trap_handler.h>
>  
> @@ -151,6 +152,10 @@ static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ct
>  	cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
>  }
>  
> +static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
> +{
> +	cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
> +}
>  typedef void (*hcall_t)(struct kvm_cpu_context *);
>  
>  #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
> @@ -174,6 +179,7 @@ static const hcall_t host_hcall[] = {
>  	HANDLE_FUNC(__pkvm_cpu_set_vector),
>  	HANDLE_FUNC(__pkvm_create_mappings),
>  	HANDLE_FUNC(__pkvm_create_private_mapping),
> +	HANDLE_FUNC(__pkvm_prot_finalize),
>  };
>  
>  static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
> @@ -226,6 +232,11 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
>  	case ESR_ELx_EC_SMC64:
>  		handle_host_smc(host_ctxt);
>  		break;
> +	case ESR_ELx_EC_IABT_LOW:
> +		fallthrough;
> +	case ESR_ELx_EC_DABT_LOW:
> +		handle_host_mem_abort(host_ctxt);
> +		break;
>  	default:
>  		hyp_panic();
>  	}
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> new file mode 100644
> index 000000000000..2252ad1a8945
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -0,0 +1,213 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 Google LLC
> + * Author: Quentin Perret <qperret@google.com>
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_cpufeature.h>
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
> +#include <asm/kvm_pgtable.h>
> +#include <asm/stage2_pgtable.h>
> +
> +#include <hyp/switch.h>
> +
> +#include <nvhe/gfp.h>
> +#include <nvhe/memory.h>
> +#include <nvhe/mem_protect.h>
> +#include <nvhe/mm.h>
> +
> +extern unsigned long hyp_nr_cpus;
> +struct host_kvm host_kvm;
> +
> +struct hyp_pool host_s2_mem;
> +struct hyp_pool host_s2_dev;
> +
> +static void *host_s2_zalloc_pages_exact(size_t size)
> +{
> +	return hyp_alloc_pages(&host_s2_mem, get_order(size));
> +}
> +
> +static void *host_s2_zalloc_page(void *pool)
> +{
> +	return hyp_alloc_pages(pool, 0);
> +}
> +
> +static int prepare_s2_pools(void *mem_pgt_pool, void *dev_pgt_pool)
> +{
> +	unsigned long nr_pages, pfn;
> +	int ret;
> +
> +	pfn = hyp_virt_to_pfn(mem_pgt_pool);
> +	nr_pages = host_s2_mem_pgtable_pages();
> +	ret = hyp_pool_init(&host_s2_mem, pfn, nr_pages, 0);
> +	if (ret)
> +		return ret;
> +
> +	pfn = hyp_virt_to_pfn(dev_pgt_pool);
> +	nr_pages = host_s2_dev_pgtable_pages();
> +	ret = hyp_pool_init(&host_s2_dev, pfn, nr_pages, 0);
> +	if (ret)
> +		return ret;
> +
> +	host_kvm.mm_ops.zalloc_pages_exact = host_s2_zalloc_pages_exact;
> +	host_kvm.mm_ops.zalloc_page = host_s2_zalloc_page;
> +	host_kvm.mm_ops.phys_to_virt = hyp_phys_to_virt;
> +	host_kvm.mm_ops.virt_to_phys = hyp_virt_to_phys;
> +	host_kvm.mm_ops.page_count = hyp_page_count;
> +	host_kvm.mm_ops.get_page = hyp_get_page;
> +	host_kvm.mm_ops.put_page = hyp_put_page;

Same comment I had earlier on about struct initialisation -- I think you
can use the same trick here for the host_kvm.mm_ops.

> +
> +	return 0;
> +}
> +
> +static void prepare_host_vtcr(void)
> +{
> +	u32 parange, phys_shift;
> +	u64 mmfr0, mmfr1;
> +
> +	mmfr0 = arm64_ftr_reg_id_aa64mmfr0_el1.sys_val;
> +	mmfr1 = arm64_ftr_reg_id_aa64mmfr1_el1.sys_val;
> +
> +	/* The host stage 2 is id-mapped, so use parange for T0SZ */
> +	parange = kvm_get_parange(mmfr0);
> +	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
> +
> +	host_kvm.arch.vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
> +}
> +
> +int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool)
> +{
> +	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
> +	int ret;
> +
> +	prepare_host_vtcr();
> +	hyp_spin_lock_init(&host_kvm.lock);
> +
> +	ret = prepare_s2_pools(mem_pgt_pool, dev_pgt_pool);
> +	if (ret)
> +		return ret;
> +
> +	ret = kvm_pgtable_stage2_init(&host_kvm.pgt, &host_kvm.arch,
> +				      &host_kvm.mm_ops);
> +	if (ret)
> +		return ret;
> +
> +	mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
> +	mmu->arch = &host_kvm.arch;
> +	mmu->pgt = &host_kvm.pgt;
> +	mmu->vmid.vmid_gen = 0;
> +	mmu->vmid.vmid = 0;
> +
> +	return 0;
> +}
> +
> +int __pkvm_prot_finalize(void)
> +{
> +	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
> +	struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
> +
> +	params->vttbr = kvm_get_vttbr(mmu);
> +	params->vtcr = host_kvm.arch.vtcr;
> +	params->hcr_el2 |= HCR_VM;
> +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> +		params->hcr_el2 |= HCR_FWB;
> +	kvm_flush_dcache_to_poc(params, sizeof(*params));
> +
> +	write_sysreg(params->hcr_el2, hcr_el2);
> +	__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);

AFAICT, there's no ISB here. Do we need one before the TLB invalidation?

> +	__tlbi(vmalls12e1is);
> +	dsb(ish);

Given that the TLB is invalidated on the boot path, please can you add
a comment here about the stale entries which you need to invalidate?

Also, does this need to be inner-shareable? I thought this function ran on
each CPU.

> +	isb();
> +
> +	return 0;
> +}
> +
> +static void host_stage2_unmap_dev_all(void)
> +{
> +	struct kvm_pgtable *pgt = &host_kvm.pgt;
> +	struct memblock_region *reg;
> +	u64 addr = 0;
> +	int i;
> +
> +	/* Unmap all non-memory regions to recycle the pages */
> +	for (i = 0; i < hyp_memblock_nr; i++, addr = reg->base + reg->size) {
> +		reg = &hyp_memory[i];
> +		kvm_pgtable_stage2_unmap(pgt, addr, reg->base - addr);
> +	}
> +	kvm_pgtable_stage2_unmap(pgt, addr, ULONG_MAX);

Is this just going to return -ERANGE?

> +static int host_stage2_idmap(u64 addr)
> +{
> +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
> +	struct kvm_mem_range range;
> +	bool is_memory = find_mem_range(addr, &range);
> +	struct hyp_pool *pool = is_memory ? &host_s2_mem : &host_s2_dev;
> +	int ret;
> +
> +	if (is_memory)
> +		prot |= KVM_PGTABLE_PROT_X;
> +
> +	hyp_spin_lock(&host_kvm.lock);
> +	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
> +					      &range, pool);
> +	if (is_memory || ret != -ENOMEM)
> +		goto unlock;
> +	host_stage2_unmap_dev_all();
> +	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
> +					      &range, pool);

I find this _really_ hard to reason about, as range is passed by reference
and we don't reset it after the first call returns -ENOMEM for an MMIO
mapping. Maybe some commentary on why it's still valid here?

Other than that, this is looking really good to me.

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 29/32] KVM: arm64: Wrap the host with a stage 2
  2021-03-05 19:29     ` Will Deacon
@ 2021-03-08  9:22       ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-08  9:22 UTC (permalink / raw)
  To: Will Deacon
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Friday 05 Mar 2021 at 19:29:06 (+0000), Will Deacon wrote:
> On Tue, Mar 02, 2021 at 02:59:59PM +0000, Quentin Perret wrote:
> > +static __always_inline void __load_host_stage2(void)
> > +{
> > +	if (static_branch_likely(&kvm_protected_mode_initialized))
> > +		__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
> > +	else
> > +		write_sysreg(0, vttbr_el2);
> 
> I realise you've just moved the code, but why is this needed? All we've
> done is point the stage-2 ttb at 0x0 (i.e. we haven't disabled anything).

Right, but that is also used for tlb maintenance operations, e.g.
__tlb_switch_to_host() and friends.

> > +#endif /* __KVM_NVHE_MEM_PROTECT__ */
> > diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> > index e204ea77ab27..ce49795324a7 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> > +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> > @@ -14,7 +14,7 @@ lib-objs := $(addprefix ../../../lib/, $(lib-objs))
> >  
> >  obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
> >  	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
> > -	 cache.o cpufeature.o setup.o mm.o
> > +	 cache.o cpufeature.o setup.o mm.o mem_protect.o
> >  obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
> >  	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
> >  obj-y += $(lib-objs)
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > index f312672d895e..6fa01b04954f 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > @@ -119,6 +119,7 @@ alternative_else_nop_endif
> >  
> >  	/* Invalidate the stale TLBs from Bootloader */
> >  	tlbi	alle2
> > +	tlbi	vmalls12e1
> >  	dsb	sy
> >  
> >  	/*
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index ae6503c9be15..f47028d3fd0a 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > @@ -13,6 +13,7 @@
> >  #include <asm/kvm_hyp.h>
> >  #include <asm/kvm_mmu.h>
> >  
> > +#include <nvhe/mem_protect.h>
> >  #include <nvhe/mm.h>
> >  #include <nvhe/trap_handler.h>
> >  
> > @@ -151,6 +152,10 @@ static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ct
> >  	cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
> >  }
> >  
> > +static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
> > +{
> > +	cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
> > +}
> >  typedef void (*hcall_t)(struct kvm_cpu_context *);
> >  
> >  #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
> > @@ -174,6 +179,7 @@ static const hcall_t host_hcall[] = {
> >  	HANDLE_FUNC(__pkvm_cpu_set_vector),
> >  	HANDLE_FUNC(__pkvm_create_mappings),
> >  	HANDLE_FUNC(__pkvm_create_private_mapping),
> > +	HANDLE_FUNC(__pkvm_prot_finalize),
> >  };
> >  
> >  static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
> > @@ -226,6 +232,11 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
> >  	case ESR_ELx_EC_SMC64:
> >  		handle_host_smc(host_ctxt);
> >  		break;
> > +	case ESR_ELx_EC_IABT_LOW:
> > +		fallthrough;
> > +	case ESR_ELx_EC_DABT_LOW:
> > +		handle_host_mem_abort(host_ctxt);
> > +		break;
> >  	default:
> >  		hyp_panic();
> >  	}
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > new file mode 100644
> > index 000000000000..2252ad1a8945
> > --- /dev/null
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -0,0 +1,213 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2020 Google LLC
> > + * Author: Quentin Perret <qperret@google.com>
> > + */
> > +
> > +#include <linux/kvm_host.h>
> > +#include <asm/kvm_cpufeature.h>
> > +#include <asm/kvm_emulate.h>
> > +#include <asm/kvm_hyp.h>
> > +#include <asm/kvm_mmu.h>
> > +#include <asm/kvm_pgtable.h>
> > +#include <asm/stage2_pgtable.h>
> > +
> > +#include <hyp/switch.h>
> > +
> > +#include <nvhe/gfp.h>
> > +#include <nvhe/memory.h>
> > +#include <nvhe/mem_protect.h>
> > +#include <nvhe/mm.h>
> > +
> > +extern unsigned long hyp_nr_cpus;
> > +struct host_kvm host_kvm;
> > +
> > +struct hyp_pool host_s2_mem;
> > +struct hyp_pool host_s2_dev;
> > +
> > +static void *host_s2_zalloc_pages_exact(size_t size)
> > +{
> > +	return hyp_alloc_pages(&host_s2_mem, get_order(size));
> > +}
> > +
> > +static void *host_s2_zalloc_page(void *pool)
> > +{
> > +	return hyp_alloc_pages(pool, 0);
> > +}
> > +
> > +static int prepare_s2_pools(void *mem_pgt_pool, void *dev_pgt_pool)
> > +{
> > +	unsigned long nr_pages, pfn;
> > +	int ret;
> > +
> > +	pfn = hyp_virt_to_pfn(mem_pgt_pool);
> > +	nr_pages = host_s2_mem_pgtable_pages();
> > +	ret = hyp_pool_init(&host_s2_mem, pfn, nr_pages, 0);
> > +	if (ret)
> > +		return ret;
> > +
> > +	pfn = hyp_virt_to_pfn(dev_pgt_pool);
> > +	nr_pages = host_s2_dev_pgtable_pages();
> > +	ret = hyp_pool_init(&host_s2_dev, pfn, nr_pages, 0);
> > +	if (ret)
> > +		return ret;
> > +
> > +	host_kvm.mm_ops.zalloc_pages_exact = host_s2_zalloc_pages_exact;
> > +	host_kvm.mm_ops.zalloc_page = host_s2_zalloc_page;
> > +	host_kvm.mm_ops.phys_to_virt = hyp_phys_to_virt;
> > +	host_kvm.mm_ops.virt_to_phys = hyp_virt_to_phys;
> > +	host_kvm.mm_ops.page_count = hyp_page_count;
> > +	host_kvm.mm_ops.get_page = hyp_get_page;
> > +	host_kvm.mm_ops.put_page = hyp_put_page;
> 
> Same comment I had earlier on about struct initialisation -- I think you
> can use the same trick here for the host_kvm.mm_ops.

+1
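
Presumably something along these lines, i.e. a single compound-literal
assignment (a sketch only, with the fields in the same order as the quoted
code):

	host_kvm.mm_ops = (struct kvm_pgtable_mm_ops) {
		.zalloc_pages_exact	= host_s2_zalloc_pages_exact,
		.zalloc_page		= host_s2_zalloc_page,
		.phys_to_virt		= hyp_phys_to_virt,
		.virt_to_phys		= hyp_virt_to_phys,
		.page_count		= hyp_page_count,
		.get_page		= hyp_get_page,
		.put_page		= hyp_put_page,
	};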

> > +
> > +	return 0;
> > +}
> > +
> > +static void prepare_host_vtcr(void)
> > +{
> > +	u32 parange, phys_shift;
> > +	u64 mmfr0, mmfr1;
> > +
> > +	mmfr0 = arm64_ftr_reg_id_aa64mmfr0_el1.sys_val;
> > +	mmfr1 = arm64_ftr_reg_id_aa64mmfr1_el1.sys_val;
> > +
> > +	/* The host stage 2 is id-mapped, so use parange for T0SZ */
> > +	parange = kvm_get_parange(mmfr0);
> > +	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
> > +
> > +	host_kvm.arch.vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
> > +}
> > +
> > +int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool)
> > +{
> > +	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
> > +	int ret;
> > +
> > +	prepare_host_vtcr();
> > +	hyp_spin_lock_init(&host_kvm.lock);
> > +
> > +	ret = prepare_s2_pools(mem_pgt_pool, dev_pgt_pool);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = kvm_pgtable_stage2_init(&host_kvm.pgt, &host_kvm.arch,
> > +				      &host_kvm.mm_ops);
> > +	if (ret)
> > +		return ret;
> > +
> > +	mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
> > +	mmu->arch = &host_kvm.arch;
> > +	mmu->pgt = &host_kvm.pgt;
> > +	mmu->vmid.vmid_gen = 0;
> > +	mmu->vmid.vmid = 0;
> > +
> > +	return 0;
> > +}
> > +
> > +int __pkvm_prot_finalize(void)
> > +{
> > +	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
> > +	struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
> > +
> > +	params->vttbr = kvm_get_vttbr(mmu);
> > +	params->vtcr = host_kvm.arch.vtcr;
> > +	params->hcr_el2 |= HCR_VM;
> > +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> > +		params->hcr_el2 |= HCR_FWB;
> > +	kvm_flush_dcache_to_poc(params, sizeof(*params));
> > +
> > +	write_sysreg(params->hcr_el2, hcr_el2);
> > +	__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
> 
> AFAICT, there's no ISB here. Do we need one before the TLB invalidation?

You mean for the ARM64_WORKAROUND_SPECULATIVE_AT case? __load_stage2()
should already add one for me, no?

> > +	__tlbi(vmalls12e1is);
> > +	dsb(ish);
> 
> Given that the TLB is invalidated on the boot path, please can you add
> a comment here about the stale entries which you need to invalidate?

Sure -- that is for HCR bits cached in TLBs for VMID 0. Thinking about
it I could probably reduce the tlbi scope as well.

> Also, does this need to be inner-shareable? I thought this function ran on
> each CPU.

Hmm, correct, nsh should do.
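
So the tail of __pkvm_prot_finalize() would presumably end up looking
something like this (a sketch based on the points above, not necessarily the
exact final code):

	write_sysreg(params->hcr_el2, hcr_el2);
	__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);

	/*
	 * Make sure to have an ISB before the TLB maintenance below, but only
	 * when __load_stage2() doesn't include one already (i.e. when the
	 * ARM64_WORKAROUND_SPECULATIVE_AT path is not taken).
	 */
	asm(ALTERNATIVE("isb", "nop", ARM64_WORKAROUND_SPECULATIVE_AT));

	/* Invalidate stale HCR_EL2 bits that may be cached in TLBs for VMID 0 */
	__tlbi(vmalls12e1);
	dsb(nsh);
	isb();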

> > +	isb();
> > +
> > +	return 0;
> > +}
> > +
> > +static void host_stage2_unmap_dev_all(void)
> > +{
> > +	struct kvm_pgtable *pgt = &host_kvm.pgt;
> > +	struct memblock_region *reg;
> > +	u64 addr = 0;
> > +	int i;
> > +
> > +	/* Unmap all non-memory regions to recycle the pages */
> > +	for (i = 0; i < hyp_memblock_nr; i++, addr = reg->base + reg->size) {
> > +		reg = &hyp_memory[i];
> > +		kvm_pgtable_stage2_unmap(pgt, addr, reg->base - addr);
> > +	}
> > +	kvm_pgtable_stage2_unmap(pgt, addr, ULONG_MAX);
> 
> Is this just going to return -ERANGE?

Hrmpf, yes, that wants BIT(pgt->ia_bits) I think. And that wants testing
as well, clearly.
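
i.e. the trailing unmap would become something like (a sketch of the fix
being discussed):

	/* Unmap everything up to the end of the IPA space, not ULONG_MAX bytes */
	kvm_pgtable_stage2_unmap(pgt, addr, BIT(pgt->ia_bits) - addr);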

> > +static int host_stage2_idmap(u64 addr)
> > +{
> > +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
> > +	struct kvm_mem_range range;
> > +	bool is_memory = find_mem_range(addr, &range);
> > +	struct hyp_pool *pool = is_memory ? &host_s2_mem : &host_s2_dev;
> > +	int ret;
> > +
> > +	if (is_memory)
> > +		prot |= KVM_PGTABLE_PROT_X;
> > +
> > +	hyp_spin_lock(&host_kvm.lock);
> > +	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
> > +					      &range, pool);
> > +	if (is_memory || ret != -ENOMEM)
> > +		goto unlock;
> > +	host_stage2_unmap_dev_all();
> > +	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
> > +					      &range, pool);
> 
> I find this _really_ hard to reason about, as range is passed by reference
> and we don't reset it after the first call returns -ENOMEM for an MMIO
> mapping. Maybe some commentary on why it's still valid here?

Sure, I'll add something. FWIW, that is intended -- -ENOMEM can only be
caused by the call to kvm_pgtable_stage2_map(), which leaves the range
untouched. So, as long as we don't release the lock, the range returned
by the first call to kvm_pgtable_stage2_idmap_greedy() should still be
valid. I suppose I could call kvm_pgtable_stage2_map() directly the
second time to make it obvious, but I thought this would expose the
internals of kvm_pgtable_stage2_idmap_greedy() a little bit too much.
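
A possible shape for that comment, inlined at the retry site (wording is
illustrative only):

	host_stage2_unmap_dev_all();
	/*
	 * The -ENOMEM above can only come from kvm_pgtable_stage2_map(), which
	 * does not modify *range. The lock is still held, so the range computed
	 * by the first kvm_pgtable_stage2_idmap_greedy() call remains valid for
	 * this retry.
	 */
	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
					      &range, pool);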

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 29/32] KVM: arm64: Wrap the host with a stage 2
@ 2021-03-08  9:22       ` Quentin Perret
  0 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-08  9:22 UTC (permalink / raw)
  To: Will Deacon
  Cc: android-kvm, catalin.marinas, mate.toth-pal, seanjc, tabba,
	linux-kernel, robh+dt, linux-arm-kernel, maz, kernel-team,
	kvmarm

On Friday 05 Mar 2021 at 19:29:06 (+0000), Will Deacon wrote:
> On Tue, Mar 02, 2021 at 02:59:59PM +0000, Quentin Perret wrote:
> > +static __always_inline void __load_host_stage2(void)
> > +{
> > +	if (static_branch_likely(&kvm_protected_mode_initialized))
> > +		__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
> > +	else
> > +		write_sysreg(0, vttbr_el2);
> 
> I realise you've just moved the code, but why is this needed? All we've
> done is point the stage-2 ttb at 0x0 (i.e. we haven't disabled anything).

Right, but that is also used for tlb maintenance operations, e.g.
__tlb_switch_to_host() and friends.

> > +#endif /* __KVM_NVHE_MEM_PROTECT__ */
> > diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> > index e204ea77ab27..ce49795324a7 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> > +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> > @@ -14,7 +14,7 @@ lib-objs := $(addprefix ../../../lib/, $(lib-objs))
> >  
> >  obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
> >  	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
> > -	 cache.o cpufeature.o setup.o mm.o
> > +	 cache.o cpufeature.o setup.o mm.o mem_protect.o
> >  obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
> >  	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
> >  obj-y += $(lib-objs)
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > index f312672d895e..6fa01b04954f 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > @@ -119,6 +119,7 @@ alternative_else_nop_endif
> >  
> >  	/* Invalidate the stale TLBs from Bootloader */
> >  	tlbi	alle2
> > +	tlbi	vmalls12e1
> >  	dsb	sy
> >  
> >  	/*
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index ae6503c9be15..f47028d3fd0a 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > @@ -13,6 +13,7 @@
> >  #include <asm/kvm_hyp.h>
> >  #include <asm/kvm_mmu.h>
> >  
> > +#include <nvhe/mem_protect.h>
> >  #include <nvhe/mm.h>
> >  #include <nvhe/trap_handler.h>
> >  
> > @@ -151,6 +152,10 @@ static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ct
> >  	cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
> >  }
> >  
> > +static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
> > +{
> > +	cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
> > +}
> >  typedef void (*hcall_t)(struct kvm_cpu_context *);
> >  
> >  #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
> > @@ -174,6 +179,7 @@ static const hcall_t host_hcall[] = {
> >  	HANDLE_FUNC(__pkvm_cpu_set_vector),
> >  	HANDLE_FUNC(__pkvm_create_mappings),
> >  	HANDLE_FUNC(__pkvm_create_private_mapping),
> > +	HANDLE_FUNC(__pkvm_prot_finalize),
> >  };
> >  
> >  static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
> > @@ -226,6 +232,11 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
> >  	case ESR_ELx_EC_SMC64:
> >  		handle_host_smc(host_ctxt);
> >  		break;
> > +	case ESR_ELx_EC_IABT_LOW:
> > +		fallthrough;
> > +	case ESR_ELx_EC_DABT_LOW:
> > +		handle_host_mem_abort(host_ctxt);
> > +		break;
> >  	default:
> >  		hyp_panic();
> >  	}
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > new file mode 100644
> > index 000000000000..2252ad1a8945
> > --- /dev/null
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -0,0 +1,213 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2020 Google LLC
> > + * Author: Quentin Perret <qperret@google.com>
> > + */
> > +
> > +#include <linux/kvm_host.h>
> > +#include <asm/kvm_cpufeature.h>
> > +#include <asm/kvm_emulate.h>
> > +#include <asm/kvm_hyp.h>
> > +#include <asm/kvm_mmu.h>
> > +#include <asm/kvm_pgtable.h>
> > +#include <asm/stage2_pgtable.h>
> > +
> > +#include <hyp/switch.h>
> > +
> > +#include <nvhe/gfp.h>
> > +#include <nvhe/memory.h>
> > +#include <nvhe/mem_protect.h>
> > +#include <nvhe/mm.h>
> > +
> > +extern unsigned long hyp_nr_cpus;
> > +struct host_kvm host_kvm;
> > +
> > +struct hyp_pool host_s2_mem;
> > +struct hyp_pool host_s2_dev;
> > +
> > +static void *host_s2_zalloc_pages_exact(size_t size)
> > +{
> > +	return hyp_alloc_pages(&host_s2_mem, get_order(size));
> > +}
> > +
> > +static void *host_s2_zalloc_page(void *pool)
> > +{
> > +	return hyp_alloc_pages(pool, 0);
> > +}
> > +
> > +static int prepare_s2_pools(void *mem_pgt_pool, void *dev_pgt_pool)
> > +{
> > +	unsigned long nr_pages, pfn;
> > +	int ret;
> > +
> > +	pfn = hyp_virt_to_pfn(mem_pgt_pool);
> > +	nr_pages = host_s2_mem_pgtable_pages();
> > +	ret = hyp_pool_init(&host_s2_mem, pfn, nr_pages, 0);
> > +	if (ret)
> > +		return ret;
> > +
> > +	pfn = hyp_virt_to_pfn(dev_pgt_pool);
> > +	nr_pages = host_s2_dev_pgtable_pages();
> > +	ret = hyp_pool_init(&host_s2_dev, pfn, nr_pages, 0);
> > +	if (ret)
> > +		return ret;
> > +
> > +	host_kvm.mm_ops.zalloc_pages_exact = host_s2_zalloc_pages_exact;
> > +	host_kvm.mm_ops.zalloc_page = host_s2_zalloc_page;
> > +	host_kvm.mm_ops.phys_to_virt = hyp_phys_to_virt;
> > +	host_kvm.mm_ops.virt_to_phys = hyp_virt_to_phys;
> > +	host_kvm.mm_ops.page_count = hyp_page_count;
> > +	host_kvm.mm_ops.get_page = hyp_get_page;
> > +	host_kvm.mm_ops.put_page = hyp_put_page;
> 
> Same comment I had earlier on about struct initialisation -- I think you
> can use the same trict here for the host_kvm.mm_ops.

+1

> > +
> > +	return 0;
> > +}
> > +
> > +static void prepare_host_vtcr(void)
> > +{
> > +	u32 parange, phys_shift;
> > +	u64 mmfr0, mmfr1;
> > +
> > +	mmfr0 = arm64_ftr_reg_id_aa64mmfr0_el1.sys_val;
> > +	mmfr1 = arm64_ftr_reg_id_aa64mmfr1_el1.sys_val;
> > +
> > +	/* The host stage 2 is id-mapped, so use parange for T0SZ */
> > +	parange = kvm_get_parange(mmfr0);
> > +	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
> > +
> > +	host_kvm.arch.vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
> > +}
> > +
> > +int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool)
> > +{
> > +	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
> > +	int ret;
> > +
> > +	prepare_host_vtcr();
> > +	hyp_spin_lock_init(&host_kvm.lock);
> > +
> > +	ret = prepare_s2_pools(mem_pgt_pool, dev_pgt_pool);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = kvm_pgtable_stage2_init(&host_kvm.pgt, &host_kvm.arch,
> > +				      &host_kvm.mm_ops);
> > +	if (ret)
> > +		return ret;
> > +
> > +	mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
> > +	mmu->arch = &host_kvm.arch;
> > +	mmu->pgt = &host_kvm.pgt;
> > +	mmu->vmid.vmid_gen = 0;
> > +	mmu->vmid.vmid = 0;
> > +
> > +	return 0;
> > +}
> > +
> > +int __pkvm_prot_finalize(void)
> > +{
> > +	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
> > +	struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
> > +
> > +	params->vttbr = kvm_get_vttbr(mmu);
> > +	params->vtcr = host_kvm.arch.vtcr;
> > +	params->hcr_el2 |= HCR_VM;
> > +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> > +		params->hcr_el2 |= HCR_FWB;
> > +	kvm_flush_dcache_to_poc(params, sizeof(*params));
> > +
> > +	write_sysreg(params->hcr_el2, hcr_el2);
> > +	__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
> 
> AFAICT, there's no ISB here. Do we need one before the TLB invalidation?

You mean for the ARM64_WORKAROUND_SPECULATIVE_AT case? __load_stage2()
should already add one for me no?

> > +	__tlbi(vmalls12e1is);
> > +	dsb(ish);
> 
> Given that the TLB is invalidated on the boot path, please can you add
> a comment here about the stale entries which you need to invalidate?

Sure -- that is for HCR bits cached in TLBs for VMID 0. Thinking about
it I could probably reduce the tlbi scope as well.

> Also, does this need to be inner-shareable? I thought this function ran on
> each CPU.

Hmm, correct, nsh should do.

> > +	isb();
> > +
> > +	return 0;
> > +}
> > +
> > +static void host_stage2_unmap_dev_all(void)
> > +{
> > +	struct kvm_pgtable *pgt = &host_kvm.pgt;
> > +	struct memblock_region *reg;
> > +	u64 addr = 0;
> > +	int i;
> > +
> > +	/* Unmap all non-memory regions to recycle the pages */
> > +	for (i = 0; i < hyp_memblock_nr; i++, addr = reg->base + reg->size) {
> > +		reg = &hyp_memory[i];
> > +		kvm_pgtable_stage2_unmap(pgt, addr, reg->base - addr);
> > +	}
> > +	kvm_pgtable_stage2_unmap(pgt, addr, ULONG_MAX);
> 
> Is this just going to return -ERANGE?

Hrmpf, yes, that wants BIT(pgt->ia_bits) I think. And that wants testing
as well, clearly.

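That is, something like the below for the last call (untested sketch):

	kvm_pgtable_stage2_unmap(pgt, addr, BIT(pgt->ia_bits) - addr);
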
> > +static int host_stage2_idmap(u64 addr)
> > +{
> > +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
> > +	struct kvm_mem_range range;
> > +	bool is_memory = find_mem_range(addr, &range);
> > +	struct hyp_pool *pool = is_memory ? &host_s2_mem : &host_s2_dev;
> > +	int ret;
> > +
> > +	if (is_memory)
> > +		prot |= KVM_PGTABLE_PROT_X;
> > +
> > +	hyp_spin_lock(&host_kvm.lock);
> > +	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
> > +					      &range, pool);
> > +	if (is_memory || ret != -ENOMEM)
> > +		goto unlock;
> > +	host_stage2_unmap_dev_all();
> > +	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
> > +					      &range, pool);
> 
> I find this _really_ hard to reason about, as range is passed by reference
> and we don't reset it after the first call returns -ENOMEM for an MMIO
> mapping. Maybe some commentary on why it's still valid here?

Sure, I'll add something. FWIW, that is intended -- -ENOMEM can only be
caused by the call to kvm_pgtable_stage2_map() which leaves the range
untouched. So, as long as we don't release the lock, the range returned
by the first call to kvm_pgtable_stage2_idmap_greedy() should still be
valid. I suppose I could call kvm_pgtable_stage2_map() directly the
second time to make it obvious but I thought this would expose the
internal of kvm_pgtable_stage2_idmap_greedy() a little bit too much.

Thanks,
Quentin
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 29/32] KVM: arm64: Wrap the host with a stage 2
  2021-03-08  9:22       ` Quentin Perret
@ 2021-03-08 12:46         ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-08 12:46 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Mon, Mar 08, 2021 at 09:22:29AM +0000, Quentin Perret wrote:
> On Friday 05 Mar 2021 at 19:29:06 (+0000), Will Deacon wrote:
> > On Tue, Mar 02, 2021 at 02:59:59PM +0000, Quentin Perret wrote:
> > > +static __always_inline void __load_host_stage2(void)
> > > +{
> > > +	if (static_branch_likely(&kvm_protected_mode_initialized))
> > > +		__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
> > > +	else
> > > +		write_sysreg(0, vttbr_el2);
> > 
> > I realise you've just moved the code, but why is this needed? All we've
> > done is point the stage-2 ttb at 0x0 (i.e. we haven't disabled anything).
> 
> Right, but that is also used for tlb maintenance operations, e.g.
> __tlb_switch_to_host() and friends.

Good point, the VMID is needed in that situation.

> > > +int __pkvm_prot_finalize(void)
> > > +{
> > > +	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
> > > +	struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
> > > +
> > > +	params->vttbr = kvm_get_vttbr(mmu);
> > > +	params->vtcr = host_kvm.arch.vtcr;
> > > +	params->hcr_el2 |= HCR_VM;
> > > +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> > > +		params->hcr_el2 |= HCR_FWB;
> > > +	kvm_flush_dcache_to_poc(params, sizeof(*params));
> > > +
> > > +	write_sysreg(params->hcr_el2, hcr_el2);
> > > +	__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
> > 
> > AFAICT, there's no ISB here. Do we need one before the TLB invalidation?
> 
> You mean for the ARM64_WORKAROUND_SPECULATIVE_AT case? __load_stage2()
> should already add one for me no?

__load_stage2() _only_ has the ISB if ARM64_WORKAROUND_SPECULATIVE_AT is
present, whereas I think you need one unconditionally here so that the
system register write has taken effect before the TLB invalidation.

It's similar to the comment at the end of __tlb_switch_to_guest().

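In other words, something like this (sketch only, keeping the rest of the
sequence as you have it):

	write_sysreg(params->hcr_el2, hcr_el2);
	__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
	isb();	/* Ensure the sysreg writes have landed before the TLBI */
	__tlbi(vmalls12e1is);
	dsb(ish);
	isb();
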
Having said that, I do worry that ARM64_WORKAROUND_SPECULATIVE_AT probably
needs a closer look in the world of pKVM, since it currently special-cases
the host.

> > > +	__tlbi(vmalls12e1is);
> > > +	dsb(ish);
> > 
> > Given that the TLB is invalidated on the boot path, please can you add
> > a comment here about the stale entries which you need to invalidate?
> 
> Sure -- that is for HCR bits cached in TLBs for VMID 0. Thinking about
> it I could probably reduce the tlbi scope as well.
> 
> > Also, does this need to be inner-shareable? I thought this function ran on
> > each CPU.
> 
> Hmm, correct, nsh should do.

Cool, then you can do that for both the TLBI and the DSB instructions (and
please add a comment that the invalidation is due to the HCR bits).

> > > +static void host_stage2_unmap_dev_all(void)
> > > +{
> > > +	struct kvm_pgtable *pgt = &host_kvm.pgt;
> > > +	struct memblock_region *reg;
> > > +	u64 addr = 0;
> > > +	int i;
> > > +
> > > +	/* Unmap all non-memory regions to recycle the pages */
> > > +	for (i = 0; i < hyp_memblock_nr; i++, addr = reg->base + reg->size) {
> > > +		reg = &hyp_memory[i];
> > > +		kvm_pgtable_stage2_unmap(pgt, addr, reg->base - addr);
> > > +	}
> > > +	kvm_pgtable_stage2_unmap(pgt, addr, ULONG_MAX);
> > 
> > Is this just going to return -ERANGE?
> 
> Hrmpf, yes, that wants BIT(pgt->ia_bits) I think. And that wants testing
> as well, clearly.

Agreed, maybe it's worth checking the return value.

> > > +static int host_stage2_idmap(u64 addr)
> > > +{
> > > +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
> > > +	struct kvm_mem_range range;
> > > +	bool is_memory = find_mem_range(addr, &range);
> > > +	struct hyp_pool *pool = is_memory ? &host_s2_mem : &host_s2_dev;
> > > +	int ret;
> > > +
> > > +	if (is_memory)
> > > +		prot |= KVM_PGTABLE_PROT_X;
> > > +
> > > +	hyp_spin_lock(&host_kvm.lock);
> > > +	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
> > > +					      &range, pool);
> > > +	if (is_memory || ret != -ENOMEM)
> > > +		goto unlock;
> > > +	host_stage2_unmap_dev_all();
> > > +	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
> > > +					      &range, pool);
> > 
> > I find this _really_ hard to reason about, as range is passed by reference
> > and we don't reset it after the first call returns -ENOMEM for an MMIO
> > mapping. Maybe some commentary on why it's still valid here?
> 
> Sure, I'll add something. FWIW, that is intended -- -ENOMEM can only be
> caused by the call to kvm_pgtable_stage2_map() which leaves the range
> untouched. So, as long as we don't release the lock, the range returned
> by the first call to kvm_pgtable_stage2_idmap_greedy() should still be
> valid. I suppose I could call kvm_pgtable_stage2_map() directly the
> second time to make it obvious but I thought this would expose the
> internal of kvm_pgtable_stage2_idmap_greedy() a little bit too much.

I can see it both ways, but updating the kerneldoc for
kvm_pgtable_stage2_idmap_greedy() to state in which cases the range is
updated and how would be helpful. It just says "negative error code on
failure" at the moment.

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 29/32] KVM: arm64: Wrap the host with a stage 2
  2021-03-08 12:46         ` Will Deacon
@ 2021-03-08 13:38           ` Quentin Perret
  -1 siblings, 0 replies; 192+ messages in thread
From: Quentin Perret @ 2021-03-08 13:38 UTC (permalink / raw)
  To: Will Deacon
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Monday 08 Mar 2021 at 12:46:07 (+0000), Will Deacon wrote:
> __load_stage2() _only_ has the ISB if ARM64_WORKAROUND_SPECULATIVE_AT is
> present, whereas I think you need one unconditionally here so that the
> system register write has taken effect before the TLB invalidation.
> 
> It's similar to the comment at the end of __tlb_switch_to_guest().
> 
> Having said that, I do worry that ARM64_WORKAROUND_SPECULATIVE_AT probably
> needs a closer look in the world of pKVM, since it currently special-cases
> the host.

Yes, I see that now. I'll start looking into this.

> > > > +	__tlbi(vmalls12e1is);
> > > > +	dsb(ish);
> > > 
> > > Given that the TLB is invalidated on the boot path, please can you add
> > > a comment here about the stale entries which you need to invalidate?
> > 
> > Sure -- that is for HCR bits cached in TLBs for VMID 0. Thinking about
> > it I could probably reduce the tlbi scope as well.
> > 
> > > Also, does this need to be inner-shareable? I thought this function ran on
> > > each CPU.
> > 
> > Hmm, correct, nsh should do.
> 
> Cool, then you can do that for both the TLBI and the DSB instructions (and
> please add a comment that the invalidation is due to the HCR bits).

Sure.

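Will fold in something like this (sketch):

	/*
	 * Invalidate stale TLB entries for VMID 0 which may have cached
	 * the old HCR_EL2 bits. This runs on each CPU, so a local
	 * invalidation is enough.
	 */
	__tlbi(vmalls12e1);
	dsb(nsh);
	isb();
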
> > > > +static void host_stage2_unmap_dev_all(void)
> > > > +{
> > > > +	struct kvm_pgtable *pgt = &host_kvm.pgt;
> > > > +	struct memblock_region *reg;
> > > > +	u64 addr = 0;
> > > > +	int i;
> > > > +
> > > > +	/* Unmap all non-memory regions to recycle the pages */
> > > > +	for (i = 0; i < hyp_memblock_nr; i++, addr = reg->base + reg->size) {
> > > > +		reg = &hyp_memory[i];
> > > > +		kvm_pgtable_stage2_unmap(pgt, addr, reg->base - addr);
> > > > +	}
> > > > +	kvm_pgtable_stage2_unmap(pgt, addr, ULONG_MAX);
> > > 
> > > Is this just going to return -ERANGE?
> > 
> > Hrmpf, yes, that wants BIT(pgt->ia_bits) I think. And that wants testing
> > as well, clearly.
> 
> Agreed, maybe it's worth checking the return value.

Ack, and hyp_panic if != 0, but that is probably preferable anyway.

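So, roughly (untested sketch):

	static void host_stage2_unmap_dev_all(void)
	{
		struct kvm_pgtable *pgt = &host_kvm.pgt;
		struct memblock_region *reg;
		u64 addr = 0;
		int i, ret;

		/* Unmap all non-memory regions to recycle the pages */
		for (i = 0; i < hyp_memblock_nr; i++, addr = reg->base + reg->size) {
			reg = &hyp_memory[i];
			ret = kvm_pgtable_stage2_unmap(pgt, addr, reg->base - addr);
			if (ret)
				hyp_panic();
		}

		/* ... and everything between the end of memory and the IPA limit */
		ret = kvm_pgtable_stage2_unmap(pgt, addr, BIT(pgt->ia_bits) - addr);
		if (ret)
			hyp_panic();
	}
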
> > > > +static int host_stage2_idmap(u64 addr)
> > > > +{
> > > > +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
> > > > +	struct kvm_mem_range range;
> > > > +	bool is_memory = find_mem_range(addr, &range);
> > > > +	struct hyp_pool *pool = is_memory ? &host_s2_mem : &host_s2_dev;
> > > > +	int ret;
> > > > +
> > > > +	if (is_memory)
> > > > +		prot |= KVM_PGTABLE_PROT_X;
> > > > +
> > > > +	hyp_spin_lock(&host_kvm.lock);
> > > > +	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
> > > > +					      &range, pool);
> > > > +	if (is_memory || ret != -ENOMEM)
> > > > +		goto unlock;
> > > > +	host_stage2_unmap_dev_all();
> > > > +	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
> > > > +					      &range, pool);
> > > 
> > > I find this _really_ hard to reason about, as range is passed by reference
> > > and we don't reset it after the first call returns -ENOMEM for an MMIO
> > > mapping. Maybe some commentary on why it's still valid here?
> > 
> > Sure, I'll add something. FWIW, that is intended -- -ENOMEM can only be
> > caused by the call to kvm_pgtable_stage2_map() which leaves the range
> > untouched. So, as long as we don't release the lock, the range returned
> > by the first call to kvm_pgtable_stage2_idmap_greedy() should still be
> > valid. I suppose I could call kvm_pgtable_stage2_map() directly the
> > second time to make it obvious but I thought this would expose the
> > internal of kvm_pgtable_stage2_idmap_greedy() a little bit too much.
> 
> I can see it both ways, but updating the kerneldoc for
> kvm_pgtable_stage2_idmap_greedy() to state in which cases the range is
> updated and how would be helpful. It just says "negative error code on
> failure" at the moment.

Alternatively I could expose the 'reduce' path (maybe with another name
e.g. stage2_find_compatible_range() or so) and remove the
kvm_pgtable_stage2_idmap_greedy() wrapper. So it'd be the caller's
responsibility to not release the lock in between
stage2_find_compatible_range() and kvm_pgtable_stage2_map() for
instance, but that sounds reasonable to me. And that would make it
explicit it's the _map() path that failed with -ENOMEM, and that the
range can be re-used the second time.

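The caller would then look roughly like this (sketch; the find_range name and
signature are made up here, and the kvm_pgtable_stage2_map() parameters may
need adjusting):

	hyp_spin_lock(&host_kvm.lock);
	ret = stage2_find_compatible_range(&host_kvm.pgt, addr, prot, &range);
	if (ret)
		goto unlock;

	ret = kvm_pgtable_stage2_map(&host_kvm.pgt, range.start,
				     range.end - range.start, range.start,
				     prot, pool);
	if (ret == -ENOMEM && !is_memory) {
		/* Recycle the device mappings and retry with the same range */
		host_stage2_unmap_dev_all();
		ret = kvm_pgtable_stage2_map(&host_kvm.pgt, range.start,
					     range.end - range.start,
					     range.start, prot, pool);
	}
unlock:
	hyp_spin_unlock(&host_kvm.lock);
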
Thoughts?

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 192+ messages in thread

* Re: [PATCH v3 29/32] KVM: arm64: Wrap the host with a stage 2
  2021-03-08 13:38           ` Quentin Perret
@ 2021-03-08 13:52             ` Will Deacon
  -1 siblings, 0 replies; 192+ messages in thread
From: Will Deacon @ 2021-03-08 13:52 UTC (permalink / raw)
  To: Quentin Perret
  Cc: catalin.marinas, maz, james.morse, julien.thierry.kdev,
	suzuki.poulose, android-kvm, linux-kernel, kernel-team, kvmarm,
	linux-arm-kernel, tabba, mark.rutland, dbrazdil, mate.toth-pal,
	seanjc, robh+dt

On Mon, Mar 08, 2021 at 01:38:07PM +0000, Quentin Perret wrote:
> On Monday 08 Mar 2021 at 12:46:07 (+0000), Will Deacon wrote:
> > > > > +static int host_stage2_idmap(u64 addr)
> > > > > +{
> > > > > +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
> > > > > +	struct kvm_mem_range range;
> > > > > +	bool is_memory = find_mem_range(addr, &range);
> > > > > +	struct hyp_pool *pool = is_memory ? &host_s2_mem : &host_s2_dev;
> > > > > +	int ret;
> > > > > +
> > > > > +	if (is_memory)
> > > > > +		prot |= KVM_PGTABLE_PROT_X;
> > > > > +
> > > > > +	hyp_spin_lock(&host_kvm.lock);
> > > > > +	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
> > > > > +					      &range, pool);
> > > > > +	if (is_memory || ret != -ENOMEM)
> > > > > +		goto unlock;
> > > > > +	host_stage2_unmap_dev_all();
> > > > > +	ret = kvm_pgtable_stage2_idmap_greedy(&host_kvm.pgt, addr, prot,
> > > > > +					      &range, pool);
> > > > 
> > > > I find this _really_ hard to reason about, as range is passed by reference
> > > > and we don't reset it after the first call returns -ENOMEM for an MMIO
> > > > mapping. Maybe some commentary on why it's still valid here?
> > > 
> > > Sure, I'll add something. FWIW, that is intended -- -ENOMEM can only be
> > > caused by the call to kvm_pgtable_stage2_map() which leaves the range
> > > untouched. So, as long as we don't release the lock, the range returned
> > > by the first call to kvm_pgtable_stage2_idmap_greedy() should still be
> > > valid. I suppose I could call kvm_pgtable_stage2_map() directly the
> > > second time to make it obvious but I thought this would expose the
> > > internal of kvm_pgtable_stage2_idmap_greedy() a little bit too much.
> > 
> > I can see it both ways, but updating the kerneldoc for
> > kvm_pgtable_stage2_idmap_greedy() to state in which cases the range is
> > updated and how would be helpful. It just says "negative error code on
> > failure" at the moment.
> 
> Alternatively I could expose the 'reduce' path (maybe with another name
> e.g. stage2_find_compatible_range() or so) and remove the
> kvm_pgtable_stage2_idmap_greedy() wrapper. So it'd be the caller's
> responsibility to not release the lock in between
> stage2_find_compatible_range() and kvm_pgtable_stage2_map() for
> instance, but that sounds reasonable to me. And that would make it
> explicit it's the _map() path that failed with -ENOMEM, and that the
> range can be re-used the second time.
> 
> Thoughts?

I suppose it depends on whether or not you reckon this could be optimised
into a single-pass of the page-table. If not, then splitting it up makes
sense to me (and actually, it's not like this has tonnes of callers so
even if we changed things in future it wouldn't be too hard to fix them up).

Will

^ permalink raw reply	[flat|nested] 192+ messages in thread

end of thread, other threads:[~2021-03-08 13:56 UTC | newest]

Thread overview: 192+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-02 14:59 [PATCH v3 00/32] KVM: arm64: A stage 2 for the host Quentin Perret
2021-03-02 14:59 ` [PATCH v3 01/32] arm64: lib: Annotate {clear,copy}_page() as position-independent Quentin Perret
2021-03-02 14:59 ` [PATCH v3 02/32] KVM: arm64: Link position-independent string routines into .hyp.text Quentin Perret
2021-03-02 14:59 ` [PATCH v3 03/32] arm64: kvm: Add standalone ticket spinlock implementation for use at hyp Quentin Perret
2021-03-02 14:59 ` [PATCH v3 04/32] KVM: arm64: Initialize kvm_nvhe_init_params early Quentin Perret
2021-03-04 13:39   ` Will Deacon
2021-03-02 14:59 ` [PATCH v3 05/32] KVM: arm64: Avoid free_page() in page-table allocator Quentin Perret
2021-03-02 14:59 ` [PATCH v3 06/32] KVM: arm64: Factor memory allocation out of pgtable.c Quentin Perret
2021-03-04 14:06   ` Will Deacon
2021-03-02 14:59 ` [PATCH v3 07/32] KVM: arm64: Introduce a BSS section for use at Hyp Quentin Perret
2021-03-04 14:09   ` Will Deacon
2021-03-02 14:59 ` [PATCH v3 08/32] KVM: arm64: Make kvm_call_hyp() a function call " Quentin Perret
2021-03-02 14:59 ` [PATCH v3 09/32] KVM: arm64: Allow using kvm_nvhe_sym() in hyp code Quentin Perret
2021-03-02 14:59 ` [PATCH v3 10/32] KVM: arm64: Introduce an early Hyp page allocator Quentin Perret
2021-03-04 14:38   ` Will Deacon
2021-03-02 14:59 ` [PATCH v3 11/32] KVM: arm64: Stub CONFIG_DEBUG_LIST at Hyp Quentin Perret
2021-03-02 14:59 ` [PATCH v3 12/32] KVM: arm64: Introduce a Hyp buddy page allocator Quentin Perret
2021-03-04 15:30   ` Will Deacon
2021-03-04 15:49     ` Quentin Perret
2021-03-02 14:59 ` [PATCH v3 13/32] KVM: arm64: Enable access to sanitized CPU features at EL2 Quentin Perret
2021-03-02 14:59 ` [PATCH v3 14/32] KVM: arm64: Factor out vector address calculation Quentin Perret
2021-03-02 14:59 ` [PATCH v3 15/32] KVM: arm64: Prepare the creation of s1 mappings at EL2 Quentin Perret
2021-03-04 18:47   ` Will Deacon
2021-03-02 14:59 ` [PATCH v3 16/32] KVM: arm64: Elevate hypervisor mappings creation " Quentin Perret
2021-03-04 19:25   ` Will Deacon
2021-03-05  9:14     ` Quentin Perret
2021-03-02 14:59 ` [PATCH v3 17/32] KVM: arm64: Use kvm_arch for stage 2 pgtable Quentin Perret
2021-03-02 14:59 ` [PATCH v3 18/32] KVM: arm64: Use kvm_arch in kvm_s2_mmu Quentin Perret
2021-03-02 14:59 ` [PATCH v3 19/32] KVM: arm64: Set host stage 2 using kvm_nvhe_init_params Quentin Perret
2021-03-02 14:59 ` [PATCH v3 20/32] KVM: arm64: Refactor kvm_arm_setup_stage2() Quentin Perret
2021-03-04 19:35   ` Will Deacon
2021-03-02 14:59 ` [PATCH v3 21/32] KVM: arm64: Refactor __load_guest_stage2() Quentin Perret
2021-03-02 14:59 ` [PATCH v3 22/32] KVM: arm64: Refactor __populate_fault_info() Quentin Perret
2021-03-04 19:39   ` Will Deacon
2021-03-02 14:59 ` [PATCH v3 23/32] KVM: arm64: Make memcache anonymous in pgtable allocator Quentin Perret
2021-03-04 19:44   ` Will Deacon
2021-03-02 14:59 ` [PATCH v3 24/32] KVM: arm64: Reserve memory for host stage 2 Quentin Perret
2021-03-04 19:49   ` Will Deacon
2021-03-05  9:17     ` Quentin Perret
2021-03-02 14:59 ` [PATCH v3 25/32] KVM: arm64: Sort the hypervisor memblocks Quentin Perret
2021-03-04 19:51   ` Will Deacon
2021-03-02 14:59 ` [PATCH v3 26/32] KVM: arm64: Introduce PROT_NONE mappings for stage 2 Quentin Perret
2021-03-04 20:00   ` Will Deacon
2021-03-05  9:52     ` Quentin Perret
2021-03-05 19:03       ` Will Deacon
2021-03-02 14:59 ` [PATCH v3 27/32] KVM: arm64: Refactor stage2_map_set_prot_attr() Quentin Perret
2021-03-04 20:03   ` Will Deacon
2021-03-05  9:18     ` Quentin Perret
2021-03-02 14:59 ` [PATCH v3 28/32] KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy() Quentin Perret
2021-03-05 14:39   ` Will Deacon
2021-03-05 15:03     ` Quentin Perret
2021-03-05 16:59       ` Will Deacon
2021-03-02 14:59 ` [PATCH v3 29/32] KVM: arm64: Wrap the host with a stage 2 Quentin Perret
2021-03-05 19:29   ` Will Deacon
2021-03-08  9:22     ` Quentin Perret
2021-03-08 12:46       ` Will Deacon
2021-03-08 13:38         ` Quentin Perret
2021-03-08 13:52           ` Will Deacon
2021-03-02 15:00 ` [PATCH v3 30/32] KVM: arm64: Page-align the .hyp sections Quentin Perret
2021-03-04 20:05   ` Will Deacon
2021-03-02 15:00 ` [PATCH v3 31/32] KVM: arm64: Disable PMU support in protected mode Quentin Perret
2021-03-05 19:02   ` Will Deacon
2021-03-02 15:00 ` [PATCH v3 32/32] KVM: arm64: Protect the .hyp sections from the host Quentin Perret
2021-03-05 19:13   ` Will Deacon
