* [PATCH v2 00/24] KVM: arm64: Introduce pKVM shadow state at EL2
From: Will Deacon @ 2022-06-30 13:57 UTC
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

Hi everyone,

This series has been extracted from the pKVM base support series (aka
"pKVM mega-patch") previously posted here:

  https://lore.kernel.org/kvmarm/20220519134204.5379-1-will@kernel.org/

Unlike that more comprehensive series, this one is fairly fundamental
and does not introduce any new ABI commitments, leaving questions
involving the management of guest private memory and the creation of
protected VMs for future work. Instead, this series extends the pKVM EL2
code so that it can dynamically instantiate and manage VM shadow
structures without the host being able to access them directly. These
shadow structures consist of a shadow VM, a set of shadow vCPUs and the
stage-2 page-table; the pages used to hold them are returned to the
host when the VM is destroyed.
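
As a rough, purely illustrative sketch (the names and fields below are
hypothetical and are not the definitions introduced by this series), the
EL2-private state can be pictured as a per-VM container along these lines:

	/* Hypothetical sketch of per-VM shadow state kept at EL2. */
	struct shadow_vcpu_state {
		struct kvm_vcpu vcpu;		/* EL2-private vCPU copy */
	};

	struct shadow_vm_state {
		struct kvm kvm;			/* EL2-private VM copy */
		struct kvm_pgtable pgt;		/* guest stage-2 page-table */
		unsigned int nr_vcpus;
		struct shadow_vcpu_state vcpus[];
	};

The pages backing such structures are donated by the host when the VM is
created and handed back when it is torn down, as described above.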

The last patch is marked as RFC because, although it plumbs in the
shadow state, it is woefully inefficient and copies to/from the host
state on every vCPU run. Without the last patch, the new structures are
unused but we move considerably closer to isolating guests from the
host.

The series is based on Marc's rework of the flags
(kvm-arm64/burn-the-flags).

Feedback welcome.

Cheers,

Will, Quentin, Fuad and Marc

Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Marc Zyngier <maz@kernel.org>

Cc: kernel-team@android.com
Cc: kvm@vger.kernel.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-arm-kernel@lists.infradead.org

--->8

Fuad Tabba (3):
  KVM: arm64: Add hyp_spinlock_t static initializer
  KVM: arm64: Introduce shadow VM state at EL2
  KVM: arm64: Instantiate VM shadow data from EL1

Quentin Perret (15):
  KVM: arm64: Move hyp refcount manipulation helpers
  KVM: arm64: Allow non-coalescable pages in a hyp_pool
  KVM: arm64: Add flags to struct hyp_page
  KVM: arm64: Back hyp_vmemmap for all of memory
  KVM: arm64: Make hyp stage-1 refcnt correct on the whole range
  KVM: arm64: Implement do_donate() helper for donating memory
  KVM: arm64: Prevent the donation of no-map pages
  KVM: arm64: Add helpers to pin memory shared with hyp
  KVM: arm64: Add pcpu fixmap infrastructure at EL2
  KVM: arm64: Add generic hyp_memcache helpers
  KVM: arm64: Instantiate guest stage-2 page-tables at EL2
  KVM: arm64: Return guest memory from EL2 via dedicated teardown
    memcache
  KVM: arm64: Unmap kvm_arm_hyp_percpu_base from the host
  KVM: arm64: Explicitly map kvm_vgic_global_state at EL2
  KVM: arm64: Don't map host sections in pkvm

Will Deacon (6):
  KVM: arm64: Unify identifiers used to distinguish host and hypervisor
  KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h
  KVM: arm64: Initialise hyp symbols regardless of pKVM
  KVM: arm64: Provide I-cache invalidation by VA at EL2
  KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2
  KVM: arm64: Use the shadow vCPU structure in handle___kvm_vcpu_run()

 arch/arm64/include/asm/kvm_asm.h              |   6 +-
 arch/arm64/include/asm/kvm_host.h             |  65 +++
 arch/arm64/include/asm/kvm_hyp.h              |   3 +
 arch/arm64/include/asm/kvm_pgtable.h          |   8 +
 arch/arm64/include/asm/kvm_pkvm.h             |  38 ++
 arch/arm64/kernel/image-vars.h                |  15 -
 arch/arm64/kvm/arm.c                          |  40 +-
 arch/arm64/kvm/hyp/hyp-constants.c            |   3 +
 arch/arm64/kvm/hyp/include/nvhe/gfp.h         |   6 +-
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  19 +-
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |  26 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h          |  18 +-
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  70 +++
 arch/arm64/kvm/hyp/include/nvhe/spinlock.h    |  10 +-
 arch/arm64/kvm/hyp/nvhe/cache.S               |  11 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 105 +++-
 arch/arm64/kvm/hyp/nvhe/hyp-smp.c             |   2 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 456 +++++++++++++++++-
 arch/arm64/kvm/hyp/nvhe/mm.c                  | 136 +++++-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c          |  42 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 438 +++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c               |  96 ++--
 arch/arm64/kvm/hyp/pgtable.c                  |   9 +
 arch/arm64/kvm/mmu.c                          |  26 +
 arch/arm64/kvm/pkvm.c                         | 121 ++++-
 25 files changed, 1625 insertions(+), 144 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/pkvm.h

-- 
2.37.0.rc0.161.g10f37bed90-goog

* [PATCH v2 01/24] KVM: arm64: Move hyp refcount manipulation helpers
From: Will Deacon @ 2022-06-30 13:57 UTC
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

We will soon need to manipulate struct hyp_page refcounts from outside
page_alloc.c, so move the helpers to a header file.
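
(As a purely hypothetical illustration of why the helpers need to be
visible outside page_alloc.c, and not part of this patch: another EL2
file could then take a reference on a page directly.)

	#include <nvhe/memory.h>

	/* Hypothetical caller in another EL2 file, e.g. mem_protect.c. */
	static void pin_hyp_page(phys_addr_t phys)
	{
		hyp_page_ref_inc(hyp_phys_to_page(phys));
	}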

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/memory.h | 18 ++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/page_alloc.c     | 19 -------------------
 2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 592b7edb3edb..418b66a82a50 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -45,4 +45,22 @@ static inline int hyp_page_count(void *addr)
 	return p->refcount;
 }
 
+static inline void hyp_page_ref_inc(struct hyp_page *p)
+{
+	BUG_ON(p->refcount == USHRT_MAX);
+	p->refcount++;
+}
+
+static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
+{
+	BUG_ON(!p->refcount);
+	p->refcount--;
+	return (p->refcount == 0);
+}
+
+static inline void hyp_set_page_refcounted(struct hyp_page *p)
+{
+	BUG_ON(p->refcount);
+	p->refcount = 1;
+}
 #endif /* __KVM_HYP_MEMORY_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index d40f0b30b534..1ded09fc9b10 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -144,25 +144,6 @@ static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
 	return p;
 }
 
-static inline void hyp_page_ref_inc(struct hyp_page *p)
-{
-	BUG_ON(p->refcount == USHRT_MAX);
-	p->refcount++;
-}
-
-static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
-{
-	BUG_ON(!p->refcount);
-	p->refcount--;
-	return (p->refcount == 0);
-}
-
-static inline void hyp_set_page_refcounted(struct hyp_page *p)
-{
-	BUG_ON(p->refcount);
-	p->refcount = 1;
-}
-
 static void __hyp_put_page(struct hyp_pool *pool, struct hyp_page *p)
 {
 	if (hyp_page_ref_dec_and_test(p))
-- 
2.37.0.rc0.161.g10f37bed90-goog

* [PATCH v2 02/24] KVM: arm64: Allow non-coalescable pages in a hyp_pool
From: Will Deacon @ 2022-06-30 13:57 UTC
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

All the contiguous pages used to initialize a hyp_pool are considered
coalescable, which means that the hyp page allocator will actively
try to merge them with their buddies on the hyp_put_page() path.
However, using hyp_put_page() on a page that is not part of the initial
memory range given to a hyp_pool is currently unsupported.

In order to allow dynamically extending hyp pools at run-time, add a
check to __hyp_attach_page() to allow inserting 'external' pages into
the free-list of order 0. This will be necessary to allow lazy
donation of pages from the host to the hypervisor when allocating guest
stage-2 page-table pages at EL2.
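
(For illustration only, not part of this patch: with the check in place,
a page donated by the host at run-time can be handed to a pool even
though it lies outside the pool's initial range.)

	/* Hypothetical example: feed a host-donated page into a hyp_pool. */
	static void recycle_donated_page(struct hyp_pool *pool, void *addr)
	{
		hyp_set_page_refcounted(hyp_virt_to_page(addr));
		hyp_put_page(pool, addr);	/* inserted on the order-0 free-list */
	}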

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/page_alloc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index 1ded09fc9b10..0d15227aced8 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -93,11 +93,15 @@ static inline struct hyp_page *node_to_page(struct list_head *node)
 static void __hyp_attach_page(struct hyp_pool *pool,
 			      struct hyp_page *p)
 {
+	phys_addr_t phys = hyp_page_to_phys(p);
 	unsigned short order = p->order;
 	struct hyp_page *buddy;
 
 	memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
 
+	if (phys < pool->range_start || phys >= pool->range_end)
+		goto insert;
+
 	/*
 	 * Only the first struct hyp_page of a high-order page (otherwise known
 	 * as the 'head') should have p->order set. The non-head pages should
@@ -116,6 +120,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
 		p = min(p, buddy);
 	}
 
+insert:
 	/* Mark the new head, and insert it */
 	p->order = order;
 	page_add_to_list(p, &pool->free_area[order]);
-- 
2.37.0.rc0.161.g10f37bed90-goog

* [PATCH v2 03/24] KVM: arm64: Add flags to struct hyp_page
From: Will Deacon @ 2022-06-30 13:57 UTC
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

Add a 'flags' field to struct hyp_page, and reduce the size of the order
field to u8 to avoid growing the struct size.
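
(Illustrative only: with a 2-byte refcount and two 1-byte fields, the
struct still packs into 4 bytes, which a check along these lines, not
part of this patch, would make explicit.)

	/* Hypothetical compile-time check that hyp_vmemmap does not grow. */
	static_assert(sizeof(struct hyp_page) == 4, "struct hyp_page grew");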

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/gfp.h    |  6 +++---
 arch/arm64/kvm/hyp/include/nvhe/memory.h |  3 ++-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c     | 14 +++++++-------
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index 0a048dc06a7d..9330b13075f8 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -7,7 +7,7 @@
 #include <nvhe/memory.h>
 #include <nvhe/spinlock.h>
 
-#define HYP_NO_ORDER	USHRT_MAX
+#define HYP_NO_ORDER	0xff
 
 struct hyp_pool {
 	/*
@@ -19,11 +19,11 @@ struct hyp_pool {
 	struct list_head free_area[MAX_ORDER];
 	phys_addr_t range_start;
 	phys_addr_t range_end;
-	unsigned short max_order;
+	u8 max_order;
 };
 
 /* Allocation */
-void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order);
+void *hyp_alloc_pages(struct hyp_pool *pool, u8 order);
 void hyp_split_page(struct hyp_page *page);
 void hyp_get_page(struct hyp_pool *pool, void *addr);
 void hyp_put_page(struct hyp_pool *pool, void *addr);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 418b66a82a50..2681f632e1c1 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -9,7 +9,8 @@
 
 struct hyp_page {
 	unsigned short refcount;
-	unsigned short order;
+	u8 order;
+	u8 flags;
 };
 
 extern u64 __hyp_vmemmap;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index 0d15227aced8..e6e4b550752b 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -32,7 +32,7 @@ u64 __hyp_vmemmap;
  */
 static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
 					     struct hyp_page *p,
-					     unsigned short order)
+					     u8 order)
 {
 	phys_addr_t addr = hyp_page_to_phys(p);
 
@@ -51,7 +51,7 @@ static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
 /* Find a buddy page currently available for allocation */
 static struct hyp_page *__find_buddy_avail(struct hyp_pool *pool,
 					   struct hyp_page *p,
-					   unsigned short order)
+					   u8 order)
 {
 	struct hyp_page *buddy = __find_buddy_nocheck(pool, p, order);
 
@@ -94,8 +94,8 @@ static void __hyp_attach_page(struct hyp_pool *pool,
 			      struct hyp_page *p)
 {
 	phys_addr_t phys = hyp_page_to_phys(p);
-	unsigned short order = p->order;
 	struct hyp_page *buddy;
+	u8 order = p->order;
 
 	memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
 
@@ -128,7 +128,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
 
 static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
 					   struct hyp_page *p,
-					   unsigned short order)
+					   u8 order)
 {
 	struct hyp_page *buddy;
 
@@ -182,7 +182,7 @@ void hyp_get_page(struct hyp_pool *pool, void *addr)
 
 void hyp_split_page(struct hyp_page *p)
 {
-	unsigned short order = p->order;
+	u8 order = p->order;
 	unsigned int i;
 
 	p->order = 0;
@@ -194,10 +194,10 @@ void hyp_split_page(struct hyp_page *p)
 	}
 }
 
-void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order)
+void *hyp_alloc_pages(struct hyp_pool *pool, u8 order)
 {
-	unsigned short i = order;
 	struct hyp_page *p;
+	u8 i = order;
 
 	hyp_spin_lock(&pool->lock);
 
-- 
2.37.0.rc0.161.g10f37bed90-goog

* [PATCH v2 04/24] KVM: arm64: Back hyp_vmemmap for all of memory
From: Will Deacon @ 2022-06-30 13:57 UTC
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

The EL2 vmemmap in nVHE Protected mode is currently very sparse: only
memory pages owned by the hypervisor itself have a matching struct
hyp_page. But since the size of these structs has been reduced
significantly, it appears that we can afford backing the vmemmap for all
of memory.

This will greatly simplify memory tracking, as the hypervisor will have a
place to store metadata (e.g. refcounts) that wouldn't otherwise fit in
the 4 SW bits available in the host stage-2 page-table, for instance.
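
(Back-of-the-envelope cost, with illustrative numbers: for 4KiB pages and
a 4-byte struct hyp_page, backing the vmemmap for 4GiB of memory takes

	pages   = 4GiB / 4KiB                     = 1,048,576
	vmemmap = pages * sizeof(struct hyp_page) = 1,048,576 * 4B = 4MiB

i.e. roughly 0.1% of the memory being described.)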

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pkvm.h    | 26 +++++++++++++++++++++++
 arch/arm64/kvm/hyp/include/nvhe/mm.h | 14 +------------
 arch/arm64/kvm/hyp/nvhe/mm.c         | 31 ++++++++++++++++++++++++----
 arch/arm64/kvm/hyp/nvhe/page_alloc.c |  4 +---
 arch/arm64/kvm/hyp/nvhe/setup.c      |  7 +++----
 arch/arm64/kvm/pkvm.c                | 18 ++--------------
 6 files changed, 60 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 9f4ad2a8df59..8f7b8a2314bb 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -14,6 +14,32 @@
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
 extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
 
+static inline unsigned long
+hyp_vmemmap_memblock_size(struct memblock_region *reg, size_t vmemmap_entry_size)
+{
+	unsigned long nr_pages = reg->size >> PAGE_SHIFT;
+	unsigned long start, end;
+
+	start = (reg->base >> PAGE_SHIFT) * vmemmap_entry_size;
+	end = start + nr_pages * vmemmap_entry_size;
+	start = ALIGN_DOWN(start, PAGE_SIZE);
+	end = ALIGN(end, PAGE_SIZE);
+
+	return end - start;
+}
+
+static inline unsigned long hyp_vmemmap_pages(size_t vmemmap_entry_size)
+{
+	unsigned long res = 0, i;
+
+	for (i = 0; i < kvm_nvhe_sym(hyp_memblock_nr); i++) {
+		res += hyp_vmemmap_memblock_size(&kvm_nvhe_sym(hyp_memory)[i],
+						 vmemmap_entry_size);
+	}
+
+	return res >> PAGE_SHIFT;
+}
+
 static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 {
 	unsigned long total = 0, i;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 42d8eb9bfe72..b2ee6d5df55b 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -15,7 +15,7 @@ extern hyp_spinlock_t pkvm_pgd_lock;
 
 int hyp_create_idmap(u32 hyp_va_bits);
 int hyp_map_vectors(void);
-int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
+int hyp_back_vmemmap(phys_addr_t back);
 int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
 int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
 int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
@@ -24,16 +24,4 @@ int __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
 				  unsigned long *haddr);
 int pkvm_alloc_private_va_range(size_t size, unsigned long *haddr);
 
-static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
-				     unsigned long *start, unsigned long *end)
-{
-	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct hyp_page *p = hyp_phys_to_page(phys);
-
-	*start = (unsigned long)p;
-	*end = *start + nr_pages * sizeof(struct hyp_page);
-	*start = ALIGN_DOWN(*start, PAGE_SIZE);
-	*end = ALIGN(*end, PAGE_SIZE);
-}
-
 #endif /* __KVM_HYP_MM_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 96193cb31a39..d3a3b47181de 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -129,13 +129,36 @@ int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
 	return ret;
 }
 
-int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
+int hyp_back_vmemmap(phys_addr_t back)
 {
-	unsigned long start, end;
+	unsigned long i, start, size, end = 0;
+	int ret;
 
-	hyp_vmemmap_range(phys, size, &start, &end);
+	for (i = 0; i < hyp_memblock_nr; i++) {
+		start = hyp_memory[i].base;
+		start = ALIGN_DOWN((u64)hyp_phys_to_page(start), PAGE_SIZE);
+		/*
+		 * The beginning of the hyp_vmemmap region for the current
+		 * memblock may already be backed by the page backing the end
+		 * of the previous region, so avoid mapping it twice.
+		 */
+		start = max(start, end);
+
+		end = hyp_memory[i].base + hyp_memory[i].size;
+		end = PAGE_ALIGN((u64)hyp_phys_to_page(end));
+		if (start >= end)
+			continue;
+
+		size = end - start;
+		ret = __pkvm_create_mappings(start, size, back, PAGE_HYP);
+		if (ret)
+			return ret;
+
+		memset(hyp_phys_to_virt(back), 0, size);
+		back += size;
+	}
 
-	return __pkvm_create_mappings(start, end - start, back, PAGE_HYP);
+	return 0;
 }
 
 static void *__hyp_bp_vect_base;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index e6e4b550752b..01976a58d850 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -235,10 +235,8 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
 
 	/* Init the vmemmap portion */
 	p = hyp_phys_to_page(phys);
-	for (i = 0; i < nr_pages; i++) {
-		p[i].order = 0;
+	for (i = 0; i < nr_pages; i++)
 		hyp_set_page_refcounted(&p[i]);
-	}
 
 	/* Attach the unused pages to the buddy tree */
 	for (i = reserved_pages; i < nr_pages; i++)
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index e8d4ea2fcfa0..579eb4f73476 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -31,12 +31,11 @@ static struct hyp_pool hpool;
 
 static int divide_memory_pool(void *virt, unsigned long size)
 {
-	unsigned long vstart, vend, nr_pages;
+	unsigned long nr_pages;
 
 	hyp_early_alloc_init(virt, size);
 
-	hyp_vmemmap_range(__hyp_pa(virt), size, &vstart, &vend);
-	nr_pages = (vend - vstart) >> PAGE_SHIFT;
+	nr_pages = hyp_vmemmap_pages(sizeof(struct hyp_page));
 	vmemmap_base = hyp_early_alloc_contig(nr_pages);
 	if (!vmemmap_base)
 		return -ENOMEM;
@@ -78,7 +77,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 	if (ret)
 		return ret;
 
-	ret = hyp_back_vmemmap(phys, size, hyp_virt_to_phys(vmemmap_base));
+	ret = hyp_back_vmemmap(hyp_virt_to_phys(vmemmap_base));
 	if (ret)
 		return ret;
 
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index ebecb7c045f4..34229425b25d 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -53,7 +53,7 @@ static int __init register_memblock_regions(void)
 
 void __init kvm_hyp_reserve(void)
 {
-	u64 nr_pages, prev, hyp_mem_pages = 0;
+	u64 hyp_mem_pages = 0;
 	int ret;
 
 	if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
@@ -71,21 +71,7 @@ void __init kvm_hyp_reserve(void)
 
 	hyp_mem_pages += hyp_s1_pgtable_pages();
 	hyp_mem_pages += host_s2_pgtable_pages();
-
-	/*
-	 * The hyp_vmemmap needs to be backed by pages, but these pages
-	 * themselves need to be present in the vmemmap, so compute the number
-	 * of pages needed by looking for a fixed point.
-	 */
-	nr_pages = 0;
-	do {
-		prev = nr_pages;
-		nr_pages = hyp_mem_pages + prev;
-		nr_pages = DIV_ROUND_UP(nr_pages * STRUCT_HYP_PAGE_SIZE,
-					PAGE_SIZE);
-		nr_pages += __hyp_pgtable_max_pages(nr_pages);
-	} while (nr_pages != prev);
-	hyp_mem_pages += nr_pages;
+	hyp_mem_pages += hyp_vmemmap_pages(STRUCT_HYP_PAGE_SIZE);
 
 	/*
 	 * Try to allocate a PMD-aligned region to reduce TLB pressure once
-- 
2.37.0.rc0.161.g10f37bed90-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 04/24] KVM: arm64: Back hyp_vmemmap for all of memory
@ 2022-06-30 13:57   ` Will Deacon
  0 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Ard Biesheuvel, Sean Christopherson,
	Alexandru Elisei, Andy Lutomirski, Catalin Marinas, James Morse,
	Chao Peng, Quentin Perret, Suzuki K Poulose, Michael Roth,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

The EL2 vmemmap in nVHE Protected mode is currently very sparse: only
memory pages owned by the hypervisor itself have a matching struct
hyp_page. But since the size of these structs has been reduced
significantly, it appears that we can afford backing the vmemmap for all
of memory.

This will simplify a lot memory tracking as the hypervisor will have a
place to store metadata (e.g. refcounts) that wouldn't otherwise fit in
the 4 SW bits we have in the host stage-2 page-table for instance.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pkvm.h    | 26 +++++++++++++++++++++++
 arch/arm64/kvm/hyp/include/nvhe/mm.h | 14 +------------
 arch/arm64/kvm/hyp/nvhe/mm.c         | 31 ++++++++++++++++++++++++----
 arch/arm64/kvm/hyp/nvhe/page_alloc.c |  4 +---
 arch/arm64/kvm/hyp/nvhe/setup.c      |  7 +++----
 arch/arm64/kvm/pkvm.c                | 18 ++--------------
 6 files changed, 60 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 9f4ad2a8df59..8f7b8a2314bb 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -14,6 +14,32 @@
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
 extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
 
+static inline unsigned long
+hyp_vmemmap_memblock_size(struct memblock_region *reg, size_t vmemmap_entry_size)
+{
+	unsigned long nr_pages = reg->size >> PAGE_SHIFT;
+	unsigned long start, end;
+
+	start = (reg->base >> PAGE_SHIFT) * vmemmap_entry_size;
+	end = start + nr_pages * vmemmap_entry_size;
+	start = ALIGN_DOWN(start, PAGE_SIZE);
+	end = ALIGN(end, PAGE_SIZE);
+
+	return end - start;
+}
+
+static inline unsigned long hyp_vmemmap_pages(size_t vmemmap_entry_size)
+{
+	unsigned long res = 0, i;
+
+	for (i = 0; i < kvm_nvhe_sym(hyp_memblock_nr); i++) {
+		res += hyp_vmemmap_memblock_size(&kvm_nvhe_sym(hyp_memory)[i],
+						 vmemmap_entry_size);
+	}
+
+	return res >> PAGE_SHIFT;
+}
+
 static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 {
 	unsigned long total = 0, i;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 42d8eb9bfe72..b2ee6d5df55b 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -15,7 +15,7 @@ extern hyp_spinlock_t pkvm_pgd_lock;
 
 int hyp_create_idmap(u32 hyp_va_bits);
 int hyp_map_vectors(void);
-int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
+int hyp_back_vmemmap(phys_addr_t back);
 int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
 int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
 int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
@@ -24,16 +24,4 @@ int __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
 				  unsigned long *haddr);
 int pkvm_alloc_private_va_range(size_t size, unsigned long *haddr);
 
-static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
-				     unsigned long *start, unsigned long *end)
-{
-	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct hyp_page *p = hyp_phys_to_page(phys);
-
-	*start = (unsigned long)p;
-	*end = *start + nr_pages * sizeof(struct hyp_page);
-	*start = ALIGN_DOWN(*start, PAGE_SIZE);
-	*end = ALIGN(*end, PAGE_SIZE);
-}
-
 #endif /* __KVM_HYP_MM_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 96193cb31a39..d3a3b47181de 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -129,13 +129,36 @@ int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
 	return ret;
 }
 
-int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
+int hyp_back_vmemmap(phys_addr_t back)
 {
-	unsigned long start, end;
+	unsigned long i, start, size, end = 0;
+	int ret;
 
-	hyp_vmemmap_range(phys, size, &start, &end);
+	for (i = 0; i < hyp_memblock_nr; i++) {
+		start = hyp_memory[i].base;
+		start = ALIGN_DOWN((u64)hyp_phys_to_page(start), PAGE_SIZE);
+		/*
+	 * The beginning of the hyp_vmemmap region for the current
+	 * memblock may already be backed by the page backing the end
+	 * of the previous region, so avoid mapping it twice.
+		 */
+		start = max(start, end);
+
+		end = hyp_memory[i].base + hyp_memory[i].size;
+		end = PAGE_ALIGN((u64)hyp_phys_to_page(end));
+		if (start >= end)
+			continue;
+
+		size = end - start;
+		ret = __pkvm_create_mappings(start, size, back, PAGE_HYP);
+		if (ret)
+			return ret;
+
+		memset(hyp_phys_to_virt(back), 0, size);
+		back += size;
+	}
 
-	return __pkvm_create_mappings(start, end - start, back, PAGE_HYP);
+	return 0;
 }
 
 static void *__hyp_bp_vect_base;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index e6e4b550752b..01976a58d850 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -235,10 +235,8 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
 
 	/* Init the vmemmap portion */
 	p = hyp_phys_to_page(phys);
-	for (i = 0; i < nr_pages; i++) {
-		p[i].order = 0;
+	for (i = 0; i < nr_pages; i++)
 		hyp_set_page_refcounted(&p[i]);
-	}
 
 	/* Attach the unused pages to the buddy tree */
 	for (i = reserved_pages; i < nr_pages; i++)
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index e8d4ea2fcfa0..579eb4f73476 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -31,12 +31,11 @@ static struct hyp_pool hpool;
 
 static int divide_memory_pool(void *virt, unsigned long size)
 {
-	unsigned long vstart, vend, nr_pages;
+	unsigned long nr_pages;
 
 	hyp_early_alloc_init(virt, size);
 
-	hyp_vmemmap_range(__hyp_pa(virt), size, &vstart, &vend);
-	nr_pages = (vend - vstart) >> PAGE_SHIFT;
+	nr_pages = hyp_vmemmap_pages(sizeof(struct hyp_page));
 	vmemmap_base = hyp_early_alloc_contig(nr_pages);
 	if (!vmemmap_base)
 		return -ENOMEM;
@@ -78,7 +77,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 	if (ret)
 		return ret;
 
-	ret = hyp_back_vmemmap(phys, size, hyp_virt_to_phys(vmemmap_base));
+	ret = hyp_back_vmemmap(hyp_virt_to_phys(vmemmap_base));
 	if (ret)
 		return ret;
 
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index ebecb7c045f4..34229425b25d 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -53,7 +53,7 @@ static int __init register_memblock_regions(void)
 
 void __init kvm_hyp_reserve(void)
 {
-	u64 nr_pages, prev, hyp_mem_pages = 0;
+	u64 hyp_mem_pages = 0;
 	int ret;
 
 	if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
@@ -71,21 +71,7 @@ void __init kvm_hyp_reserve(void)
 
 	hyp_mem_pages += hyp_s1_pgtable_pages();
 	hyp_mem_pages += host_s2_pgtable_pages();
-
-	/*
-	 * The hyp_vmemmap needs to be backed by pages, but these pages
-	 * themselves need to be present in the vmemmap, so compute the number
-	 * of pages needed by looking for a fixed point.
-	 */
-	nr_pages = 0;
-	do {
-		prev = nr_pages;
-		nr_pages = hyp_mem_pages + prev;
-		nr_pages = DIV_ROUND_UP(nr_pages * STRUCT_HYP_PAGE_SIZE,
-					PAGE_SIZE);
-		nr_pages += __hyp_pgtable_max_pages(nr_pages);
-	} while (nr_pages != prev);
-	hyp_mem_pages += nr_pages;
+	hyp_mem_pages += hyp_vmemmap_pages(STRUCT_HYP_PAGE_SIZE);
 
 	/*
 	 * Try to allocate a PMD-aligned region to reduce TLB pressure once
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 05/24] KVM: arm64: Make hyp stage-1 refcnt correct on the whole range
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

We currently fix up the hypervisor stage-1 refcount only for specific
portions of the hyp stage-1 VA space. In order to allow unmapping pages
outside of these ranges, let's fix up the refcount for the entire hyp VA
space.
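
For context, the full-range fixup added below is driven through the
generic kvm_pgtable walker interface. As a purely hypothetical sketch
(not part of this patch), a callback of the same shape that simply
counts the valid leaf entries over the whole hyp VA space would look
like this:

	static int count_valid_walker(u64 addr, u64 end, u32 level,
				      kvm_pte_t *ptep,
				      enum kvm_pgtable_walk_flags flag,
				      void * const arg)
	{
		unsigned long *count = arg;

		if (kvm_pte_valid(*ptep))
			(*count)++;

		return 0;
	}

	static int count_valid_hyp_ptes(unsigned long *count)
	{
		struct kvm_pgtable_walker walker = {
			.cb	= count_valid_walker,
			.flags	= KVM_PGTABLE_WALK_LEAF,
			.arg	= count,
		};

		*count = 0;
		return kvm_pgtable_walk(&pkvm_pgtable, 0,
					BIT(pkvm_pgtable.ia_bits), &walker);
	}

The fixup in this patch follows the same pattern, but takes a reference
on the page-table page of every valid entry instead of counting them.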

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/setup.c | 62 +++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 579eb4f73476..8f2726d7e201 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -185,12 +185,11 @@ static void hpool_put_page(void *addr)
 	hyp_put_page(&hpool, addr);
 }
 
-static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
-					 kvm_pte_t *ptep,
-					 enum kvm_pgtable_walk_flags flag,
-					 void * const arg)
+static int fix_host_ownership_walker(u64 addr, u64 end, u32 level,
+				     kvm_pte_t *ptep,
+				     enum kvm_pgtable_walk_flags flag,
+				     void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
 	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
 	kvm_pte_t pte = *ptep;
@@ -199,15 +198,6 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 	if (!kvm_pte_valid(pte))
 		return 0;
 
-	/*
-	 * Fix-up the refcount for the page-table pages as the early allocator
-	 * was unable to access the hyp_vmemmap and so the buddy allocator has
-	 * initialised the refcount to '1'.
-	 */
-	mm_ops->get_page(ptep);
-	if (flag != KVM_PGTABLE_WALK_LEAF)
-		return 0;
-
 	if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
@@ -236,12 +226,30 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 	return host_stage2_idmap_locked(phys, PAGE_SIZE, prot);
 }
 
-static int finalize_host_mappings(void)
+static int fix_hyp_pgtable_refcnt_walker(u64 addr, u64 end, u32 level,
+					 kvm_pte_t *ptep,
+					 enum kvm_pgtable_walk_flags flag,
+					 void * const arg)
+{
+	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	kvm_pte_t pte = *ptep;
+
+	/*
+	 * Fix-up the refcount for the page-table pages as the early allocator
+	 * was unable to access the hyp_vmemmap and so the buddy allocator has
+	 * initialised the refcount to '1'.
+	 */
+	if (kvm_pte_valid(pte))
+		mm_ops->get_page(ptep);
+
+	return 0;
+}
+
+static int fix_host_ownership(void)
 {
 	struct kvm_pgtable_walker walker = {
-		.cb	= finalize_host_mappings_walker,
-		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pkvm_pgtable.mm_ops,
+		.cb	= fix_host_ownership_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
 	};
 	int i, ret;
 
@@ -257,6 +265,18 @@ static int finalize_host_mappings(void)
 	return 0;
 }
 
+static int fix_hyp_pgtable_refcnt(void)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= fix_hyp_pgtable_refcnt_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
+		.arg	= pkvm_pgtable.mm_ops,
+	};
+
+	return kvm_pgtable_walk(&pkvm_pgtable, 0, BIT(pkvm_pgtable.ia_bits),
+				&walker);
+}
+
 void __noreturn __pkvm_init_finalise(void)
 {
 	struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
@@ -286,7 +306,11 @@ void __noreturn __pkvm_init_finalise(void)
 	};
 	pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;
 
-	ret = finalize_host_mappings();
+	ret = fix_host_ownership();
+	if (ret)
+		goto out;
+
+	ret = fix_hyp_pgtable_refcnt();
 	if (ret)
 		goto out;
 
-- 
2.37.0.rc0.161.g10f37bed90-goog

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 06/24] KVM: arm64: Unify identifiers used to distinguish host and hypervisor
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

The 'pkvm_component_id' enum type provides constants to refer to the
host and the hypervisor, yet this information is duplicated by the
'pkvm_hyp_id' constant.

Remove the definition of 'pkvm_hyp_id' and move the 'pkvm_component_id'
type definition to 'mem_protect.h' so that it can be used outside of
the memory protection code.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 6 +++++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 8 --------
 arch/arm64/kvm/hyp/nvhe/setup.c               | 2 +-
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 80e99836eac7..f5705a1e972f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -51,7 +51,11 @@ struct host_kvm {
 };
 extern struct host_kvm host_kvm;
 
-extern const u8 pkvm_hyp_id;
+/* This corresponds to page-table locking order */
+enum pkvm_component_id {
+	PKVM_ID_HOST,
+	PKVM_ID_HYP,
+};
 
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 78edf077fa3b..10390b8dc841 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -26,8 +26,6 @@ struct host_kvm host_kvm;
 
 static struct hyp_pool host_s2_pool;
 
-const u8 pkvm_hyp_id = 1;
-
 static void host_lock_component(void)
 {
 	hyp_spin_lock(&host_kvm.lock);
@@ -384,12 +382,6 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
 	BUG_ON(ret && ret != -EAGAIN);
 }
 
-/* This corresponds to locking order */
-enum pkvm_component_id {
-	PKVM_ID_HOST,
-	PKVM_ID_HYP,
-};
-
 struct pkvm_mem_transition {
 	u64				nr_pages;
 
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 8f2726d7e201..0312c9c74a5a 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -212,7 +212,7 @@ static int fix_host_ownership_walker(u64 addr, u64 end, u32 level,
 	state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
 	switch (state) {
 	case PKVM_PAGE_OWNED:
-		return host_stage2_set_owner_locked(phys, PAGE_SIZE, pkvm_hyp_id);
+		return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
 	case PKVM_PAGE_SHARED_OWNED:
 		prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_BORROWED);
 		break;
-- 
2.37.0.rc0.161.g10f37bed90-goog

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 07/24] KVM: arm64: Implement do_donate() helper for donating memory
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

Transferring ownership of a memory region from one component to another
can be achieved using a "donate" operation, which results in the
previous owner losing access to the underlying pages entirely.

Implement a do_donate() helper, along the same lines as do_{un,}share,
and provide this functionality for the host-{to,from}-hyp cases as this
will later be used to donate/reclaim memory pages to store VM metadata
at EL2.
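
As a purely hypothetical usage sketch (not something introduced by this
patch; later patches in the series wire the helpers up for VM metadata),
EL2 code needing temporary ownership of a host page could pair the two
new entry points like so, with error handling kept minimal:

	/*
	 * Take one page from the host for hypervisor-private metadata,
	 * use it, then hand it back.
	 */
	static int with_host_page(u64 pfn)
	{
		int ret;

		ret = __pkvm_host_donate_hyp(pfn, 1);
		if (ret)
			return ret;

		/*
		 * The page is now owned by the hypervisor: inaccessible
		 * via the host stage-2 and mapped PAGE_HYP at EL2 with
		 * the PKVM_PAGE_OWNED state.
		 */

		return __pkvm_hyp_donate_host(pfn, 1);
	}

The symmetry of the two calls is what allows the later donate-for-metadata
and reclaim-on-teardown flow without the host retaining access in between.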

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   2 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 239 ++++++++++++++++++
 2 files changed, 241 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index f5705a1e972f..c87b19b2d468 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -60,6 +60,8 @@ enum pkvm_component_id {
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
 int __pkvm_host_unshare_hyp(u64 pfn);
+int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
+int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 10390b8dc841..f475d554c9fd 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -395,6 +395,9 @@ struct pkvm_mem_transition {
 				/* Address in the completer's address space */
 				u64	completer_addr;
 			} host;
+			struct {
+				u64	completer_addr;
+			} hyp;
 		};
 	} initiator;
 
@@ -408,6 +411,10 @@ struct pkvm_mem_share {
 	const enum kvm_pgtable_prot		completer_prot;
 };
 
+struct pkvm_mem_donation {
+	const struct pkvm_mem_transition	tx;
+};
+
 struct check_walk_data {
 	enum pkvm_page_state	desired;
 	enum pkvm_page_state	(*get_page_state)(kvm_pte_t pte);
@@ -507,6 +514,46 @@ static int host_initiate_unshare(u64 *completer_addr,
 	return __host_set_page_state_range(addr, size, PKVM_PAGE_OWNED);
 }
 
+static int host_initiate_donation(u64 *completer_addr,
+				  const struct pkvm_mem_transition *tx)
+{
+	u8 owner_id = tx->completer.id;
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	*completer_addr = tx->initiator.host.completer_addr;
+	return host_stage2_set_owner_locked(tx->initiator.addr, size, owner_id);
+}
+
+static bool __host_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
+{
+	return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
+		 tx->initiator.id != PKVM_ID_HYP);
+}
+
+static int __host_ack_transition(u64 addr, const struct pkvm_mem_transition *tx,
+				 enum pkvm_page_state state)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	if (__host_ack_skip_pgtable_check(tx))
+		return 0;
+
+	return __host_check_page_state_range(addr, size, state);
+}
+
+static int host_ack_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	return __host_ack_transition(addr, tx, PKVM_NOPAGE);
+}
+
+static int host_complete_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	u8 host_id = tx->completer.id;
+
+	return host_stage2_set_owner_locked(addr, size, host_id);
+}
+
 static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
 {
 	if (!kvm_pte_valid(pte))
@@ -527,6 +574,27 @@ static int __hyp_check_page_state_range(u64 addr, u64 size,
 	return check_page_state_range(&pkvm_pgtable, addr, size, &d);
 }
 
+static int hyp_request_donation(u64 *completer_addr,
+				const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	u64 addr = tx->initiator.addr;
+
+	*completer_addr = tx->initiator.hyp.completer_addr;
+	return __hyp_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
+}
+
+static int hyp_initiate_donation(u64 *completer_addr,
+				 const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	int ret;
+
+	*completer_addr = tx->initiator.hyp.completer_addr;
+	ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, tx->initiator.addr, size);
+	return (ret != size) ? -EFAULT : 0;
+}
+
 static bool __hyp_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
 {
 	return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
@@ -558,6 +626,16 @@ static int hyp_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
 					    PKVM_PAGE_SHARED_BORROWED);
 }
 
+static int hyp_ack_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	if (__hyp_ack_skip_pgtable_check(tx))
+		return 0;
+
+	return __hyp_check_page_state_range(addr, size, PKVM_NOPAGE);
+}
+
 static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
 			      enum kvm_pgtable_prot perms)
 {
@@ -576,6 +654,15 @@ static int hyp_complete_unshare(u64 addr, const struct pkvm_mem_transition *tx)
 	return (ret != size) ? -EFAULT : 0;
 }
 
+static int hyp_complete_donation(u64 addr,
+				 const struct pkvm_mem_transition *tx)
+{
+	void *start = (void *)addr, *end = start + (tx->nr_pages * PAGE_SIZE);
+	enum kvm_pgtable_prot prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_OWNED);
+
+	return pkvm_create_mappings_locked(start, end, prot);
+}
+
 static int check_share(struct pkvm_mem_share *share)
 {
 	const struct pkvm_mem_transition *tx = &share->tx;
@@ -728,6 +815,94 @@ static int do_unshare(struct pkvm_mem_share *share)
 	return WARN_ON(__do_unshare(share));
 }
 
+static int check_donation(struct pkvm_mem_donation *donation)
+{
+	const struct pkvm_mem_transition *tx = &donation->tx;
+	u64 completer_addr;
+	int ret;
+
+	switch (tx->initiator.id) {
+	case PKVM_ID_HOST:
+		ret = host_request_owned_transition(&completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_request_donation(&completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (tx->completer.id){
+	case PKVM_ID_HOST:
+		ret = host_ack_donation(completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_ack_donation(completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static int __do_donate(struct pkvm_mem_donation *donation)
+{
+	const struct pkvm_mem_transition *tx = &donation->tx;
+	u64 completer_addr;
+	int ret;
+
+	switch (tx->initiator.id) {
+	case PKVM_ID_HOST:
+		ret = host_initiate_donation(&completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_initiate_donation(&completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (tx->completer.id){
+	case PKVM_ID_HOST:
+		ret = host_complete_donation(completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_complete_donation(completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+/*
+ * do_donate():
+ *
+ * The page owner transfers ownership to another component, losing access
+ * as a consequence.
+ *
+ * Initiator: OWNED	=> NOPAGE
+ * Completer: NOPAGE	=> OWNED
+ */
+static int do_donate(struct pkvm_mem_donation *donation)
+{
+	int ret;
+
+	ret = check_donation(donation);
+	if (ret)
+		return ret;
+
+	return WARN_ON(__do_donate(donation));
+}
+
 int __pkvm_host_share_hyp(u64 pfn)
 {
 	int ret;
@@ -793,3 +968,67 @@ int __pkvm_host_unshare_hyp(u64 pfn)
 
 	return ret;
 }
+
+int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u64 hyp_addr = (u64)__hyp_va(host_addr);
+	struct pkvm_mem_donation donation = {
+		.tx	= {
+			.nr_pages	= nr_pages,
+			.initiator	= {
+				.id	= PKVM_ID_HOST,
+				.addr	= host_addr,
+				.host	= {
+					.completer_addr = hyp_addr,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_HYP,
+			},
+		},
+	};
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = do_donate(&donation);
+
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
+
+int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u64 hyp_addr = (u64)__hyp_va(host_addr);
+	struct pkvm_mem_donation donation = {
+		.tx	= {
+			.nr_pages	= nr_pages,
+			.initiator	= {
+				.id	= PKVM_ID_HYP,
+				.addr	= hyp_addr,
+				.hyp	= {
+					.completer_addr = host_addr,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_HOST,
+			},
+		},
+	};
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = do_donate(&donation);
+
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
-- 
2.37.0.rc0.161.g10f37bed90-goog

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 07/24] KVM: arm64: Implement do_donate() helper for donating memory
@ 2022-06-30 13:57   ` Will Deacon
  0 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Ard Biesheuvel, Sean Christopherson,
	Alexandru Elisei, Andy Lutomirski, Catalin Marinas, James Morse,
	Chao Peng, Quentin Perret, Suzuki K Poulose, Michael Roth,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Transferring ownership information of a memory region from one component
to another can be achieved using a "donate" operation, which results
in the previous owner losing access to the underlying pages entirely.

Implement a do_donate() helper, along the same lines as do_{un,}share,
and provide this functionality for the host-{to,from}-hyp cases as this
will later be used to donate/reclaim memory pages to store VM metadata
at EL2.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   2 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 239 ++++++++++++++++++
 2 files changed, 241 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index f5705a1e972f..c87b19b2d468 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -60,6 +60,8 @@ enum pkvm_component_id {
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
 int __pkvm_host_unshare_hyp(u64 pfn);
+int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
+int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 10390b8dc841..f475d554c9fd 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -395,6 +395,9 @@ struct pkvm_mem_transition {
 				/* Address in the completer's address space */
 				u64	completer_addr;
 			} host;
+			struct {
+				u64	completer_addr;
+			} hyp;
 		};
 	} initiator;
 
@@ -408,6 +411,10 @@ struct pkvm_mem_share {
 	const enum kvm_pgtable_prot		completer_prot;
 };
 
+struct pkvm_mem_donation {
+	const struct pkvm_mem_transition	tx;
+};
+
 struct check_walk_data {
 	enum pkvm_page_state	desired;
 	enum pkvm_page_state	(*get_page_state)(kvm_pte_t pte);
@@ -507,6 +514,46 @@ static int host_initiate_unshare(u64 *completer_addr,
 	return __host_set_page_state_range(addr, size, PKVM_PAGE_OWNED);
 }
 
+static int host_initiate_donation(u64 *completer_addr,
+				  const struct pkvm_mem_transition *tx)
+{
+	u8 owner_id = tx->completer.id;
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	*completer_addr = tx->initiator.host.completer_addr;
+	return host_stage2_set_owner_locked(tx->initiator.addr, size, owner_id);
+}
+
+static bool __host_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
+{
+	return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
+		 tx->initiator.id != PKVM_ID_HYP);
+}
+
+static int __host_ack_transition(u64 addr, const struct pkvm_mem_transition *tx,
+				 enum pkvm_page_state state)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	if (__host_ack_skip_pgtable_check(tx))
+		return 0;
+
+	return __host_check_page_state_range(addr, size, state);
+}
+
+static int host_ack_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	return __host_ack_transition(addr, tx, PKVM_NOPAGE);
+}
+
+static int host_complete_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	u8 host_id = tx->completer.id;
+
+	return host_stage2_set_owner_locked(addr, size, host_id);
+}
+
 static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
 {
 	if (!kvm_pte_valid(pte))
@@ -527,6 +574,27 @@ static int __hyp_check_page_state_range(u64 addr, u64 size,
 	return check_page_state_range(&pkvm_pgtable, addr, size, &d);
 }
 
+static int hyp_request_donation(u64 *completer_addr,
+				const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	u64 addr = tx->initiator.addr;
+
+	*completer_addr = tx->initiator.hyp.completer_addr;
+	return __hyp_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
+}
+
+static int hyp_initiate_donation(u64 *completer_addr,
+				 const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	int ret;
+
+	*completer_addr = tx->initiator.hyp.completer_addr;
+	ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, tx->initiator.addr, size);
+	return (ret != size) ? -EFAULT : 0;
+}
+
 static bool __hyp_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
 {
 	return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
@@ -558,6 +626,16 @@ static int hyp_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
 					    PKVM_PAGE_SHARED_BORROWED);
 }
 
+static int hyp_ack_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	if (__hyp_ack_skip_pgtable_check(tx))
+		return 0;
+
+	return __hyp_check_page_state_range(addr, size, PKVM_NOPAGE);
+}
+
 static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
 			      enum kvm_pgtable_prot perms)
 {
@@ -576,6 +654,15 @@ static int hyp_complete_unshare(u64 addr, const struct pkvm_mem_transition *tx)
 	return (ret != size) ? -EFAULT : 0;
 }
 
+static int hyp_complete_donation(u64 addr,
+				 const struct pkvm_mem_transition *tx)
+{
+	void *start = (void *)addr, *end = start + (tx->nr_pages * PAGE_SIZE);
+	enum kvm_pgtable_prot prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_OWNED);
+
+	return pkvm_create_mappings_locked(start, end, prot);
+}
+
 static int check_share(struct pkvm_mem_share *share)
 {
 	const struct pkvm_mem_transition *tx = &share->tx;
@@ -728,6 +815,94 @@ static int do_unshare(struct pkvm_mem_share *share)
 	return WARN_ON(__do_unshare(share));
 }
 
+static int check_donation(struct pkvm_mem_donation *donation)
+{
+	const struct pkvm_mem_transition *tx = &donation->tx;
+	u64 completer_addr;
+	int ret;
+
+	switch (tx->initiator.id) {
+	case PKVM_ID_HOST:
+		ret = host_request_owned_transition(&completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_request_donation(&completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (tx->completer.id){
+	case PKVM_ID_HOST:
+		ret = host_ack_donation(completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_ack_donation(completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static int __do_donate(struct pkvm_mem_donation *donation)
+{
+	const struct pkvm_mem_transition *tx = &donation->tx;
+	u64 completer_addr;
+	int ret;
+
+	switch (tx->initiator.id) {
+	case PKVM_ID_HOST:
+		ret = host_initiate_donation(&completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_initiate_donation(&completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (tx->completer.id){
+	case PKVM_ID_HOST:
+		ret = host_complete_donation(completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_complete_donation(completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+/*
+ * do_donate():
+ *
+ * The page owner transfers ownership to another component, losing access
+ * as a consequence.
+ *
+ * Initiator: OWNED	=> NOPAGE
+ * Completer: NOPAGE	=> OWNED
+ */
+static int do_donate(struct pkvm_mem_donation *donation)
+{
+	int ret;
+
+	ret = check_donation(donation);
+	if (ret)
+		return ret;
+
+	return WARN_ON(__do_donate(donation));
+}
+
 int __pkvm_host_share_hyp(u64 pfn)
 {
 	int ret;
@@ -793,3 +968,67 @@ int __pkvm_host_unshare_hyp(u64 pfn)
 
 	return ret;
 }
+
+int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u64 hyp_addr = (u64)__hyp_va(host_addr);
+	struct pkvm_mem_donation donation = {
+		.tx	= {
+			.nr_pages	= nr_pages,
+			.initiator	= {
+				.id	= PKVM_ID_HOST,
+				.addr	= host_addr,
+				.host	= {
+					.completer_addr = hyp_addr,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_HYP,
+			},
+		},
+	};
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = do_donate(&donation);
+
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
+
+int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u64 hyp_addr = (u64)__hyp_va(host_addr);
+	struct pkvm_mem_donation donation = {
+		.tx	= {
+			.nr_pages	= nr_pages,
+			.initiator	= {
+				.id	= PKVM_ID_HYP,
+				.addr	= hyp_addr,
+				.hyp	= {
+					.completer_addr = host_addr,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_HOST,
+			},
+		},
+	};
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = do_donate(&donation);
+
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
-- 
2.37.0.rc0.161.g10f37bed90-goog
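
For illustration only, and not part of the patch: a minimal sketch of how
EL2 code might use the new donation interface, assuming the existing
nVHE address helpers (hyp_pfn_to_phys() and friends). The wrapper names
below are hypothetical.

/*
 * Hypothetical wrappers (illustration only): take a page from the host
 * and hand it back later. Donation only succeeds if the initiator owns
 * the page; afterwards the previous owner loses access entirely
 * (initiator: OWNED => NOPAGE, completer: NOPAGE => OWNED).
 */
static void *admit_host_page(u64 pfn)
{
        /* Host stage-2 entry becomes NOPAGE; hyp maps the page as OWNED. */
        if (__pkvm_host_donate_hyp(pfn, 1))
                return NULL;

        return hyp_phys_to_virt(hyp_pfn_to_phys(pfn));
}

static void return_host_page(void *va)
{
        /* Reverse transition: hyp unmaps the page and the host reclaims it. */
        WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va), 1));
}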


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 08/24] KVM: arm64: Prevent the donation of no-map pages
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

Memory regions marked as no-map in DT routinely include TrustZone
carveouts and such. Although donating such pages to the hypervisor may
not breach confidentiality, doing so could be used to corrupt its state
in uncontrollable ways. To prevent this, let's block host-initiated
memory transitions targeting no-map pages altogether in nVHE protected
mode, as there should be no valid reason to do this currently.

Thankfully, the pKVM EL2 hypervisor has a full copy of the host's list
of memblock regions, so the presence of the MEMBLOCK_NOMAP flag on any
given region can easily be checked at EL2.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index f475d554c9fd..e7015bbefbea 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -193,7 +193,7 @@ struct kvm_mem_range {
 	u64 end;
 };
 
-static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
+static struct memblock_region *find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
 {
 	int cur, left = 0, right = hyp_memblock_nr;
 	struct memblock_region *reg;
@@ -216,18 +216,28 @@ static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
 		} else {
 			range->start = reg->base;
 			range->end = end;
-			return true;
+			return reg;
 		}
 	}
 
-	return false;
+	return NULL;
 }
 
 bool addr_is_memory(phys_addr_t phys)
 {
 	struct kvm_mem_range range;
 
-	return find_mem_range(phys, &range);
+	return !!find_mem_range(phys, &range);
+}
+
+static bool addr_is_allowed_memory(phys_addr_t phys)
+{
+	struct memblock_region *reg;
+	struct kvm_mem_range range;
+
+	reg = find_mem_range(phys, &range);
+
+	return reg && !(reg->flags & MEMBLOCK_NOMAP);
 }
 
 static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
@@ -350,7 +360,7 @@ static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot pr
 static int host_stage2_idmap(u64 addr)
 {
 	struct kvm_mem_range range;
-	bool is_memory = find_mem_range(addr, &range);
+	bool is_memory = !!find_mem_range(addr, &range);
 	enum kvm_pgtable_prot prot;
 	int ret;
 
@@ -428,7 +438,7 @@ static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
 	struct check_walk_data *d = arg;
 	kvm_pte_t pte = *ptep;
 
-	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
+	if (kvm_pte_valid(pte) && !addr_is_allowed_memory(kvm_pte_to_phys(pte)))
 		return -EINVAL;
 
 	return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 09/24] KVM: arm64: Add helpers to pin memory shared with hyp
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

Add helpers allowing the hypervisor to check whether a range of pages
is currently shared by the host and, if so, to 'pin' it by blocking
host unshare operations until the memory has been unpinned. This will
allow the hypervisor to take references on host-provided data
structures (struct kvm and such) and be guaranteed that these pages
will remain in a stable state until it decides to release them, e.g.
during guest teardown.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  3 ++
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |  7 ++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 48 +++++++++++++++++++
 3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index c87b19b2d468..998bf165af71 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -69,6 +69,9 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
 int kvm_host_prepare_stage2(void *pgt_pool_base);
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
+int hyp_pin_shared_mem(void *from, void *to);
+void hyp_unpin_shared_mem(void *from, void *to);
+
 static __always_inline void __load_host_stage2(void)
 {
 	if (static_branch_likely(&kvm_protected_mode_initialized))
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 2681f632e1c1..29f2ebe306bc 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -52,10 +52,15 @@ static inline void hyp_page_ref_inc(struct hyp_page *p)
 	p->refcount++;
 }
 
-static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
+static inline void hyp_page_ref_dec(struct hyp_page *p)
 {
 	BUG_ON(!p->refcount);
 	p->refcount--;
+}
+
+static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
+{
+	hyp_page_ref_dec(p);
 	return (p->refcount == 0);
 }
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index e7015bbefbea..e2e3b30b072e 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -629,6 +629,9 @@ static int hyp_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
 {
 	u64 size = tx->nr_pages * PAGE_SIZE;
 
+	if (tx->initiator.id == PKVM_ID_HOST && hyp_page_count((void *)addr))
+		return -EBUSY;
+
 	if (__hyp_ack_skip_pgtable_check(tx))
 		return 0;
 
@@ -1042,3 +1045,48 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages)
 
 	return ret;
 }
+
+int hyp_pin_shared_mem(void *from, void *to)
+{
+	u64 cur, start = ALIGN_DOWN((u64)from, PAGE_SIZE);
+	u64 end = PAGE_ALIGN((u64)to);
+	u64 size = end - start;
+	int ret;
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = __host_check_page_state_range(__hyp_pa(start), size,
+					    PKVM_PAGE_SHARED_OWNED);
+	if (ret)
+		goto unlock;
+
+	ret = __hyp_check_page_state_range(start, size,
+					   PKVM_PAGE_SHARED_BORROWED);
+	if (ret)
+		goto unlock;
+
+	for (cur = start; cur < end; cur += PAGE_SIZE)
+		hyp_page_ref_inc(hyp_virt_to_page(cur));
+
+unlock:
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
+
+void hyp_unpin_shared_mem(void *from, void *to)
+{
+	u64 cur, start = ALIGN_DOWN((u64)from, PAGE_SIZE);
+	u64 end = PAGE_ALIGN((u64)to);
+
+	host_lock_component();
+	hyp_lock_component();
+
+	for (cur = start; cur < end; cur += PAGE_SIZE)
+		hyp_page_ref_dec(hyp_virt_to_page(cur));
+
+	hyp_unlock_component();
+	host_unlock_component();
+}
-- 
2.37.0.rc0.161.g10f37bed90-goog
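
For illustration only, and not part of the patch: a rough sketch of the
intended usage pattern, pinning a host-shared structure for the duration
of a hypercall so that the host cannot unshare it concurrently. The
function name is hypothetical; kern_hyp_va(), hyp_pin_shared_mem() and
hyp_unpin_shared_mem() are the helpers introduced or already present in
the nVHE code.

/* Hypothetical example: safely access a structure shared by the host. */
static int access_host_vcpu(struct kvm_vcpu *host_vcpu)
{
        int ret;

        host_vcpu = kern_hyp_va(host_vcpu);

        /* Fails unless the whole range is currently shared with hyp. */
        ret = hyp_pin_shared_mem(host_vcpu, host_vcpu + 1);
        if (ret)
                return ret;

        /*
         * The host cannot unshare these pages here: hyp_ack_unshare()
         * returns -EBUSY while the page refcounts are elevated.
         */

        hyp_unpin_shared_mem(host_vcpu, host_vcpu + 1);
        return 0;
}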


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 10/24] KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

nvhe/mem_protect.h refers to __load_stage2() in the definition of
__load_host_stage2() but doesn't include the relevant header.

Include asm/kvm_mmu.h in nvhe/mem_protect.h so that users of the latter
don't have to do this themselves.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 998bf165af71..3bea816296dc 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -8,6 +8,7 @@
 #define __KVM_NVHE_MEM_PROTECT__
 #include <linux/kvm_host.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 #include <asm/kvm_pgtable.h>
 #include <asm/virt.h>
 #include <nvhe/spinlock.h>
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 11/24] KVM: arm64: Add hyp_spinlock_t static initializer
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Fuad Tabba <tabba@google.com>

Having a static initializer for hyp_spinlock_t simplifies its
use when there isn't an initializing function.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
index 4652fd04bdbe..7c7ea8c55405 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
@@ -28,9 +28,17 @@ typedef union hyp_spinlock {
 	};
 } hyp_spinlock_t;
 
+#define __HYP_SPIN_LOCK_INITIALIZER \
+	{ .__val = 0 }
+
+#define __HYP_SPIN_LOCK_UNLOCKED \
+	((hyp_spinlock_t) __HYP_SPIN_LOCK_INITIALIZER)
+
+#define DEFINE_HYP_SPINLOCK(x)	hyp_spinlock_t x = __HYP_SPIN_LOCK_UNLOCKED
+
 #define hyp_spin_lock_init(l)						\
 do {									\
-	*(l) = (hyp_spinlock_t){ .__val = 0 };				\
+	*(l) = __HYP_SPIN_LOCK_UNLOCKED;				\
 } while (0)
 
 static inline void hyp_spin_lock(hyp_spinlock_t *lock)
-- 
2.37.0.rc0.161.g10f37bed90-goog
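
For illustration only, and not part of the patch: with the new macro, a
lock with static storage duration no longer needs a runtime call to
hyp_spin_lock_init(). The names below are hypothetical.

/* Hypothetical example of a statically-initialised hyp spinlock. */
static DEFINE_HYP_SPINLOCK(example_lock);

static void update_example_state(void)
{
        hyp_spin_lock(&example_lock);
        /* ... modify state protected by example_lock ... */
        hyp_spin_unlock(&example_lock);
}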


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 12/24] KVM: arm64: Introduce shadow VM state at EL2
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Fuad Tabba <tabba@google.com>

Introduce a table of shadow VM structures at EL2 and provide hypercalls
to the host for creating and destroying shadow VMs.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h              |   2 +
 arch/arm64/include/asm/kvm_host.h             |   6 +
 arch/arm64/include/asm/kvm_pgtable.h          |   8 +
 arch/arm64/include/asm/kvm_pkvm.h             |   8 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   3 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  60 +++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  21 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  14 +
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 398 ++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c               |   8 +
 arch/arm64/kvm/hyp/pgtable.c                  |   9 +
 arch/arm64/kvm/pkvm.c                         |   1 +
 12 files changed, 538 insertions(+)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/pkvm.h

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 2e277f2ed671..fac4ed699913 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -76,6 +76,8 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
+	__KVM_HOST_SMCCC_FUNC___pkvm_init_shadow,
+	__KVM_HOST_SMCCC_FUNC___pkvm_teardown_shadow,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2cc42e1fec18..41348ac728f9 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -115,6 +115,10 @@ struct kvm_smccc_features {
 	unsigned long vendor_hyp_bmap;
 };
 
+struct kvm_protected_vm {
+	unsigned int shadow_handle;
+};
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
@@ -166,6 +170,8 @@ struct kvm_arch {
 
 	/* Hypercall features firmware registers' descriptor */
 	struct kvm_smccc_features smccc_feat;
+
+	struct kvm_protected_vm pkvm;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 9f339dffbc1a..2d6b5058f7d3 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -288,6 +288,14 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
  */
 u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift);
 
+/*
+ * kvm_pgtable_stage2_pgd_size() - Helper to compute size of a stage-2 PGD
+ * @vtcr:	Content of the VTCR register.
+ *
+ * Return: the size (in bytes) of the stage-2 PGD
+ */
+size_t kvm_pgtable_stage2_pgd_size(u64 vtcr);
+
 /**
  * __kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
  * @pgt:	Uninitialised page-table structure to initialise.
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 8f7b8a2314bb..11526e89fe5c 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -9,6 +9,9 @@
 #include <linux/memblock.h>
 #include <asm/kvm_pgtable.h>
 
+/* Maximum number of protected VMs that can be created. */
+#define KVM_MAX_PVMS 255
+
 #define HYP_MEMBLOCK_REGIONS 128
 
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
@@ -40,6 +43,11 @@ static inline unsigned long hyp_vmemmap_pages(size_t vmemmap_entry_size)
 	return res >> PAGE_SHIFT;
 }
 
+static inline unsigned long hyp_shadow_table_pages(void)
+{
+	return PAGE_ALIGN(KVM_MAX_PVMS * sizeof(void *)) >> PAGE_SHIFT;
+}
+
 static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 {
 	unsigned long total = 0, i;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 3bea816296dc..3a0817b5c739 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -11,6 +11,7 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_pgtable.h>
 #include <asm/virt.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/spinlock.h>
 
 /*
@@ -68,10 +69,12 @@ bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
 int kvm_host_prepare_stage2(void *pgt_pool_base);
+int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd);
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
+void reclaim_guest_pages(struct kvm_shadow_vm *vm);
 
 static __always_inline void __load_host_stage2(void)
 {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
new file mode 100644
index 000000000000..1d0a33f70879
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Google LLC
+ * Author: Fuad Tabba <tabba@google.com>
+ */
+
+#ifndef __ARM64_KVM_NVHE_PKVM_H__
+#define __ARM64_KVM_NVHE_PKVM_H__
+
+#include <asm/kvm_pkvm.h>
+
+/*
+ * Holds the relevant data for maintaining the vcpu state completely at hyp.
+ */
+struct kvm_shadow_vcpu_state {
+	/* The data for the shadow vcpu. */
+	struct kvm_vcpu shadow_vcpu;
+
+	/* A pointer to the host's vcpu. */
+	struct kvm_vcpu *host_vcpu;
+
+	/* A pointer to the shadow vm. */
+	struct kvm_shadow_vm *shadow_vm;
+};
+
+/*
+ * Holds the relevant data for running a protected vm.
+ */
+struct kvm_shadow_vm {
+	/* The data for the shadow kvm. */
+	struct kvm kvm;
+
+	/* The host's kvm structure. */
+	struct kvm *host_kvm;
+
+	/* The total size of the donated shadow area. */
+	size_t shadow_area_size;
+
+	struct kvm_pgtable pgt;
+
+	/* Array of the shadow state per vcpu. */
+	struct kvm_shadow_vcpu_state shadow_vcpu_states[0];
+};
+
+static inline struct kvm_shadow_vcpu_state *get_shadow_state(struct kvm_vcpu *shadow_vcpu)
+{
+	return container_of(shadow_vcpu, struct kvm_shadow_vcpu_state, shadow_vcpu);
+}
+
+static inline struct kvm_shadow_vm *get_shadow_vm(struct kvm_vcpu *shadow_vcpu)
+{
+	return get_shadow_state(shadow_vcpu)->shadow_vm;
+}
+
+void hyp_shadow_table_init(void *tbl);
+int __pkvm_init_shadow(struct kvm *kvm, unsigned long shadow_hva,
+		       size_t shadow_size, unsigned long pgd_hva);
+int __pkvm_teardown_shadow(unsigned int shadow_handle);
+
+#endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 3cea4b6ac23e..a1fbd11c8041 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -15,6 +15,7 @@
 
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
 DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
@@ -191,6 +192,24 @@ static void handle___pkvm_vcpu_init_traps(struct kvm_cpu_context *host_ctxt)
 	__pkvm_vcpu_init_traps(kern_hyp_va(vcpu));
 }
 
+static void handle___pkvm_init_shadow(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(struct kvm *, host_kvm, host_ctxt, 1);
+	DECLARE_REG(unsigned long, host_shadow_va, host_ctxt, 2);
+	DECLARE_REG(size_t, shadow_size, host_ctxt, 3);
+	DECLARE_REG(unsigned long, pgd, host_ctxt, 4);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_init_shadow(host_kvm, host_shadow_va,
+						   shadow_size, pgd);
+}
+
+static void handle___pkvm_teardown_shadow(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(unsigned int, shadow_handle, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_teardown_shadow(shadow_handle);
+}
+
 typedef void (*hcall_t)(struct kvm_cpu_context *);
 
 #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
@@ -220,6 +239,8 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__vgic_v3_save_aprs),
 	HANDLE_FUNC(__vgic_v3_restore_aprs),
 	HANDLE_FUNC(__pkvm_vcpu_init_traps),
+	HANDLE_FUNC(__pkvm_init_shadow),
+	HANDLE_FUNC(__pkvm_teardown_shadow),
 };
 
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index e2e3b30b072e..9baf731736be 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -141,6 +141,20 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
 	return 0;
 }
 
+int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd)
+{
+	vm->pgt.pgd = pgd;
+	return 0;
+}
+
+void reclaim_guest_pages(struct kvm_shadow_vm *vm)
+{
+	unsigned long nr_pages;
+
+	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
+	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(vm->pgt.pgd), nr_pages));
+}
+
 int __pkvm_prot_finalize(void)
 {
 	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 99c8d8b73e70..77aeb787670b 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -7,6 +7,9 @@
 #include <linux/kvm_host.h>
 #include <linux/mm.h>
 #include <nvhe/fixed_config.h>
+#include <nvhe/mem_protect.h>
+#include <nvhe/memory.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
 /*
@@ -183,3 +186,398 @@ void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
 	pvm_init_traps_aa64mmfr0(vcpu);
 	pvm_init_traps_aa64mmfr1(vcpu);
 }
+
+/*
+ * Start the shadow table handle at the offset defined instead of at 0.
+ * Mainly for sanity checking and debugging.
+ */
+#define HANDLE_OFFSET 0x1000
+
+static unsigned int shadow_handle_to_idx(unsigned int shadow_handle)
+{
+	return shadow_handle - HANDLE_OFFSET;
+}
+
+static unsigned int idx_to_shadow_handle(unsigned int idx)
+{
+	return idx + HANDLE_OFFSET;
+}
+
+/*
+ * Spinlock for protecting the shadow table related state.
+ * Protects writes to shadow_table, nr_shadow_entries, and next_shadow_alloc,
+ * as well as reads and writes to last_shadow_vcpu_lookup.
+ */
+static DEFINE_HYP_SPINLOCK(shadow_lock);
+
+/*
+ * The table of shadow entries for protected VMs in hyp.
+ * Allocated at hyp initialization and setup.
+ */
+static struct kvm_shadow_vm **shadow_table;
+
+/* Current number of vms in the shadow table. */
+static unsigned int nr_shadow_entries;
+
+/* The next entry index to try to allocate from. */
+static unsigned int next_shadow_alloc;
+
+void hyp_shadow_table_init(void *tbl)
+{
+	WARN_ON(shadow_table);
+	shadow_table = tbl;
+}
+
+/*
+ * Return the shadow vm corresponding to the handle.
+ */
+static struct kvm_shadow_vm *find_shadow_by_handle(unsigned int shadow_handle)
+{
+	unsigned int shadow_idx = shadow_handle_to_idx(shadow_handle);
+
+	if (unlikely(shadow_idx >= KVM_MAX_PVMS))
+		return NULL;
+
+	return shadow_table[shadow_idx];
+}
+
+static void unpin_host_vcpus(struct kvm_shadow_vcpu_state *shadow_vcpu_states,
+			     unsigned int nr_vcpus)
+{
+	int i;
+
+	for (i = 0; i < nr_vcpus; i++) {
+		struct kvm_vcpu *host_vcpu = shadow_vcpu_states[i].host_vcpu;
+		hyp_unpin_shared_mem(host_vcpu, host_vcpu + 1);
+	}
+}
+
+static int set_host_vcpus(struct kvm_shadow_vcpu_state *shadow_vcpu_states,
+			  unsigned int nr_vcpus,
+			  struct kvm_vcpu **vcpu_array,
+			  size_t vcpu_array_size)
+{
+	int i;
+
+	if (vcpu_array_size < sizeof(*vcpu_array) * nr_vcpus)
+		return -EINVAL;
+
+	for (i = 0; i < nr_vcpus; i++) {
+		struct kvm_vcpu *host_vcpu = kern_hyp_va(vcpu_array[i]);
+
+		if (hyp_pin_shared_mem(host_vcpu, host_vcpu + 1)) {
+			unpin_host_vcpus(shadow_vcpu_states, i);
+			return -EBUSY;
+		}
+
+		shadow_vcpu_states[i].host_vcpu = host_vcpu;
+	}
+
+	return 0;
+}
+
+static int init_shadow_structs(struct kvm *kvm, struct kvm_shadow_vm *vm,
+			       struct kvm_vcpu **vcpu_array,
+			       unsigned int nr_vcpus)
+{
+	int i;
+
+	vm->host_kvm = kvm;
+	vm->kvm.created_vcpus = nr_vcpus;
+	vm->kvm.arch.vtcr = host_kvm.arch.vtcr;
+
+	for (i = 0; i < nr_vcpus; i++) {
+		struct kvm_shadow_vcpu_state *shadow_vcpu_state = &vm->shadow_vcpu_states[i];
+		struct kvm_vcpu *shadow_vcpu = &shadow_vcpu_state->shadow_vcpu;
+		struct kvm_vcpu *host_vcpu = shadow_vcpu_state->host_vcpu;
+
+		shadow_vcpu_state->shadow_vm = vm;
+
+		shadow_vcpu->kvm = &vm->kvm;
+		shadow_vcpu->vcpu_id = READ_ONCE(host_vcpu->vcpu_id);
+		shadow_vcpu->vcpu_idx = i;
+
+		shadow_vcpu->arch.hw_mmu = &vm->kvm.arch.mmu;
+	}
+
+	return 0;
+}
+
+static bool __exists_shadow(struct kvm *host_kvm)
+{
+	int i;
+	unsigned int nr_checked = 0;
+
+	for (i = 0; i < KVM_MAX_PVMS && nr_checked < nr_shadow_entries; i++) {
+		if (!shadow_table[i])
+			continue;
+
+		if (unlikely(shadow_table[i]->host_kvm == host_kvm))
+			return true;
+
+		nr_checked++;
+	}
+
+	return false;
+}
+
+/*
+ * Allocate a shadow table entry and insert a pointer to the shadow vm.
+ *
+ * Return a unique handle to the protected VM on success,
+ * negative error code on failure.
+ */
+static unsigned int insert_shadow_table(struct kvm *kvm,
+					struct kvm_shadow_vm *vm,
+					size_t shadow_size)
+{
+	struct kvm_s2_mmu *mmu = &vm->kvm.arch.mmu;
+	unsigned int shadow_handle;
+	unsigned int vmid;
+
+	hyp_assert_lock_held(&shadow_lock);
+
+	if (unlikely(nr_shadow_entries >= KVM_MAX_PVMS))
+		return -ENOMEM;
+
+	/*
+	 * Initializing protected state might have failed, yet a malicious host
+	 * could trigger this function. Thus, ensure that shadow_table exists.
+	 */
+	if (unlikely(!shadow_table))
+		return -EINVAL;
+
+	/* Check that a shadow hasn't been created before for this host KVM. */
+	if (unlikely(__exists_shadow(kvm)))
+		return -EEXIST;
+
+	/* Find the next free entry in the shadow table. */
+	while (shadow_table[next_shadow_alloc])
+		next_shadow_alloc = (next_shadow_alloc + 1) % KVM_MAX_PVMS;
+	shadow_handle = idx_to_shadow_handle(next_shadow_alloc);
+
+	vm->kvm.arch.pkvm.shadow_handle = shadow_handle;
+	vm->shadow_area_size = shadow_size;
+
+	/* VMID 0 is reserved for the host */
+	vmid = next_shadow_alloc + 1;
+	if (vmid > 0xff)
+		return -ENOMEM;
+
+	atomic64_set(&mmu->vmid.id, vmid);
+	mmu->arch = &vm->kvm.arch;
+	mmu->pgt = &vm->pgt;
+
+	shadow_table[next_shadow_alloc] = vm;
+	next_shadow_alloc = (next_shadow_alloc + 1) % KVM_MAX_PVMS;
+	nr_shadow_entries++;
+
+	return shadow_handle;
+}
+
+/*
+ * Deallocate and remove the shadow table entry corresponding to the handle.
+ */
+static void remove_shadow_table(unsigned int shadow_handle)
+{
+	hyp_assert_lock_held(&shadow_lock);
+	shadow_table[shadow_handle_to_idx(shadow_handle)] = NULL;
+	nr_shadow_entries--;
+}
+
+static size_t pkvm_get_shadow_size(unsigned int nr_vcpus)
+{
+	/* Shadow space for the vm struct and all of its vcpu states. */
+	return sizeof(struct kvm_shadow_vm) +
+	       sizeof(struct kvm_shadow_vcpu_state) * nr_vcpus;
+}
+
+/*
+ * Check whether the size of the area donated by the host is sufficient for
+ * the shadow structures required for nr_vcpus as well as the shadow vm.
+ */
+static int check_shadow_size(unsigned int nr_vcpus, size_t shadow_size)
+{
+	if (nr_vcpus < 1 || nr_vcpus > KVM_MAX_VCPUS)
+		return -EINVAL;
+
+	/*
+	 * Shadow size is rounded up when allocated and donated by the host,
+	 * so it's likely to be larger than the sum of the struct sizes.
+	 */
+	if (shadow_size < pkvm_get_shadow_size(nr_vcpus))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void *map_donated_memory_noclear(unsigned long host_va, size_t size)
+{
+	void *va = (void *)kern_hyp_va(host_va);
+
+	if (!PAGE_ALIGNED(va) || !PAGE_ALIGNED(size))
+		return NULL;
+
+	if (__pkvm_host_donate_hyp(hyp_virt_to_pfn(va), size >> PAGE_SHIFT))
+		return NULL;
+
+	return va;
+}
+
+static void *map_donated_memory(unsigned long host_va, size_t size)
+{
+	void *va = map_donated_memory_noclear(host_va, size);
+
+	if (va)
+		memset(va, 0, size);
+
+	return va;
+}
+
+static void __unmap_donated_memory(void *va, size_t size)
+{
+	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va), size >> PAGE_SHIFT));
+}
+
+static void unmap_donated_memory(void *va, size_t size)
+{
+	if (!va)
+		return;
+
+	memset(va, 0, size);
+	__unmap_donated_memory(va, size);
+}
+
+static void unmap_donated_memory_noclear(void *va, size_t size)
+{
+	if (!va)
+		return;
+
+	__unmap_donated_memory(va, size);
+}
+
+/*
+ * Initialize the shadow copy of the protected VM state using the memory
+ * donated by the host.
+ *
+ * Unmaps the donated memory from the host at stage 2.
+ *
+ * kvm: A pointer to the host's struct kvm (host va).
+ * shadow_hva: The host va of the area being donated for the shadow state.
+ *	       Must be page aligned.
+ * shadow_size: The size of the area being donated for the shadow state.
+ *		Must be a multiple of the page size.
+ * pgd_hva: The host va of the area being donated for the stage-2 PGD for
+ *	    the VM. Must be page aligned. Its size is implied by the VM's
+ *	    VTCR.
+ * Note: An array of the host KVM vCPUs (host VA) is passed via the pgd, so as
+ *	 not to be dependent on how the vCPUs are laid out in struct kvm.
+ *
+ * Return a unique handle to the protected VM on success,
+ * negative error code on failure.
+ */
+int __pkvm_init_shadow(struct kvm *kvm, unsigned long shadow_hva,
+		       size_t shadow_size, unsigned long pgd_hva)
+{
+	struct kvm_shadow_vm *vm = NULL;
+	unsigned int nr_vcpus;
+	size_t pgd_size = 0;
+	void *pgd = NULL;
+	int ret;
+
+	kvm = kern_hyp_va(kvm);
+	ret = hyp_pin_shared_mem(kvm, kvm + 1);
+	if (ret)
+		return ret;
+
+	nr_vcpus = READ_ONCE(kvm->created_vcpus);
+	ret = check_shadow_size(nr_vcpus, shadow_size);
+	if (ret)
+		goto err_unpin_kvm;
+
+	ret = -ENOMEM;
+
+	vm = map_donated_memory(shadow_hva, shadow_size);
+	if (!vm)
+		goto err_remove_mappings;
+
+	pgd_size = kvm_pgtable_stage2_pgd_size(host_kvm.arch.vtcr);
+	pgd = map_donated_memory_noclear(pgd_hva, pgd_size);
+	if (!pgd)
+		goto err_remove_mappings;
+
+	ret = set_host_vcpus(vm->shadow_vcpu_states, nr_vcpus, pgd, pgd_size);
+	if (ret)
+		goto err_remove_mappings;
+
+	ret = init_shadow_structs(kvm, vm, pgd, nr_vcpus);
+	if (ret < 0)
+		goto err_unpin_host_vcpus;
+
+	/* Add the entry to the shadow table. */
+	hyp_spin_lock(&shadow_lock);
+	ret = insert_shadow_table(kvm, vm, shadow_size);
+	if (ret < 0)
+		goto err_unlock_unpin_host_vcpus;
+
+	ret = kvm_guest_prepare_stage2(vm, pgd);
+	if (ret)
+		goto err_remove_shadow_table;
+	hyp_spin_unlock(&shadow_lock);
+
+	return vm->kvm.arch.pkvm.shadow_handle;
+
+err_remove_shadow_table:
+	remove_shadow_table(vm->kvm.arch.pkvm.shadow_handle);
+err_unlock_unpin_host_vcpus:
+	hyp_spin_unlock(&shadow_lock);
+err_unpin_host_vcpus:
+	unpin_host_vcpus(vm->shadow_vcpu_states, nr_vcpus);
+err_remove_mappings:
+	unmap_donated_memory(vm, shadow_size);
+	unmap_donated_memory_noclear(pgd, pgd_size);
+err_unpin_kvm:
+	hyp_unpin_shared_mem(kvm, kvm + 1);
+	return ret;
+}
+
+int __pkvm_teardown_shadow(unsigned int shadow_handle)
+{
+	struct kvm_shadow_vm *vm;
+	size_t shadow_size;
+	int err;
+
+	/* Look up, then remove, the entry from the shadow table. */
+	hyp_spin_lock(&shadow_lock);
+	vm = find_shadow_by_handle(shadow_handle);
+	if (!vm) {
+		err = -ENOENT;
+		goto err_unlock;
+	}
+
+	if (WARN_ON(hyp_page_count(vm))) {
+		err = -EBUSY;
+		goto err_unlock;
+	}
+
+	/* Ensure the VMID is clean before it can be reallocated */
+	__kvm_tlb_flush_vmid(&vm->kvm.arch.mmu);
+	remove_shadow_table(shadow_handle);
+	hyp_spin_unlock(&shadow_lock);
+
+	/* Reclaim guest pages (including page-table pages) */
+	reclaim_guest_pages(vm);
+	unpin_host_vcpus(vm->shadow_vcpu_states, vm->kvm.created_vcpus);
+
+	/* Push the metadata pages to the teardown memcache */
+	shadow_size = vm->shadow_area_size;
+	hyp_unpin_shared_mem(vm->host_kvm, vm->host_kvm + 1);
+
+	memset(vm, 0, shadow_size);
+	unmap_donated_memory_noclear(vm, shadow_size);
+	return 0;
+
+err_unlock:
+	hyp_spin_unlock(&shadow_lock);
+	return err;
+}
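
To make the calling convention above concrete, the host side of this interface
(added by a later patch in this series) allocates the two donations, stashes
the vcpu pointers in the not-yet-used PGD pages and then issues the hypercall.
Condensed from patch 13, with error handling omitted:

	pgd_sz = kvm_pgtable_stage2_pgd_size(kvm->arch.vtcr);
	pgd = alloc_pages_exact(pgd_sz, GFP_KERNEL_ACCOUNT);

	shadow_sz = PAGE_ALIGN(KVM_SHADOW_VM_SIZE +
			       KVM_SHADOW_VCPU_STATE_SIZE * kvm->created_vcpus);
	shadow_addr = alloc_pages_exact(shadow_sz, GFP_KERNEL_ACCOUNT);

	/* Stash the vcpu pointers in the PGD pages before donating them. */
	vcpu_array = pgd;
	kvm_for_each_vcpu(idx, vcpu, kvm)
		vcpu_array[idx] = vcpu;

	ret = kvm_call_hyp_nvhe(__pkvm_init_shadow, kvm, shadow_addr, shadow_sz, pgd);

KVM_SHADOW_VM_SIZE and KVM_SHADOW_VCPU_STATE_SIZE are generated constants
introduced alongside that later patch.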
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 0312c9c74a5a..fb0eff15a89f 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -16,6 +16,7 @@
 #include <nvhe/memory.h>
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
 unsigned long hyp_nr_cpus;
@@ -24,6 +25,7 @@ unsigned long hyp_nr_cpus;
 			 (unsigned long)__per_cpu_start)
 
 static void *vmemmap_base;
+static void *shadow_table_base;
 static void *hyp_pgt_base;
 static void *host_s2_pgt_base;
 static struct kvm_pgtable_mm_ops pkvm_pgtable_mm_ops;
@@ -40,6 +42,11 @@ static int divide_memory_pool(void *virt, unsigned long size)
 	if (!vmemmap_base)
 		return -ENOMEM;
 
+	nr_pages = hyp_shadow_table_pages();
+	shadow_table_base = hyp_early_alloc_contig(nr_pages);
+	if (!shadow_table_base)
+		return -ENOMEM;
+
 	nr_pages = hyp_s1_pgtable_pages();
 	hyp_pgt_base = hyp_early_alloc_contig(nr_pages);
 	if (!hyp_pgt_base)
@@ -314,6 +321,7 @@ void __noreturn __pkvm_init_finalise(void)
 	if (ret)
 		goto out;
 
+	hyp_shadow_table_init(shadow_table_base);
 out:
 	/*
 	 * We tail-called to here from handle___pkvm_init() and will not return,
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 2cb3867eb7c2..1d300313009d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1200,6 +1200,15 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	return 0;
 }
 
+size_t kvm_pgtable_stage2_pgd_size(u64 vtcr)
+{
+	u32 ia_bits = VTCR_EL2_IPA(vtcr);
+	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
+	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
+
+	return kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
+}
+
 static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 34229425b25d..3947063cc3a1 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -71,6 +71,7 @@ void __init kvm_hyp_reserve(void)
 
 	hyp_mem_pages += hyp_s1_pgtable_pages();
 	hyp_mem_pages += host_s2_pgtable_pages();
+	hyp_mem_pages += hyp_shadow_table_pages();
 	hyp_mem_pages += hyp_vmemmap_pages(STRUCT_HYP_PAGE_SIZE);
 
 	/*
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 12/24] KVM: arm64: Introduce shadow VM state at EL2
@ 2022-06-30 13:57   ` Will Deacon
  0 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Ard Biesheuvel, Sean Christopherson,
	Alexandru Elisei, Andy Lutomirski, Catalin Marinas, James Morse,
	Chao Peng, Quentin Perret, Suzuki K Poulose, Michael Roth,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

From: Fuad Tabba <tabba@google.com>

Introduce a table of shadow VM structures at EL2 and provide hypercalls
to the host for creating and destroying shadow VMs.
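
The handle scheme is deliberately simple: a handle is just a table index plus
a fixed offset, and the table is a flat array of KVM_MAX_PVMS pointers. Below
is a stand-alone model of that lookup logic; the constants and the bounds
check mirror the code added by this patch, everything else is illustrative
only and is not kernel code:

	#include <stdio.h>

	#define KVM_MAX_PVMS	255
	#define HANDLE_OFFSET	0x1000

	static void *shadow_table[KVM_MAX_PVMS];

	static unsigned int idx_to_shadow_handle(unsigned int idx)
	{
		return idx + HANDLE_OFFSET;
	}

	static void *find_shadow_by_handle(unsigned int handle)
	{
		unsigned int idx = handle - HANDLE_OFFSET;

		/* Also rejects handles below HANDLE_OFFSET via unsigned wrap. */
		if (idx >= KVM_MAX_PVMS)
			return NULL;
		return shadow_table[idx];
	}

	int main(void)
	{
		static int dummy;

		shadow_table[0] = &dummy;
		printf("slot 0 -> handle %#x\n", idx_to_shadow_handle(0));
		printf("lookup(0x1000) hit: %d\n", find_shadow_by_handle(0x1000) != NULL);
		printf("lookup(42) rejected: %d\n", find_shadow_by_handle(42) == NULL);
		return 0;
	}

The same unsigned-wrap trick is what lets the real lookup get away with a
single bounds check.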

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h              |   2 +
 arch/arm64/include/asm/kvm_host.h             |   6 +
 arch/arm64/include/asm/kvm_pgtable.h          |   8 +
 arch/arm64/include/asm/kvm_pkvm.h             |   8 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   3 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  60 +++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  21 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  14 +
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 398 ++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c               |   8 +
 arch/arm64/kvm/hyp/pgtable.c                  |   9 +
 arch/arm64/kvm/pkvm.c                         |   1 +
 12 files changed, 538 insertions(+)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/pkvm.h

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 2e277f2ed671..fac4ed699913 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -76,6 +76,8 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
+	__KVM_HOST_SMCCC_FUNC___pkvm_init_shadow,
+	__KVM_HOST_SMCCC_FUNC___pkvm_teardown_shadow,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2cc42e1fec18..41348ac728f9 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -115,6 +115,10 @@ struct kvm_smccc_features {
 	unsigned long vendor_hyp_bmap;
 };
 
+struct kvm_protected_vm {
+	unsigned int shadow_handle;
+};
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
@@ -166,6 +170,8 @@ struct kvm_arch {
 
 	/* Hypercall features firmware registers' descriptor */
 	struct kvm_smccc_features smccc_feat;
+
+	struct kvm_protected_vm pkvm;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 9f339dffbc1a..2d6b5058f7d3 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -288,6 +288,14 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
  */
 u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift);
 
+/*
+ * kvm_pgtable_stage2_pgd_size() - Helper to compute size of a stage-2 PGD
+ * @vtcr:	Content of the VTCR register.
+ *
+ * Return: the size (in bytes) of the stage-2 PGD
+ */
+size_t kvm_pgtable_stage2_pgd_size(u64 vtcr);
+
 /**
  * __kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
  * @pgt:	Uninitialised page-table structure to initialise.
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 8f7b8a2314bb..11526e89fe5c 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -9,6 +9,9 @@
 #include <linux/memblock.h>
 #include <asm/kvm_pgtable.h>
 
+/* Maximum number of protected VMs that can be created. */
+#define KVM_MAX_PVMS 255
+
 #define HYP_MEMBLOCK_REGIONS 128
 
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
@@ -40,6 +43,11 @@ static inline unsigned long hyp_vmemmap_pages(size_t vmemmap_entry_size)
 	return res >> PAGE_SHIFT;
 }
 
+static inline unsigned long hyp_shadow_table_pages(void)
+{
+	return PAGE_ALIGN(KVM_MAX_PVMS * sizeof(void *)) >> PAGE_SHIFT;
+}
+
 static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 {
 	unsigned long total = 0, i;
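
With 64-bit pointers and 4KiB pages the shadow table itself is tiny:

	KVM_MAX_PVMS * sizeof(void *) = 255 * 8 = 2040 bytes -> PAGE_ALIGN() -> one page

so hyp_shadow_table_pages() evaluates to 1 in the common configuration, and
still to a single page with 16KiB or 64KiB granules.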
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 3bea816296dc..3a0817b5c739 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -11,6 +11,7 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_pgtable.h>
 #include <asm/virt.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/spinlock.h>
 
 /*
@@ -68,10 +69,12 @@ bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
 int kvm_host_prepare_stage2(void *pgt_pool_base);
+int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd);
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
+void reclaim_guest_pages(struct kvm_shadow_vm *vm);
 
 static __always_inline void __load_host_stage2(void)
 {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
new file mode 100644
index 000000000000..1d0a33f70879
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Google LLC
+ * Author: Fuad Tabba <tabba@google.com>
+ */
+
+#ifndef __ARM64_KVM_NVHE_PKVM_H__
+#define __ARM64_KVM_NVHE_PKVM_H__
+
+#include <asm/kvm_pkvm.h>
+
+/*
+ * Holds the relevant data for maintaining the vcpu state completely at hyp.
+ */
+struct kvm_shadow_vcpu_state {
+	/* The data for the shadow vcpu. */
+	struct kvm_vcpu shadow_vcpu;
+
+	/* A pointer to the host's vcpu. */
+	struct kvm_vcpu *host_vcpu;
+
+	/* A pointer to the shadow vm. */
+	struct kvm_shadow_vm *shadow_vm;
+};
+
+/*
+ * Holds the relevant data for running a protected vm.
+ */
+struct kvm_shadow_vm {
+	/* The data for the shadow kvm. */
+	struct kvm kvm;
+
+	/* The host's kvm structure. */
+	struct kvm *host_kvm;
+
+	/* The total size of the donated shadow area. */
+	size_t shadow_area_size;
+
+	struct kvm_pgtable pgt;
+
+	/* Array of the shadow state per vcpu. */
+	struct kvm_shadow_vcpu_state shadow_vcpu_states[0];
+};
+
+static inline struct kvm_shadow_vcpu_state *get_shadow_state(struct kvm_vcpu *shadow_vcpu)
+{
+	return container_of(shadow_vcpu, struct kvm_shadow_vcpu_state, shadow_vcpu);
+}
+
+static inline struct kvm_shadow_vm *get_shadow_vm(struct kvm_vcpu *shadow_vcpu)
+{
+	return get_shadow_state(shadow_vcpu)->shadow_vm;
+}
+
+void hyp_shadow_table_init(void *tbl);
+int __pkvm_init_shadow(struct kvm *kvm, unsigned long shadow_hva,
+		       size_t shadow_size, unsigned long pgd_hva);
+int __pkvm_teardown_shadow(unsigned int shadow_handle);
+
+#endif /* __ARM64_KVM_NVHE_PKVM_H__ */
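
Since the shadow vcpu is embedded in its per-vcpu state, get_shadow_state() is
nothing more than a container_of(). For readers less familiar with the idiom,
here is a stand-alone sketch of the same pattern using simplified stand-in
types rather than the kernel definitions above:

	#include <assert.h>
	#include <stddef.h>

	#define container_of(ptr, type, member) \
		((type *)((char *)(ptr) - offsetof(type, member)))

	struct vcpu { int id; };

	struct vcpu_state {
		struct vcpu shadow_vcpu;
		struct vcpu *host_vcpu;
	};

	int main(void)
	{
		struct vcpu_state state = { .shadow_vcpu = { .id = 7 } };
		struct vcpu *v = &state.shadow_vcpu;

		/* Recover the enclosing state from the embedded vcpu pointer. */
		assert(container_of(v, struct vcpu_state, shadow_vcpu) == &state);
		return 0;
	}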
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 3cea4b6ac23e..a1fbd11c8041 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -15,6 +15,7 @@
 
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
 DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
@@ -191,6 +192,24 @@ static void handle___pkvm_vcpu_init_traps(struct kvm_cpu_context *host_ctxt)
 	__pkvm_vcpu_init_traps(kern_hyp_va(vcpu));
 }
 
+static void handle___pkvm_init_shadow(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(struct kvm *, host_kvm, host_ctxt, 1);
+	DECLARE_REG(unsigned long, host_shadow_va, host_ctxt, 2);
+	DECLARE_REG(size_t, shadow_size, host_ctxt, 3);
+	DECLARE_REG(unsigned long, pgd, host_ctxt, 4);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_init_shadow(host_kvm, host_shadow_va,
+						   shadow_size, pgd);
+}
+
+static void handle___pkvm_teardown_shadow(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(unsigned int, shadow_handle, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_teardown_shadow(shadow_handle);
+}
+
 typedef void (*hcall_t)(struct kvm_cpu_context *);
 
 #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
@@ -220,6 +239,8 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__vgic_v3_save_aprs),
 	HANDLE_FUNC(__vgic_v3_restore_aprs),
 	HANDLE_FUNC(__pkvm_vcpu_init_traps),
+	HANDLE_FUNC(__pkvm_init_shadow),
+	HANDLE_FUNC(__pkvm_teardown_shadow),
 };
 
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
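
Both handlers follow the usual nVHE hypercall pattern: DECLARE_REG() pulls the
arguments out of the host's context registers and the result is written back
into register 1. From EL1, as wired up later in this series, the two calls
therefore end up looking roughly like:

	ret = kvm_call_hyp_nvhe(__pkvm_init_shadow, kvm, shadow_addr, shadow_sz, pgd);
	if (ret < 0)
		goto free_shadow;
	shadow_handle = ret;	/* >= HANDLE_OFFSET on success */
	...
	WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_shadow, shadow_handle));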
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index e2e3b30b072e..9baf731736be 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -141,6 +141,20 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
 	return 0;
 }
 
+int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd)
+{
+	vm->pgt.pgd = pgd;
+	return 0;
+}
+
+void reclaim_guest_pages(struct kvm_shadow_vm *vm)
+{
+	unsigned long nr_pages;
+
+	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
+	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(vm->pgt.pgd), nr_pages));
+}
+
 int __pkvm_prot_finalize(void)
 {
 	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 99c8d8b73e70..77aeb787670b 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -7,6 +7,9 @@
 #include <linux/kvm_host.h>
 #include <linux/mm.h>
 #include <nvhe/fixed_config.h>
+#include <nvhe/mem_protect.h>
+#include <nvhe/memory.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
 /*
@@ -183,3 +186,398 @@ void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
 	pvm_init_traps_aa64mmfr0(vcpu);
 	pvm_init_traps_aa64mmfr1(vcpu);
 }
+
+/*
+ * Start shadow table handles at a fixed offset rather than at 0,
+ * mainly for sanity checking and debugging.
+ */
+#define HANDLE_OFFSET 0x1000
+
+static unsigned int shadow_handle_to_idx(unsigned int shadow_handle)
+{
+	return shadow_handle - HANDLE_OFFSET;
+}
+
+static unsigned int idx_to_shadow_handle(unsigned int idx)
+{
+	return idx + HANDLE_OFFSET;
+}
+
+/*
+ * Spinlock for protecting the shadow table related state.
+ * Protects writes to shadow_table, nr_shadow_entries, and next_shadow_alloc.
+ */
+static DEFINE_HYP_SPINLOCK(shadow_lock);
+
+/*
+ * The table of shadow entries for protected VMs in hyp.
+ * Allocated at hyp initialization and setup.
+ */
+static struct kvm_shadow_vm **shadow_table;
+
+/* Current number of vms in the shadow table. */
+static unsigned int nr_shadow_entries;
+
+/* The next entry index to try to allocate from. */
+static unsigned int next_shadow_alloc;
+
+void hyp_shadow_table_init(void *tbl)
+{
+	WARN_ON(shadow_table);
+	shadow_table = tbl;
+}
+
+/*
+ * Return the shadow vm corresponding to the handle.
+ */
+static struct kvm_shadow_vm *find_shadow_by_handle(unsigned int shadow_handle)
+{
+	unsigned int shadow_idx = shadow_handle_to_idx(shadow_handle);
+
+	if (unlikely(shadow_idx >= KVM_MAX_PVMS))
+		return NULL;
+
+	return shadow_table[shadow_idx];
+}
+
+static void unpin_host_vcpus(struct kvm_shadow_vcpu_state *shadow_vcpu_states,
+			     unsigned int nr_vcpus)
+{
+	int i;
+
+	for (i = 0; i < nr_vcpus; i++) {
+		struct kvm_vcpu *host_vcpu = shadow_vcpu_states[i].host_vcpu;
+		hyp_unpin_shared_mem(host_vcpu, host_vcpu + 1);
+	}
+}
+
+static int set_host_vcpus(struct kvm_shadow_vcpu_state *shadow_vcpu_states,
+			  unsigned int nr_vcpus,
+			  struct kvm_vcpu **vcpu_array,
+			  size_t vcpu_array_size)
+{
+	int i;
+
+	if (vcpu_array_size < sizeof(*vcpu_array) * nr_vcpus)
+		return -EINVAL;
+
+	for (i = 0; i < nr_vcpus; i++) {
+		struct kvm_vcpu *host_vcpu = kern_hyp_va(vcpu_array[i]);
+
+		if (hyp_pin_shared_mem(host_vcpu, host_vcpu + 1)) {
+			unpin_host_vcpus(shadow_vcpu_states, i);
+			return -EBUSY;
+		}
+
+		shadow_vcpu_states[i].host_vcpu = host_vcpu;
+	}
+
+	return 0;
+}
+
+static int init_shadow_structs(struct kvm *kvm, struct kvm_shadow_vm *vm,
+			       struct kvm_vcpu **vcpu_array,
+			       unsigned int nr_vcpus)
+{
+	int i;
+
+	vm->host_kvm = kvm;
+	vm->kvm.created_vcpus = nr_vcpus;
+	vm->kvm.arch.vtcr = host_kvm.arch.vtcr;
+
+	for (i = 0; i < nr_vcpus; i++) {
+		struct kvm_shadow_vcpu_state *shadow_vcpu_state = &vm->shadow_vcpu_states[i];
+		struct kvm_vcpu *shadow_vcpu = &shadow_vcpu_state->shadow_vcpu;
+		struct kvm_vcpu *host_vcpu = shadow_vcpu_state->host_vcpu;
+
+		shadow_vcpu_state->shadow_vm = vm;
+
+		shadow_vcpu->kvm = &vm->kvm;
+		shadow_vcpu->vcpu_id = READ_ONCE(host_vcpu->vcpu_id);
+		shadow_vcpu->vcpu_idx = i;
+
+		shadow_vcpu->arch.hw_mmu = &vm->kvm.arch.mmu;
+	}
+
+	return 0;
+}
+
+static bool __exists_shadow(struct kvm *host_kvm)
+{
+	int i;
+	unsigned int nr_checked = 0;
+
+	for (i = 0; i < KVM_MAX_PVMS && nr_checked < nr_shadow_entries; i++) {
+		if (!shadow_table[i])
+			continue;
+
+		if (unlikely(shadow_table[i]->host_kvm == host_kvm))
+			return true;
+
+		nr_checked++;
+	}
+
+	return false;
+}
+
+/*
+ * Allocate a shadow table entry and insert a pointer to the shadow vm.
+ *
+ * Return a unique handle to the protected VM on success,
+ * negative error code on failure.
+ */
+static unsigned int insert_shadow_table(struct kvm *kvm,
+					struct kvm_shadow_vm *vm,
+					size_t shadow_size)
+{
+	struct kvm_s2_mmu *mmu = &vm->kvm.arch.mmu;
+	unsigned int shadow_handle;
+	unsigned int vmid;
+
+	hyp_assert_lock_held(&shadow_lock);
+
+	if (unlikely(nr_shadow_entries >= KVM_MAX_PVMS))
+		return -ENOMEM;
+
+	/*
+	 * Initializing protected state might have failed, yet a malicious host
+	 * could trigger this function. Thus, ensure that shadow_table exists.
+	 */
+	if (unlikely(!shadow_table))
+		return -EINVAL;
+
+	/* Check that a shadow hasn't been created before for this host KVM. */
+	if (unlikely(__exists_shadow(kvm)))
+		return -EEXIST;
+
+	/* Find the next free entry in the shadow table. */
+	while (shadow_table[next_shadow_alloc])
+		next_shadow_alloc = (next_shadow_alloc + 1) % KVM_MAX_PVMS;
+	shadow_handle = idx_to_shadow_handle(next_shadow_alloc);
+
+	vm->kvm.arch.pkvm.shadow_handle = shadow_handle;
+	vm->shadow_area_size = shadow_size;
+
+	/* VMID 0 is reserved for the host */
+	vmid = next_shadow_alloc + 1;
+	if (vmid > 0xff)
+		return -ENOMEM;
+
+	atomic64_set(&mmu->vmid.id, vmid);
+	mmu->arch = &vm->kvm.arch;
+	mmu->pgt = &vm->pgt;
+
+	shadow_table[next_shadow_alloc] = vm;
+	next_shadow_alloc = (next_shadow_alloc + 1) % KVM_MAX_PVMS;
+	nr_shadow_entries++;
+
+	return shadow_handle;
+}
+
+/*
+ * Deallocate and remove the shadow table entry corresponding to the handle.
+ */
+static void remove_shadow_table(unsigned int shadow_handle)
+{
+	hyp_assert_lock_held(&shadow_lock);
+	shadow_table[shadow_handle_to_idx(shadow_handle)] = NULL;
+	nr_shadow_entries--;
+}
+
+static size_t pkvm_get_shadow_size(unsigned int nr_vcpus)
+{
+	/* Shadow space for the vm struct and all of its vcpu states. */
+	return sizeof(struct kvm_shadow_vm) +
+	       sizeof(struct kvm_shadow_vcpu_state) * nr_vcpus;
+}
+
+/*
+ * Check whether the size of the area donated by the host is sufficient for
+ * the shadow structures required for nr_vcpus as well as the shadow vm.
+ */
+static int check_shadow_size(unsigned int nr_vcpus, size_t shadow_size)
+{
+	if (nr_vcpus < 1 || nr_vcpus > KVM_MAX_VCPUS)
+		return -EINVAL;
+
+	/*
+	 * Shadow size is rounded up when allocated and donated by the host,
+	 * so it's likely to be larger than the sum of the struct sizes.
+	 */
+	if (shadow_size < pkvm_get_shadow_size(nr_vcpus))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void *map_donated_memory_noclear(unsigned long host_va, size_t size)
+{
+	void *va = (void *)kern_hyp_va(host_va);
+
+	if (!PAGE_ALIGNED(va) || !PAGE_ALIGNED(size))
+		return NULL;
+
+	if (__pkvm_host_donate_hyp(hyp_virt_to_pfn(va), size >> PAGE_SHIFT))
+		return NULL;
+
+	return va;
+}
+
+static void *map_donated_memory(unsigned long host_va, size_t size)
+{
+	void *va = map_donated_memory_noclear(host_va, size);
+
+	if (va)
+		memset(va, 0, size);
+
+	return va;
+}
+
+static void __unmap_donated_memory(void *va, size_t size)
+{
+	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va), size >> PAGE_SHIFT));
+}
+
+static void unmap_donated_memory(void *va, size_t size)
+{
+	if (!va)
+		return;
+
+	memset(va, 0, size);
+	__unmap_donated_memory(va, size);
+}
+
+static void unmap_donated_memory_noclear(void *va, size_t size)
+{
+	if (!va)
+		return;
+
+	__unmap_donated_memory(va, size);
+}
+
+/*
+ * Initialize the shadow copy of the protected VM state using the memory
+ * donated by the host.
+ *
+ * Unmaps the donated memory from the host at stage 2.
+ *
+ * kvm: A pointer to the host's struct kvm (host va).
+ * shadow_hva: The host va of the area being donated for the shadow state.
+ *	       Must be page aligned.
+ * shadow_size: The size of the area being donated for the shadow state.
+ *		Must be a multiple of the page size.
+ * pgd_hva: The host va of the area being donated for the stage-2 PGD for
+ *	    the VM. Must be page aligned. Its size is implied by the VM's
+ *	    VTCR.
+ * Note: An array of pointers to the host KVM vCPUs (host VA) is passed via the
+ *	 pgd, so as not to depend on how the vCPUs are laid out in struct kvm.
+ *
+ * Return a unique handle to the protected VM on success,
+ * negative error code on failure.
+ */
+int __pkvm_init_shadow(struct kvm *kvm, unsigned long shadow_hva,
+		       size_t shadow_size, unsigned long pgd_hva)
+{
+	struct kvm_shadow_vm *vm = NULL;
+	unsigned int nr_vcpus;
+	size_t pgd_size = 0;
+	void *pgd = NULL;
+	int ret;
+
+	kvm = kern_hyp_va(kvm);
+	ret = hyp_pin_shared_mem(kvm, kvm + 1);
+	if (ret)
+		return ret;
+
+	nr_vcpus = READ_ONCE(kvm->created_vcpus);
+	ret = check_shadow_size(nr_vcpus, shadow_size);
+	if (ret)
+		goto err_unpin_kvm;
+
+	ret = -ENOMEM;
+
+	vm = map_donated_memory(shadow_hva, shadow_size);
+	if (!vm)
+		goto err_remove_mappings;
+
+	pgd_size = kvm_pgtable_stage2_pgd_size(host_kvm.arch.vtcr);
+	pgd = map_donated_memory_noclear(pgd_hva, pgd_size);
+	if (!pgd)
+		goto err_remove_mappings;
+
+	ret = set_host_vcpus(vm->shadow_vcpu_states, nr_vcpus, pgd, pgd_size);
+	if (ret)
+		goto err_remove_mappings;
+
+	ret = init_shadow_structs(kvm, vm, pgd, nr_vcpus);
+	if (ret < 0)
+		goto err_unpin_host_vcpus;
+
+	/* Add the entry to the shadow table. */
+	hyp_spin_lock(&shadow_lock);
+	ret = insert_shadow_table(kvm, vm, shadow_size);
+	if (ret < 0)
+		goto err_unlock_unpin_host_vcpus;
+
+	ret = kvm_guest_prepare_stage2(vm, pgd);
+	if (ret)
+		goto err_remove_shadow_table;
+	hyp_spin_unlock(&shadow_lock);
+
+	return vm->kvm.arch.pkvm.shadow_handle;
+
+err_remove_shadow_table:
+	remove_shadow_table(vm->kvm.arch.pkvm.shadow_handle);
+err_unlock_unpin_host_vcpus:
+	hyp_spin_unlock(&shadow_lock);
+err_unpin_host_vcpus:
+	unpin_host_vcpus(vm->shadow_vcpu_states, nr_vcpus);
+err_remove_mappings:
+	unmap_donated_memory(vm, shadow_size);
+	unmap_donated_memory_noclear(pgd, pgd_size);
+err_unpin_kvm:
+	hyp_unpin_shared_mem(kvm, kvm + 1);
+	return ret;
+}
+
+int __pkvm_teardown_shadow(unsigned int shadow_handle)
+{
+	struct kvm_shadow_vm *vm;
+	size_t shadow_size;
+	int err;
+
+	/* Look up, then remove, the entry from the shadow table. */
+	hyp_spin_lock(&shadow_lock);
+	vm = find_shadow_by_handle(shadow_handle);
+	if (!vm) {
+		err = -ENOENT;
+		goto err_unlock;
+	}
+
+	if (WARN_ON(hyp_page_count(vm))) {
+		err = -EBUSY;
+		goto err_unlock;
+	}
+
+	/* Ensure the VMID is clean before it can be reallocated */
+	__kvm_tlb_flush_vmid(&vm->kvm.arch.mmu);
+	remove_shadow_table(shadow_handle);
+	hyp_spin_unlock(&shadow_lock);
+
+	/* Reclaim guest pages (including page-table pages) */
+	reclaim_guest_pages(vm);
+	unpin_host_vcpus(vm->shadow_vcpu_states, vm->kvm.created_vcpus);
+
+	/* Push the metadata pages to the teardown memcache */
+	shadow_size = vm->shadow_area_size;
+	hyp_unpin_shared_mem(vm->host_kvm, vm->host_kvm + 1);
+
+	memset(vm, 0, shadow_size);
+	unmap_donated_memory_noclear(vm, shadow_size);
+	return 0;
+
+err_unlock:
+	hyp_spin_unlock(&shadow_lock);
+	return err;
+}
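
One detail worth calling out in insert_shadow_table(): the guest VMID is
derived directly from the table slot (slot + 1, with VMID 0 reserved for the
host), so the slot-to-VMID mapping is simply:

	slot 0   -> VMID 1
	slot 254 -> VMID 255 (0xff)

With KVM_MAX_PVMS at 255 every slot therefore yields a valid 8-bit VMID, and
the vmid > 0xff check can only trigger if the two constants are ever changed
independently.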
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 0312c9c74a5a..fb0eff15a89f 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -16,6 +16,7 @@
 #include <nvhe/memory.h>
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
 unsigned long hyp_nr_cpus;
@@ -24,6 +25,7 @@ unsigned long hyp_nr_cpus;
 			 (unsigned long)__per_cpu_start)
 
 static void *vmemmap_base;
+static void *shadow_table_base;
 static void *hyp_pgt_base;
 static void *host_s2_pgt_base;
 static struct kvm_pgtable_mm_ops pkvm_pgtable_mm_ops;
@@ -40,6 +42,11 @@ static int divide_memory_pool(void *virt, unsigned long size)
 	if (!vmemmap_base)
 		return -ENOMEM;
 
+	nr_pages = hyp_shadow_table_pages();
+	shadow_table_base = hyp_early_alloc_contig(nr_pages);
+	if (!shadow_table_base)
+		return -ENOMEM;
+
 	nr_pages = hyp_s1_pgtable_pages();
 	hyp_pgt_base = hyp_early_alloc_contig(nr_pages);
 	if (!hyp_pgt_base)
@@ -314,6 +321,7 @@ void __noreturn __pkvm_init_finalise(void)
 	if (ret)
 		goto out;
 
+	hyp_shadow_table_init(shadow_table_base);
 out:
 	/*
 	 * We tail-called to here from handle___pkvm_init() and will not return,
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 2cb3867eb7c2..1d300313009d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1200,6 +1200,15 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	return 0;
 }
 
+size_t kvm_pgtable_stage2_pgd_size(u64 vtcr)
+{
+	u32 ia_bits = VTCR_EL2_IPA(vtcr);
+	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
+	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
+
+	return kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
+}
+
 static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
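
As a worked example of the new helper, assuming the common 4KiB granule,
40-bit IPA configuration: the stage-2 walk starts at level 1 and must resolve
IPA bits [39:30], i.e. 10 bits, while a single level-1 table only covers 9, so
two concatenated tables are needed:

	kvm_pgd_pages(40, 1) = 2  ->  kvm_pgtable_stage2_pgd_size() = 2 * 4KiB = 8KiB

Other granule/IPA combinations follow the usual stage-2 concatenation rules.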
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 34229425b25d..3947063cc3a1 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -71,6 +71,7 @@ void __init kvm_hyp_reserve(void)
 
 	hyp_mem_pages += hyp_s1_pgtable_pages();
 	hyp_mem_pages += host_s2_pgtable_pages();
+	hyp_mem_pages += hyp_shadow_table_pages();
 	hyp_mem_pages += hyp_vmemmap_pages(STRUCT_HYP_PAGE_SIZE);
 
 	/*
-- 
2.37.0.rc0.161.g10f37bed90-goog



^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 13/24] KVM: arm64: Instantiate VM shadow data from EL1
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Fuad Tabba <tabba@google.com>

Now that EL2 provides calls to create and destroy shadow VM structures,
plumb these into the KVM code at EL1 so that a shadow VM is created on
first vCPU run and destroyed later along with the 'struct kvm' at
teardown time.
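
The resulting lifecycle is roughly:

	kvm_arch_init_vm()
	    kvm_init_pvm()                       /* initialise pkvm.shadow_lock */

	first vCPU run: kvm_arch_vcpu_run_pid_change()
	    kvm_shadow_create()
	        __kvm_shadow_create()            /* allocate pgd + shadow area */
	            kvm_call_hyp_nvhe(__pkvm_init_shadow, ...)

	kvm_arch_destroy_vm()
	    kvm_shadow_destroy()
	        kvm_call_hyp_nvhe(__pkvm_teardown_shadow, handle)
	        /* donations are handed back by hyp and freed here */

with the shadow_lock making creation idempotent when several vCPUs race on
their first run.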

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h  |   6 ++
 arch/arm64/include/asm/kvm_pkvm.h  |   4 ++
 arch/arm64/kvm/arm.c               |  14 ++++
 arch/arm64/kvm/hyp/hyp-constants.c |   3 +
 arch/arm64/kvm/pkvm.c              | 112 +++++++++++++++++++++++++++++
 5 files changed, 139 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 41348ac728f9..e91456f63161 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -117,6 +117,12 @@ struct kvm_smccc_features {
 
 struct kvm_protected_vm {
 	unsigned int shadow_handle;
+	struct mutex shadow_lock;
+
+	struct {
+		void *pgd;
+		void *shadow;
+	} hyp_donations;
 };
 
 struct kvm_arch {
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 11526e89fe5c..1dc7372950b1 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -14,6 +14,10 @@
 
 #define HYP_MEMBLOCK_REGIONS 128
 
+int kvm_init_pvm(struct kvm *kvm);
+int kvm_shadow_create(struct kvm *kvm);
+void kvm_shadow_destroy(struct kvm *kvm);
+
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
 extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a9dd7ec38f38..66e1d37858f1 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -37,6 +37,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_pkvm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/sections.h>
 
@@ -150,6 +151,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (ret)
 		goto out_free_stage2_pgd;
 
+	ret = kvm_init_pvm(kvm);
+	if (ret)
+		goto out_free_stage2_pgd;
+
 	if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL))
 		goto out_free_stage2_pgd;
 	cpumask_copy(kvm->arch.supported_cpus, cpu_possible_mask);
@@ -185,6 +190,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 
 	kvm_vgic_destroy(kvm);
 
+	if (is_protected_kvm_enabled())
+		kvm_shadow_destroy(kvm);
+
 	kvm_destroy_vcpus(kvm);
 
 	kvm_unshare_hyp(kvm, kvm + 1);
@@ -567,6 +575,12 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 	if (ret)
 		return ret;
 
+	if (is_protected_kvm_enabled()) {
+		ret = kvm_shadow_create(kvm);
+		if (ret)
+			return ret;
+	}
+
 	if (!irqchip_in_kernel(kvm)) {
 		/*
 		 * Tell the rest of the code that there are userspace irqchip
diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
index b3742a6691e8..eee79527f901 100644
--- a/arch/arm64/kvm/hyp/hyp-constants.c
+++ b/arch/arm64/kvm/hyp/hyp-constants.c
@@ -2,9 +2,12 @@
 
 #include <linux/kbuild.h>
 #include <nvhe/memory.h>
+#include <nvhe/pkvm.h>
 
 int main(void)
 {
 	DEFINE(STRUCT_HYP_PAGE_SIZE,	sizeof(struct hyp_page));
+	DEFINE(KVM_SHADOW_VM_SIZE,	sizeof(struct kvm_shadow_vm));
+	DEFINE(KVM_SHADOW_VCPU_STATE_SIZE, sizeof(struct kvm_shadow_vcpu_state));
 	return 0;
 }
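
These DEFINEs land in the generated hyp-constants header, which is what lets
EL1 size the shadow donation without pulling in the EL2-private <nvhe/pkvm.h>
structure definitions; the expression used below boils down to:

	shadow_sz = PAGE_ALIGN(KVM_SHADOW_VM_SIZE +
			       KVM_SHADOW_VCPU_STATE_SIZE * kvm->created_vcpus);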
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 3947063cc3a1..b4466b31d7c8 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -6,6 +6,7 @@
 
 #include <linux/kvm_host.h>
 #include <linux/memblock.h>
+#include <linux/mutex.h>
 #include <linux/sort.h>
 
 #include <asm/kvm_pkvm.h>
@@ -94,3 +95,114 @@ void __init kvm_hyp_reserve(void)
 	kvm_info("Reserved %lld MiB at 0x%llx\n", hyp_mem_size >> 20,
 		 hyp_mem_base);
 }
+
+/*
+ * Allocates and donates memory for EL2 shadow structs.
+ *
+ * Allocates space for the shadow state, which includes the shadow vm as well as
+ * the shadow vcpu states.
+ *
+ * Stores an opaque handle in the kvm struct for future reference.
+ *
+ * Return 0 on success, negative error code on failure.
+ */
+static int __kvm_shadow_create(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu, **vcpu_array;
+	unsigned int shadow_handle;
+	size_t pgd_sz, shadow_sz;
+	void *pgd, *shadow_addr;
+	unsigned long idx;
+	int ret;
+
+	if (kvm->created_vcpus < 1)
+		return -EINVAL;
+
+	pgd_sz = kvm_pgtable_stage2_pgd_size(kvm->arch.vtcr);
+	/*
+	 * The PGD pages will be reclaimed using a hyp_memcache which implies
+	 * page granularity. So, use alloc_pages_exact() to get individual
+	 * refcounts.
+	 */
+	pgd = alloc_pages_exact(pgd_sz, GFP_KERNEL_ACCOUNT);
+	if (!pgd)
+		return -ENOMEM;
+
+	/* Allocate memory to donate to hyp for the kvm and vcpu state. */
+	shadow_sz = PAGE_ALIGN(KVM_SHADOW_VM_SIZE +
+			       KVM_SHADOW_VCPU_STATE_SIZE * kvm->created_vcpus);
+	shadow_addr = alloc_pages_exact(shadow_sz, GFP_KERNEL_ACCOUNT);
+	if (!shadow_addr) {
+		ret = -ENOMEM;
+		goto free_pgd;
+	}
+
+	/* Stash the vcpu pointers into the PGD */
+	BUILD_BUG_ON(KVM_MAX_VCPUS > (PAGE_SIZE / sizeof(u64)));
+	vcpu_array = pgd;
+	kvm_for_each_vcpu(idx, vcpu, kvm) {
+		/* The vcpus are expected to be indexed sequentially, starting at 0. */
+		if (WARN_ON(vcpu->vcpu_idx != idx)) {
+			ret = -EINVAL;
+			goto free_shadow;
+		}
+
+		vcpu_array[idx] = vcpu;
+	}
+
+	/* Donate the shadow memory to hyp and let hyp initialize it. */
+	ret = kvm_call_hyp_nvhe(__pkvm_init_shadow, kvm, shadow_addr, shadow_sz,
+				pgd);
+	if (ret < 0)
+		goto free_shadow;
+
+	shadow_handle = ret;
+
+	/* Store the shadow handle given by hyp for future call reference. */
+	kvm->arch.pkvm.shadow_handle = shadow_handle;
+	kvm->arch.pkvm.hyp_donations.pgd = pgd;
+	kvm->arch.pkvm.hyp_donations.shadow = shadow_addr;
+	return 0;
+
+free_shadow:
+	free_pages_exact(shadow_addr, shadow_sz);
+free_pgd:
+	free_pages_exact(pgd, pgd_sz);
+	return ret;
+}
+
+int kvm_shadow_create(struct kvm *kvm)
+{
+	int ret = 0;
+
+	mutex_lock(&kvm->arch.pkvm.shadow_lock);
+	if (!kvm->arch.pkvm.shadow_handle)
+		ret = __kvm_shadow_create(kvm);
+	mutex_unlock(&kvm->arch.pkvm.shadow_lock);
+
+	return ret;
+}
+
+void kvm_shadow_destroy(struct kvm *kvm)
+{
+	size_t pgd_sz, shadow_sz;
+
+	if (kvm->arch.pkvm.shadow_handle)
+		WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_shadow,
+					  kvm->arch.pkvm.shadow_handle));
+
+	kvm->arch.pkvm.shadow_handle = 0;
+
+	shadow_sz = PAGE_ALIGN(KVM_SHADOW_VM_SIZE +
+			       KVM_SHADOW_VCPU_STATE_SIZE * kvm->created_vcpus);
+	pgd_sz = kvm_pgtable_stage2_pgd_size(kvm->arch.vtcr);
+
+	free_pages_exact(kvm->arch.pkvm.hyp_donations.shadow, shadow_sz);
+	free_pages_exact(kvm->arch.pkvm.hyp_donations.pgd, pgd_sz);
+}
+
+int kvm_init_pvm(struct kvm *kvm)
+{
+	mutex_init(&kvm->arch.pkvm.shadow_lock);
+	return 0;
+}
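
To summarise the EL1 flow added above (an illustrative sketch of the
calls introduced by this patch, not additional code):

	/* VM creation (kvm_arch_init_vm): only initialises the lock. */
	ret = kvm_init_pvm(kvm);

	/*
	 * First vCPU run: allocate the stage-2 PGD and the shadow area,
	 * donate both to EL2 via __pkvm_init_shadow() and stash the
	 * returned opaque handle in kvm->arch.pkvm.shadow_handle.
	 */
	ret = kvm_shadow_create(kvm);

	/*
	 * VM teardown: ask EL2 to tear down the shadow state, then free
	 * the previously donated pages back to the host allocator.
	 */
	kvm_shadow_destroy(kvm);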
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 14/24] KVM: arm64: Add pcpu fixmap infrastructure at EL2
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

We will soon need to temporarily map pages into the hypervisor stage-1
in nVHE protected mode. To do this efficiently, let's introduce a
per-cpu fixmap that allows a single page to be mapped without taking any
lock or allocating memory.
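
As a rough usage sketch (the caller below is hypothetical; only
hyp_fixmap_map() and hyp_fixmap_unmap() are introduced here), an EL2
path can temporarily window a physical page like so:

	void *va = hyp_fixmap_map(phys);	/* this CPU's fixmap slot */

	if (va) {
		/* ... read or write the page through 'va' ... */
		hyp_fixmap_unmap();	/* 0 on success, -EINVAL otherwise */
	}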

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
 arch/arm64/kvm/hyp/include/nvhe/mm.h          |  4 ++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  1 -
 arch/arm64/kvm/hyp/nvhe/mm.c                  | 72 +++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c               |  4 ++
 5 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 3a0817b5c739..d11d9d68a680 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -59,6 +59,8 @@ enum pkvm_component_id {
 	PKVM_ID_HYP,
 };
 
+extern unsigned long hyp_nr_cpus;
+
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
 int __pkvm_host_unshare_hyp(u64 pfn);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index b2ee6d5df55b..882c5711eda5 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -13,6 +13,10 @@
 extern struct kvm_pgtable pkvm_pgtable;
 extern hyp_spinlock_t pkvm_pgd_lock;
 
+int hyp_create_pcpu_fixmap(void);
+void *hyp_fixmap_map(phys_addr_t phys);
+int hyp_fixmap_unmap(void);
+
 int hyp_create_idmap(u32 hyp_va_bits);
 int hyp_map_vectors(void);
 int hyp_back_vmemmap(phys_addr_t back);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 9baf731736be..a0af23de2640 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -21,7 +21,6 @@
 
 #define KVM_HOST_S2_FLAGS (KVM_PGTABLE_S2_NOFWB | KVM_PGTABLE_S2_IDMAP)
 
-extern unsigned long hyp_nr_cpus;
 struct host_kvm host_kvm;
 
 static struct hyp_pool host_s2_pool;
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index d3a3b47181de..17d689483ec4 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -14,6 +14,7 @@
 #include <nvhe/early_alloc.h>
 #include <nvhe/gfp.h>
 #include <nvhe/memory.h>
+#include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
 #include <nvhe/spinlock.h>
 
@@ -24,6 +25,7 @@ struct memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
 unsigned int hyp_memblock_nr;
 
 static u64 __io_map_base;
+static DEFINE_PER_CPU(void *, hyp_fixmap_base);
 
 static int __pkvm_create_mappings(unsigned long start, unsigned long size,
 				  unsigned long phys, enum kvm_pgtable_prot prot)
@@ -212,6 +214,76 @@ int hyp_map_vectors(void)
 	return 0;
 }
 
+void *hyp_fixmap_map(phys_addr_t phys)
+{
+	void *addr = *this_cpu_ptr(&hyp_fixmap_base);
+	int ret = kvm_pgtable_hyp_map(&pkvm_pgtable, (u64)addr, PAGE_SIZE,
+				      phys, PAGE_HYP);
+	return ret ? NULL : addr;
+}
+
+int hyp_fixmap_unmap(void)
+{
+	void *addr = *this_cpu_ptr(&hyp_fixmap_base);
+	int ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, (u64)addr, PAGE_SIZE);
+
+	return (ret != PAGE_SIZE) ? -EINVAL : 0;
+}
+
+static int __pin_pgtable_cb(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			    enum kvm_pgtable_walk_flags flag, void * const arg)
+{
+	if (!kvm_pte_valid(*ptep) || level != KVM_PGTABLE_MAX_LEVELS - 1)
+		return -EINVAL;
+	hyp_page_ref_inc(hyp_virt_to_page(ptep));
+
+	return 0;
+}
+
+static int hyp_pin_pgtable_pages(u64 addr)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= __pin_pgtable_cb,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
+	};
+
+	return kvm_pgtable_walk(&pkvm_pgtable, addr, PAGE_SIZE, &walker);
+}
+
+int hyp_create_pcpu_fixmap(void)
+{
+	unsigned long addr, i;
+	int ret;
+
+	for (i = 0; i < hyp_nr_cpus; i++) {
+		ret = pkvm_alloc_private_va_range(PAGE_SIZE, &addr);
+		if (ret)
+			return ret;
+
+		/*
+		 * Create a dummy mapping, to get the intermediate page-table
+		 * pages allocated, then take a reference on the last level
+		 * page to keep it around at all times.
+		 */
+		ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PAGE_SIZE,
+					  __hyp_pa(__hyp_bss_start), PAGE_HYP);
+		if (ret)
+			return ret;
+
+		ret = hyp_pin_pgtable_pages(addr);
+		if (ret)
+			return ret;
+
+		ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, addr, PAGE_SIZE);
+		if (ret != PAGE_SIZE)
+			return -EINVAL;
+
+		*per_cpu_ptr(&hyp_fixmap_base, i) = (void *)addr;
+	}
+
+	return 0;
+}
+
 int hyp_create_idmap(u32 hyp_va_bits)
 {
 	unsigned long start, end;
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index fb0eff15a89f..3f689ffb2693 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -321,6 +321,10 @@ void __noreturn __pkvm_init_finalise(void)
 	if (ret)
 		goto out;
 
+	ret = hyp_create_pcpu_fixmap();
+	if (ret)
+		goto out;
+
 	hyp_shadow_table_init(shadow_table_base);
 out:
 	/*
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 15/24] KVM: arm64: Initialise hyp symbols regardless of pKVM
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

The nVHE object at EL2 maintains its own copies of some host variables
so that, when pKVM is enabled, the host cannot directly modify the
hypervisor state. When running in normal nVHE mode, however, these
variables are still mirrored at EL2 but are not initialised.

Initialise the hypervisor symbols from the host copies regardless of
pKVM, ensuring that any reference to this data at EL2 under normal nVHE
will return a sensibly initialised value.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/arm.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 66e1d37858f1..a2343640c73c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1891,11 +1891,8 @@ static int do_pkvm_init(u32 hyp_va_bits)
 	return ret;
 }
 
-static int kvm_hyp_init_protection(u32 hyp_va_bits)
+static void kvm_hyp_init_symbols(void)
 {
-	void *addr = phys_to_virt(hyp_mem_base);
-	int ret;
-
 	kvm_nvhe_sym(id_aa64pfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
 	kvm_nvhe_sym(id_aa64pfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);
 	kvm_nvhe_sym(id_aa64isar0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64ISAR0_EL1);
@@ -1904,6 +1901,12 @@ static int kvm_hyp_init_protection(u32 hyp_va_bits)
 	kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 	kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
 	kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);
+}
+
+static int kvm_hyp_init_protection(u32 hyp_va_bits)
+{
+	void *addr = phys_to_virt(hyp_mem_base);
+	int ret;
 
 	ret = create_hyp_mappings(addr, addr + hyp_mem_size, PAGE_HYP);
 	if (ret)
@@ -2078,6 +2081,8 @@ static int init_hyp_mode(void)
 		cpu_prepare_hyp_mode(cpu);
 	}
 
+	kvm_hyp_init_symbols();
+
 	if (is_protected_kvm_enabled()) {
 		init_cpu_logical_map();
 
@@ -2085,9 +2090,7 @@ static int init_hyp_mode(void)
 			err = -ENODEV;
 			goto out_err;
 		}
-	}
 
-	if (is_protected_kvm_enabled()) {
 		err = kvm_hyp_init_protection(hyp_va_bits);
 		if (err) {
 			kvm_err("Failed to init hyp memory protection\n");
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 16/24] KVM: arm64: Provide I-cache invalidation by VA at EL2
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

In preparation for handling cache maintenance of guest pages at EL2,
introduce an EL2 copy of icache_inval_pou() which will later be plumbed
into the stage-2 page-table cache maintenance callbacks.
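
For reference, icache_inval_pou() operates on a [start, end) virtual
address range, so the eventual stage-2 callback can be as simple as the
sketch below (the callback name and its wiring into the page-table
mm_ops are hypothetical here and land in a later patch):

	static void stage2_icache_inval(void *addr, size_t size)
	{
		icache_inval_pou((unsigned long)addr,
				 (unsigned long)addr + size);
	}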

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_hyp.h |  1 +
 arch/arm64/kernel/image-vars.h   |  3 ---
 arch/arm64/kvm/arm.c             |  1 +
 arch/arm64/kvm/hyp/nvhe/cache.S  | 11 +++++++++++
 arch/arm64/kvm/hyp/nvhe/pkvm.c   |  3 +++
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index aa7fa2a08f06..fd99cf09972d 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -123,4 +123,5 @@ extern u64 kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val);
 
+extern unsigned long kvm_nvhe_sym(__icache_flags);
 #endif /* __ARM64_KVM_HYP_H__ */
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 241c86b67d01..4e3b6d618ac1 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -80,9 +80,6 @@ KVM_NVHE_ALIAS(nvhe_hyp_panic_handler);
 /* Vectors installed by hyp-init on reset HVC. */
 KVM_NVHE_ALIAS(__hyp_stub_vectors);
 
-/* Kernel symbol used by icache_is_vpipt(). */
-KVM_NVHE_ALIAS(__icache_flags);
-
 /* VMID bits set by the KVM VMID allocator */
 KVM_NVHE_ALIAS(kvm_arm_vmid_bits);
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a2343640c73c..90e0e7f38bb5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1901,6 +1901,7 @@ static void kvm_hyp_init_symbols(void)
 	kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 	kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
 	kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);
+	kvm_nvhe_sym(__icache_flags) = __icache_flags;
 }
 
 static int kvm_hyp_init_protection(u32 hyp_va_bits)
diff --git a/arch/arm64/kvm/hyp/nvhe/cache.S b/arch/arm64/kvm/hyp/nvhe/cache.S
index 0c367eb5f4e2..85936c17ae40 100644
--- a/arch/arm64/kvm/hyp/nvhe/cache.S
+++ b/arch/arm64/kvm/hyp/nvhe/cache.S
@@ -12,3 +12,14 @@ SYM_FUNC_START(__pi_dcache_clean_inval_poc)
 	ret
 SYM_FUNC_END(__pi_dcache_clean_inval_poc)
 SYM_FUNC_ALIAS(dcache_clean_inval_poc, __pi_dcache_clean_inval_poc)
+
+SYM_FUNC_START(__pi_icache_inval_pou)
+alternative_if ARM64_HAS_CACHE_DIC
+	isb
+	ret
+alternative_else_nop_endif
+
+	invalidate_icache_by_line x0, x1, x2, x3
+	ret
+SYM_FUNC_END(__pi_icache_inval_pou)
+SYM_FUNC_ALIAS(icache_inval_pou, __pi_icache_inval_pou)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 77aeb787670b..114c5565de7d 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -12,6 +12,9 @@
 #include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
+/* Used by icache_is_vpipt(). */
+unsigned long __icache_flags;
+
 /*
  * Set trap register values based on features in ID_AA64PFR0.
  */
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 17/24] KVM: arm64: Add generic hyp_memcache helpers
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

The host and hypervisor will need to dynamically exchange memory pages
soon. Indeed, the hypervisor will rely on the host to donate memory
pages it can use to create guest stage-2 page-tables and to store
metadata. In order to ease this process, introduce a struct hyp_memcache
which is essentially a linked list of available pages, indexed by
physical addresses.
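
The list is intrusive: each free page stores the physical address of
the next one, and mc->head points at the first. A minimal EL1 usage
sketch, using only the helpers added below (the page count is
arbitrary):

	struct kvm_hyp_memcache mc = { 0 };
	int ret;

	/* Queue up at least four host pages for the hypervisor. */
	ret = topup_hyp_memcache(&mc, 4);
	if (!ret) {
		/*
		 * ... pass &mc to EL2, which moves pages into its own
		 * cache via refill_memcache() ...
		 */
	}

	/* Give back whatever EL2 did not consume. */
	free_hyp_memcache(&mc);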

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h             | 57 +++++++++++++++++++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
 arch/arm64/kvm/hyp/nvhe/mm.c                  | 33 +++++++++++
 arch/arm64/kvm/mmu.c                          | 26 +++++++++
 4 files changed, 118 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e91456f63161..70a2db91665d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -73,6 +73,63 @@ u32 __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
 
+struct kvm_hyp_memcache {
+	phys_addr_t head;
+	unsigned long nr_pages;
+};
+
+static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     phys_addr_t *p,
+				     phys_addr_t (*to_pa)(void *virt))
+{
+	*p = mc->head;
+	mc->head = to_pa(p);
+	mc->nr_pages++;
+}
+
+static inline void *pop_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     void *(*to_va)(phys_addr_t phys))
+{
+	phys_addr_t *p = to_va(mc->head);
+
+	if (!mc->nr_pages)
+		return NULL;
+
+	mc->head = *p;
+	mc->nr_pages--;
+
+	return p;
+}
+
+static inline int __topup_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       unsigned long min_pages,
+				       void *(*alloc_fn)(void *arg),
+				       phys_addr_t (*to_pa)(void *virt),
+				       void *arg)
+{
+	while (mc->nr_pages < min_pages) {
+		phys_addr_t *p = alloc_fn(arg);
+
+		if (!p)
+			return -ENOMEM;
+		push_hyp_memcache(mc, p, to_pa);
+	}
+
+	return 0;
+}
+
+static inline void __free_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       void (*free_fn)(void *virt, void *arg),
+				       void *(*to_va)(phys_addr_t phys),
+				       void *arg)
+{
+	while (mc->nr_pages)
+		free_fn(pop_hyp_memcache(mc, to_va), arg);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc);
+int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages);
+
 struct kvm_vmid {
 	atomic64_t id;
 };
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index d11d9d68a680..36eea31a1c5f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -77,6 +77,8 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
 void reclaim_guest_pages(struct kvm_shadow_vm *vm);
+int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
+		    struct kvm_hyp_memcache *host_mc);
 
 static __always_inline void __load_host_stage2(void)
 {
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 17d689483ec4..74730376b992 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -308,3 +308,36 @@ int hyp_create_idmap(u32 hyp_va_bits)
 
 	return __pkvm_create_mappings(start, end - start, start, PAGE_HYP_EXEC);
 }
+
+static void *admit_host_page(void *arg)
+{
+	struct kvm_hyp_memcache *host_mc = arg;
+
+	if (!host_mc->nr_pages)
+		return NULL;
+
+	/*
+	 * The host still owns the pages in its memcache, so we need to go
+	 * through a full host-to-hyp donation cycle to change it. Fortunately,
+	 * __pkvm_host_donate_hyp() takes care of races for us, so if it
+	 * succeeds we're good to go.
+	 */
+	if (__pkvm_host_donate_hyp(hyp_phys_to_pfn(host_mc->head), 1))
+		return NULL;
+
+	return pop_hyp_memcache(host_mc, hyp_phys_to_virt);
+}
+
+/* Refill our local memcache by popping pages from the one provided by the host. */
+int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
+		    struct kvm_hyp_memcache *host_mc)
+{
+	struct kvm_hyp_memcache tmp = *host_mc;
+	int ret;
+
+	ret =  __topup_hyp_memcache(mc, min_pages, admit_host_page,
+				    hyp_virt_to_phys, &tmp);
+	*host_mc = tmp;
+
+	return ret;
+}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f5651a05b6a8..5ff0eaffc60f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -772,6 +772,32 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	}
 }
 
+static void hyp_mc_free_fn(void *addr, void *unused)
+{
+	free_page((unsigned long)addr);
+}
+
+static void *hyp_mc_alloc_fn(void *unused)
+{
+	return (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc)
+{
+	if (is_protected_kvm_enabled())
+		__free_hyp_memcache(mc, hyp_mc_free_fn,
+				    kvm_host_va, NULL);
+}
+
+int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages)
+{
+	if (!is_protected_kvm_enabled())
+		return 0;
+
+	return __topup_hyp_memcache(mc, min_pages, hyp_mc_alloc_fn,
+				    kvm_host_pa, NULL);
+}
+
 /**
  * kvm_phys_addr_ioremap - map a device range to guest IPA
  *
-- 
2.37.0.rc0.161.g10f37bed90-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 17/24] KVM: arm64: Add generic hyp_memcache helpers
@ 2022-06-30 13:57   ` Will Deacon
  0 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Ard Biesheuvel, Sean Christopherson,
	Alexandru Elisei, Andy Lutomirski, Catalin Marinas, James Morse,
	Chao Peng, Quentin Perret, Suzuki K Poulose, Michael Roth,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

The host and hypervisor will soon need to exchange memory pages
dynamically: the hypervisor will rely on the host to donate memory
pages that it can use to create guest stage-2 page-tables and to store
metadata. To ease this process, introduce a struct hyp_memcache, which
is essentially a linked list of available pages indexed by physical
address.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h             | 57 +++++++++++++++++++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
 arch/arm64/kvm/hyp/nvhe/mm.c                  | 33 +++++++++++
 arch/arm64/kvm/mmu.c                          | 26 +++++++++
 4 files changed, 118 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e91456f63161..70a2db91665d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -73,6 +73,63 @@ u32 __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
 
+struct kvm_hyp_memcache {
+	phys_addr_t head;
+	unsigned long nr_pages;
+};
+
+static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     phys_addr_t *p,
+				     phys_addr_t (*to_pa)(void *virt))
+{
+	*p = mc->head;
+	mc->head = to_pa(p);
+	mc->nr_pages++;
+}
+
+static inline void *pop_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     void *(*to_va)(phys_addr_t phys))
+{
+	phys_addr_t *p = to_va(mc->head);
+
+	if (!mc->nr_pages)
+		return NULL;
+
+	mc->head = *p;
+	mc->nr_pages--;
+
+	return p;
+}
+
+static inline int __topup_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       unsigned long min_pages,
+				       void *(*alloc_fn)(void *arg),
+				       phys_addr_t (*to_pa)(void *virt),
+				       void *arg)
+{
+	while (mc->nr_pages < min_pages) {
+		phys_addr_t *p = alloc_fn(arg);
+
+		if (!p)
+			return -ENOMEM;
+		push_hyp_memcache(mc, p, to_pa);
+	}
+
+	return 0;
+}
+
+static inline void __free_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       void (*free_fn)(void *virt, void *arg),
+				       void *(*to_va)(phys_addr_t phys),
+				       void *arg)
+{
+	while (mc->nr_pages)
+		free_fn(pop_hyp_memcache(mc, to_va), arg);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc);
+int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages);
+
 struct kvm_vmid {
 	atomic64_t id;
 };
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index d11d9d68a680..36eea31a1c5f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -77,6 +77,8 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
 void reclaim_guest_pages(struct kvm_shadow_vm *vm);
+int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
+		    struct kvm_hyp_memcache *host_mc);
 
 static __always_inline void __load_host_stage2(void)
 {
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 17d689483ec4..74730376b992 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -308,3 +308,36 @@ int hyp_create_idmap(u32 hyp_va_bits)
 
 	return __pkvm_create_mappings(start, end - start, start, PAGE_HYP_EXEC);
 }
+
+static void *admit_host_page(void *arg)
+{
+	struct kvm_hyp_memcache *host_mc = arg;
+
+	if (!host_mc->nr_pages)
+		return NULL;
+
+	/*
+	 * The host still owns the pages in its memcache, so we need to go
+	 * through a full host-to-hyp donation cycle to change it. Fortunately,
+	 * __pkvm_host_donate_hyp() takes care of races for us, so if it
+	 * succeeds we're good to go.
+	 */
+	if (__pkvm_host_donate_hyp(hyp_phys_to_pfn(host_mc->head), 1))
+		return NULL;
+
+	return pop_hyp_memcache(host_mc, hyp_phys_to_virt);
+}
+
+/* Refill our local memcache by popping pages from the one provided by the host. */
+int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
+		    struct kvm_hyp_memcache *host_mc)
+{
+	struct kvm_hyp_memcache tmp = *host_mc;
+	int ret;
+
+	ret =  __topup_hyp_memcache(mc, min_pages, admit_host_page,
+				    hyp_virt_to_phys, &tmp);
+	*host_mc = tmp;
+
+	return ret;
+}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f5651a05b6a8..5ff0eaffc60f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -772,6 +772,32 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	}
 }
 
+static void hyp_mc_free_fn(void *addr, void *unused)
+{
+	free_page((unsigned long)addr);
+}
+
+static void *hyp_mc_alloc_fn(void *unused)
+{
+	return (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc)
+{
+	if (is_protected_kvm_enabled())
+		__free_hyp_memcache(mc, hyp_mc_free_fn,
+				    kvm_host_va, NULL);
+}
+
+int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages)
+{
+	if (!is_protected_kvm_enabled())
+		return 0;
+
+	return __topup_hyp_memcache(mc, min_pages, hyp_mc_alloc_fn,
+				    kvm_host_pa, NULL);
+}
+
 /**
  * kvm_phys_addr_ioremap - map a device range to guest IPA
  *
-- 
2.37.0.rc0.161.g10f37bed90-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 17/24] KVM: arm64: Add generic hyp_memcache helpers
@ 2022-06-30 13:57   ` Will Deacon
  0 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Ard Biesheuvel, Sean Christopherson,
	Alexandru Elisei, Andy Lutomirski, Catalin Marinas, James Morse,
	Chao Peng, Quentin Perret, Suzuki K Poulose, Michael Roth,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

The host and hypervisor will soon need to exchange memory pages
dynamically: the hypervisor will rely on the host to donate memory
pages that it can use to create guest stage-2 page-tables and to store
metadata. To ease this process, introduce a struct hyp_memcache, which
is essentially a linked list of available pages indexed by physical
address.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h             | 57 +++++++++++++++++++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
 arch/arm64/kvm/hyp/nvhe/mm.c                  | 33 +++++++++++
 arch/arm64/kvm/mmu.c                          | 26 +++++++++
 4 files changed, 118 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e91456f63161..70a2db91665d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -73,6 +73,63 @@ u32 __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
 
+struct kvm_hyp_memcache {
+	phys_addr_t head;
+	unsigned long nr_pages;
+};
+
+static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     phys_addr_t *p,
+				     phys_addr_t (*to_pa)(void *virt))
+{
+	*p = mc->head;
+	mc->head = to_pa(p);
+	mc->nr_pages++;
+}
+
+static inline void *pop_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     void *(*to_va)(phys_addr_t phys))
+{
+	phys_addr_t *p = to_va(mc->head);
+
+	if (!mc->nr_pages)
+		return NULL;
+
+	mc->head = *p;
+	mc->nr_pages--;
+
+	return p;
+}
+
+static inline int __topup_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       unsigned long min_pages,
+				       void *(*alloc_fn)(void *arg),
+				       phys_addr_t (*to_pa)(void *virt),
+				       void *arg)
+{
+	while (mc->nr_pages < min_pages) {
+		phys_addr_t *p = alloc_fn(arg);
+
+		if (!p)
+			return -ENOMEM;
+		push_hyp_memcache(mc, p, to_pa);
+	}
+
+	return 0;
+}
+
+static inline void __free_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       void (*free_fn)(void *virt, void *arg),
+				       void *(*to_va)(phys_addr_t phys),
+				       void *arg)
+{
+	while (mc->nr_pages)
+		free_fn(pop_hyp_memcache(mc, to_va), arg);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc);
+int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages);
+
 struct kvm_vmid {
 	atomic64_t id;
 };
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index d11d9d68a680..36eea31a1c5f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -77,6 +77,8 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
 void reclaim_guest_pages(struct kvm_shadow_vm *vm);
+int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
+		    struct kvm_hyp_memcache *host_mc);
 
 static __always_inline void __load_host_stage2(void)
 {
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 17d689483ec4..74730376b992 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -308,3 +308,36 @@ int hyp_create_idmap(u32 hyp_va_bits)
 
 	return __pkvm_create_mappings(start, end - start, start, PAGE_HYP_EXEC);
 }
+
+static void *admit_host_page(void *arg)
+{
+	struct kvm_hyp_memcache *host_mc = arg;
+
+	if (!host_mc->nr_pages)
+		return NULL;
+
+	/*
+	 * The host still owns the pages in its memcache, so we need to go
+	 * through a full host-to-hyp donation cycle to change it. Fortunately,
+	 * __pkvm_host_donate_hyp() takes care of races for us, so if it
+	 * succeeds we're good to go.
+	 */
+	if (__pkvm_host_donate_hyp(hyp_phys_to_pfn(host_mc->head), 1))
+		return NULL;
+
+	return pop_hyp_memcache(host_mc, hyp_phys_to_virt);
+}
+
+/* Refill our local memcache by popping pages from the one provided by the host. */
+int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
+		    struct kvm_hyp_memcache *host_mc)
+{
+	struct kvm_hyp_memcache tmp = *host_mc;
+	int ret;
+
+	ret =  __topup_hyp_memcache(mc, min_pages, admit_host_page,
+				    hyp_virt_to_phys, &tmp);
+	*host_mc = tmp;
+
+	return ret;
+}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f5651a05b6a8..5ff0eaffc60f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -772,6 +772,32 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	}
 }
 
+static void hyp_mc_free_fn(void *addr, void *unused)
+{
+	free_page((unsigned long)addr);
+}
+
+static void *hyp_mc_alloc_fn(void *unused)
+{
+	return (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc)
+{
+	if (is_protected_kvm_enabled())
+		__free_hyp_memcache(mc, hyp_mc_free_fn,
+				    kvm_host_va, NULL);
+}
+
+int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages)
+{
+	if (!is_protected_kvm_enabled())
+		return 0;
+
+	return __topup_hyp_memcache(mc, min_pages, hyp_mc_alloc_fn,
+				    kvm_host_pa, NULL);
+}
+
 /**
  * kvm_phys_addr_ioremap - map a device range to guest IPA
  *
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 18/24] KVM: arm64: Instantiate guest stage-2 page-tables at EL2
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

Extend the shadow initialisation at EL2 so that we instantiate a memory
pool and a full 'struct kvm_s2_mmu' structure for each VM, with a
stage-2 page-table entirely independent from the one managed by the host
at EL1.

For now, the new page-table is unused as there is no way for the host
to map anything into it. Yet.
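
The key enabler is the kvm_pgtable_mm_ops indirection: the generic
page-table code only allocates and frees through a table of callbacks,
so handing it a per-VM ops table is enough to route all of its
allocations to that VM's hyp_pool. The stand-alone sketch below shows
the pattern in plain C with invented names; it is illustrative only and
not the kernel API.

  #include <stdio.h>
  #include <stdlib.h>

  struct mm_ops {
          void *(*zalloc_page)(void *cookie);
          void  (*put_page)(void *cookie, void *page);
  };

  struct pgtable {
          const struct mm_ops *ops;
          void *cookie;           /* e.g. a per-VM pool */
          void *root;
  };

  static int pgtable_init(struct pgtable *pgt, const struct mm_ops *ops,
                          void *cookie)
  {
          pgt->ops = ops;
          pgt->cookie = cookie;
          pgt->root = ops->zalloc_page(cookie);   /* routed via ops */
          return pgt->root ? 0 : -1;
  }

  /* A trivial "pool": counts allocations so the routing is visible. */
  struct counting_pool { int allocs; };

  static void *pool_zalloc(void *cookie)
  {
          struct counting_pool *p = cookie;

          p->allocs++;
          return calloc(1, 4096);
  }

  static void pool_put(void *cookie, void *page)
  {
          free(page);
  }

  int main(void)
  {
          static const struct mm_ops pool_ops = {
                  .zalloc_page = pool_zalloc,
                  .put_page    = pool_put,
          };
          struct counting_pool pool = { 0 };
          struct pgtable pgt;

          if (!pgtable_init(&pgt, &pool_ops, &pool))
                  printf("root allocated via pool, allocs=%d\n", pool.allocs);
          pool_ops.put_page(&pool, pgt.root);
          return 0;
  }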

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |   6 ++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c  | 127 ++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 1d0a33f70879..c0e32a750b6e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -9,6 +9,9 @@
 
 #include <asm/kvm_pkvm.h>
 
+#include <nvhe/gfp.h>
+#include <nvhe/spinlock.h>
+
 /*
  * Holds the relevant data for maintaining the vcpu state completely at hyp.
  */
@@ -37,6 +40,9 @@ struct kvm_shadow_vm {
 	size_t shadow_area_size;
 
 	struct kvm_pgtable pgt;
+	struct kvm_pgtable_mm_ops mm_ops;
+	struct hyp_pool pool;
+	hyp_spinlock_t lock;
 
 	/* Array of the shadow state per vcpu. */
 	struct kvm_shadow_vcpu_state shadow_vcpu_states[0];
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index a0af23de2640..5b22bba77e57 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -25,6 +25,21 @@ struct host_kvm host_kvm;
 
 static struct hyp_pool host_s2_pool;
 
+static DEFINE_PER_CPU(struct kvm_shadow_vm *, __current_vm);
+#define current_vm (*this_cpu_ptr(&__current_vm))
+
+static void guest_lock_component(struct kvm_shadow_vm *vm)
+{
+	hyp_spin_lock(&vm->lock);
+	current_vm = vm;
+}
+
+static void guest_unlock_component(struct kvm_shadow_vm *vm)
+{
+	current_vm = NULL;
+	hyp_spin_unlock(&vm->lock);
+}
+
 static void host_lock_component(void)
 {
 	hyp_spin_lock(&host_kvm.lock);
@@ -140,18 +155,124 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
 	return 0;
 }
 
+static bool guest_stage2_force_pte_cb(u64 addr, u64 end,
+				      enum kvm_pgtable_prot prot)
+{
+	return true;
+}
+
+static void *guest_s2_zalloc_pages_exact(size_t size)
+{
+	void *addr = hyp_alloc_pages(&current_vm->pool, get_order(size));
+
+	WARN_ON(size != (PAGE_SIZE << get_order(size)));
+	hyp_split_page(hyp_virt_to_page(addr));
+
+	return addr;
+}
+
+static void guest_s2_free_pages_exact(void *addr, unsigned long size)
+{
+	u8 order = get_order(size);
+	unsigned int i;
+
+	for (i = 0; i < (1 << order); i++)
+		hyp_put_page(&current_vm->pool, addr + (i * PAGE_SIZE));
+}
+
+static void *guest_s2_zalloc_page(void *mc)
+{
+	struct hyp_page *p;
+	void *addr;
+
+	addr = hyp_alloc_pages(&current_vm->pool, 0);
+	if (addr)
+		return addr;
+
+	addr = pop_hyp_memcache(mc, hyp_phys_to_virt);
+	if (!addr)
+		return addr;
+
+	memset(addr, 0, PAGE_SIZE);
+	p = hyp_virt_to_page(addr);
+	memset(p, 0, sizeof(*p));
+	p->refcount = 1;
+
+	return addr;
+}
+
+static void guest_s2_get_page(void *addr)
+{
+	hyp_get_page(&current_vm->pool, addr);
+}
+
+static void guest_s2_put_page(void *addr)
+{
+	hyp_put_page(&current_vm->pool, addr);
+}
+
+static void clean_dcache_guest_page(void *va, size_t size)
+{
+	__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
+	hyp_fixmap_unmap();
+}
+
+static void invalidate_icache_guest_page(void *va, size_t size)
+{
+	__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
+	hyp_fixmap_unmap();
+}
+
 int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd)
 {
-	vm->pgt.pgd = pgd;
+	struct kvm_s2_mmu *mmu = &vm->kvm.arch.mmu;
+	unsigned long nr_pages;
+	int ret;
+
+	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
+	ret = hyp_pool_init(&vm->pool, hyp_virt_to_pfn(pgd), nr_pages, 0);
+	if (ret)
+		return ret;
+
+	hyp_spin_lock_init(&vm->lock);
+	vm->mm_ops = (struct kvm_pgtable_mm_ops) {
+		.zalloc_pages_exact	= guest_s2_zalloc_pages_exact,
+		.free_pages_exact	= guest_s2_free_pages_exact,
+		.zalloc_page		= guest_s2_zalloc_page,
+		.phys_to_virt		= hyp_phys_to_virt,
+		.virt_to_phys		= hyp_virt_to_phys,
+		.page_count		= hyp_page_count,
+		.get_page		= guest_s2_get_page,
+		.put_page		= guest_s2_put_page,
+		.dcache_clean_inval_poc	= clean_dcache_guest_page,
+		.icache_inval_pou	= invalidate_icache_guest_page,
+	};
+
+	guest_lock_component(vm);
+	ret = __kvm_pgtable_stage2_init(mmu->pgt, mmu, &vm->mm_ops, 0,
+					guest_stage2_force_pte_cb);
+	guest_unlock_component(vm);
+	if (ret)
+		return ret;
+
+	vm->kvm.arch.mmu.pgd_phys = __hyp_pa(vm->pgt.pgd);
+
 	return 0;
 }
 
 void reclaim_guest_pages(struct kvm_shadow_vm *vm)
 {
-	unsigned long nr_pages;
+	unsigned long nr_pages, pfn;
 
 	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
-	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(vm->pgt.pgd), nr_pages));
+	pfn = hyp_virt_to_pfn(vm->pgt.pgd);
+
+	guest_lock_component(vm);
+	kvm_pgtable_stage2_destroy(&vm->pgt);
+	vm->kvm.arch.mmu.pgd_phys = 0ULL;
+	guest_unlock_component(vm);
+
+	WARN_ON(__pkvm_hyp_donate_host(pfn, nr_pages));
 }
 
 int __pkvm_prot_finalize(void)
-- 
2.37.0.rc0.161.g10f37bed90-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 18/24] KVM: arm64: Instantiate guest stage-2 page-tables at EL2
@ 2022-06-30 13:57   ` Will Deacon
  0 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Ard Biesheuvel, Sean Christopherson,
	Alexandru Elisei, Andy Lutomirski, Catalin Marinas, James Morse,
	Chao Peng, Quentin Perret, Suzuki K Poulose, Michael Roth,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Extend the shadow initialisation at EL2 so that we instantiate a memory
pool and a full 'struct kvm_s2_mmu' structure for each VM, with a
stage-2 page-table entirely independent from the one managed by the host
at EL1.

For now, the new page-table is unused as there is no way for the host
to map anything into it. Yet.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |   6 ++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c  | 127 ++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 1d0a33f70879..c0e32a750b6e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -9,6 +9,9 @@
 
 #include <asm/kvm_pkvm.h>
 
+#include <nvhe/gfp.h>
+#include <nvhe/spinlock.h>
+
 /*
  * Holds the relevant data for maintaining the vcpu state completely at hyp.
  */
@@ -37,6 +40,9 @@ struct kvm_shadow_vm {
 	size_t shadow_area_size;
 
 	struct kvm_pgtable pgt;
+	struct kvm_pgtable_mm_ops mm_ops;
+	struct hyp_pool pool;
+	hyp_spinlock_t lock;
 
 	/* Array of the shadow state per vcpu. */
 	struct kvm_shadow_vcpu_state shadow_vcpu_states[0];
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index a0af23de2640..5b22bba77e57 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -25,6 +25,21 @@ struct host_kvm host_kvm;
 
 static struct hyp_pool host_s2_pool;
 
+static DEFINE_PER_CPU(struct kvm_shadow_vm *, __current_vm);
+#define current_vm (*this_cpu_ptr(&__current_vm))
+
+static void guest_lock_component(struct kvm_shadow_vm *vm)
+{
+	hyp_spin_lock(&vm->lock);
+	current_vm = vm;
+}
+
+static void guest_unlock_component(struct kvm_shadow_vm *vm)
+{
+	current_vm = NULL;
+	hyp_spin_unlock(&vm->lock);
+}
+
 static void host_lock_component(void)
 {
 	hyp_spin_lock(&host_kvm.lock);
@@ -140,18 +155,124 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
 	return 0;
 }
 
+static bool guest_stage2_force_pte_cb(u64 addr, u64 end,
+				      enum kvm_pgtable_prot prot)
+{
+	return true;
+}
+
+static void *guest_s2_zalloc_pages_exact(size_t size)
+{
+	void *addr = hyp_alloc_pages(&current_vm->pool, get_order(size));
+
+	WARN_ON(size != (PAGE_SIZE << get_order(size)));
+	hyp_split_page(hyp_virt_to_page(addr));
+
+	return addr;
+}
+
+static void guest_s2_free_pages_exact(void *addr, unsigned long size)
+{
+	u8 order = get_order(size);
+	unsigned int i;
+
+	for (i = 0; i < (1 << order); i++)
+		hyp_put_page(&current_vm->pool, addr + (i * PAGE_SIZE));
+}
+
+static void *guest_s2_zalloc_page(void *mc)
+{
+	struct hyp_page *p;
+	void *addr;
+
+	addr = hyp_alloc_pages(&current_vm->pool, 0);
+	if (addr)
+		return addr;
+
+	addr = pop_hyp_memcache(mc, hyp_phys_to_virt);
+	if (!addr)
+		return addr;
+
+	memset(addr, 0, PAGE_SIZE);
+	p = hyp_virt_to_page(addr);
+	memset(p, 0, sizeof(*p));
+	p->refcount = 1;
+
+	return addr;
+}
+
+static void guest_s2_get_page(void *addr)
+{
+	hyp_get_page(&current_vm->pool, addr);
+}
+
+static void guest_s2_put_page(void *addr)
+{
+	hyp_put_page(&current_vm->pool, addr);
+}
+
+static void clean_dcache_guest_page(void *va, size_t size)
+{
+	__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
+	hyp_fixmap_unmap();
+}
+
+static void invalidate_icache_guest_page(void *va, size_t size)
+{
+	__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
+	hyp_fixmap_unmap();
+}
+
 int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd)
 {
-	vm->pgt.pgd = pgd;
+	struct kvm_s2_mmu *mmu = &vm->kvm.arch.mmu;
+	unsigned long nr_pages;
+	int ret;
+
+	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
+	ret = hyp_pool_init(&vm->pool, hyp_virt_to_pfn(pgd), nr_pages, 0);
+	if (ret)
+		return ret;
+
+	hyp_spin_lock_init(&vm->lock);
+	vm->mm_ops = (struct kvm_pgtable_mm_ops) {
+		.zalloc_pages_exact	= guest_s2_zalloc_pages_exact,
+		.free_pages_exact	= guest_s2_free_pages_exact,
+		.zalloc_page		= guest_s2_zalloc_page,
+		.phys_to_virt		= hyp_phys_to_virt,
+		.virt_to_phys		= hyp_virt_to_phys,
+		.page_count		= hyp_page_count,
+		.get_page		= guest_s2_get_page,
+		.put_page		= guest_s2_put_page,
+		.dcache_clean_inval_poc	= clean_dcache_guest_page,
+		.icache_inval_pou	= invalidate_icache_guest_page,
+	};
+
+	guest_lock_component(vm);
+	ret = __kvm_pgtable_stage2_init(mmu->pgt, mmu, &vm->mm_ops, 0,
+					guest_stage2_force_pte_cb);
+	guest_unlock_component(vm);
+	if (ret)
+		return ret;
+
+	vm->kvm.arch.mmu.pgd_phys = __hyp_pa(vm->pgt.pgd);
+
 	return 0;
 }
 
 void reclaim_guest_pages(struct kvm_shadow_vm *vm)
 {
-	unsigned long nr_pages;
+	unsigned long nr_pages, pfn;
 
 	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
-	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(vm->pgt.pgd), nr_pages));
+	pfn = hyp_virt_to_pfn(vm->pgt.pgd);
+
+	guest_lock_component(vm);
+	kvm_pgtable_stage2_destroy(&vm->pgt);
+	vm->kvm.arch.mmu.pgd_phys = 0ULL;
+	guest_unlock_component(vm);
+
+	WARN_ON(__pkvm_hyp_donate_host(pfn, nr_pages));
 }
 
 int __pkvm_prot_finalize(void)
-- 
2.37.0.rc0.161.g10f37bed90-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 18/24] KVM: arm64: Instantiate guest stage-2 page-tables at EL2
@ 2022-06-30 13:57   ` Will Deacon
  0 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Ard Biesheuvel, Sean Christopherson,
	Alexandru Elisei, Andy Lutomirski, Catalin Marinas, James Morse,
	Chao Peng, Quentin Perret, Suzuki K Poulose, Michael Roth,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Extend the shadow initialisation at EL2 so that we instantiate a memory
pool and a full 'struct kvm_s2_mmu' structure for each VM, with a
stage-2 page-table entirely independent from the one managed by the host
at EL1.

For now, the new page-table is unused as there is no way for the host
to map anything into it. Yet.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |   6 ++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c  | 127 ++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 1d0a33f70879..c0e32a750b6e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -9,6 +9,9 @@
 
 #include <asm/kvm_pkvm.h>
 
+#include <nvhe/gfp.h>
+#include <nvhe/spinlock.h>
+
 /*
  * Holds the relevant data for maintaining the vcpu state completely at hyp.
  */
@@ -37,6 +40,9 @@ struct kvm_shadow_vm {
 	size_t shadow_area_size;
 
 	struct kvm_pgtable pgt;
+	struct kvm_pgtable_mm_ops mm_ops;
+	struct hyp_pool pool;
+	hyp_spinlock_t lock;
 
 	/* Array of the shadow state per vcpu. */
 	struct kvm_shadow_vcpu_state shadow_vcpu_states[0];
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index a0af23de2640..5b22bba77e57 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -25,6 +25,21 @@ struct host_kvm host_kvm;
 
 static struct hyp_pool host_s2_pool;
 
+static DEFINE_PER_CPU(struct kvm_shadow_vm *, __current_vm);
+#define current_vm (*this_cpu_ptr(&__current_vm))
+
+static void guest_lock_component(struct kvm_shadow_vm *vm)
+{
+	hyp_spin_lock(&vm->lock);
+	current_vm = vm;
+}
+
+static void guest_unlock_component(struct kvm_shadow_vm *vm)
+{
+	current_vm = NULL;
+	hyp_spin_unlock(&vm->lock);
+}
+
 static void host_lock_component(void)
 {
 	hyp_spin_lock(&host_kvm.lock);
@@ -140,18 +155,124 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
 	return 0;
 }
 
+static bool guest_stage2_force_pte_cb(u64 addr, u64 end,
+				      enum kvm_pgtable_prot prot)
+{
+	return true;
+}
+
+static void *guest_s2_zalloc_pages_exact(size_t size)
+{
+	void *addr = hyp_alloc_pages(&current_vm->pool, get_order(size));
+
+	WARN_ON(size != (PAGE_SIZE << get_order(size)));
+	hyp_split_page(hyp_virt_to_page(addr));
+
+	return addr;
+}
+
+static void guest_s2_free_pages_exact(void *addr, unsigned long size)
+{
+	u8 order = get_order(size);
+	unsigned int i;
+
+	for (i = 0; i < (1 << order); i++)
+		hyp_put_page(&current_vm->pool, addr + (i * PAGE_SIZE));
+}
+
+static void *guest_s2_zalloc_page(void *mc)
+{
+	struct hyp_page *p;
+	void *addr;
+
+	addr = hyp_alloc_pages(&current_vm->pool, 0);
+	if (addr)
+		return addr;
+
+	addr = pop_hyp_memcache(mc, hyp_phys_to_virt);
+	if (!addr)
+		return addr;
+
+	memset(addr, 0, PAGE_SIZE);
+	p = hyp_virt_to_page(addr);
+	memset(p, 0, sizeof(*p));
+	p->refcount = 1;
+
+	return addr;
+}
+
+static void guest_s2_get_page(void *addr)
+{
+	hyp_get_page(&current_vm->pool, addr);
+}
+
+static void guest_s2_put_page(void *addr)
+{
+	hyp_put_page(&current_vm->pool, addr);
+}
+
+static void clean_dcache_guest_page(void *va, size_t size)
+{
+	__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
+	hyp_fixmap_unmap();
+}
+
+static void invalidate_icache_guest_page(void *va, size_t size)
+{
+	__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
+	hyp_fixmap_unmap();
+}
+
 int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd)
 {
-	vm->pgt.pgd = pgd;
+	struct kvm_s2_mmu *mmu = &vm->kvm.arch.mmu;
+	unsigned long nr_pages;
+	int ret;
+
+	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
+	ret = hyp_pool_init(&vm->pool, hyp_virt_to_pfn(pgd), nr_pages, 0);
+	if (ret)
+		return ret;
+
+	hyp_spin_lock_init(&vm->lock);
+	vm->mm_ops = (struct kvm_pgtable_mm_ops) {
+		.zalloc_pages_exact	= guest_s2_zalloc_pages_exact,
+		.free_pages_exact	= guest_s2_free_pages_exact,
+		.zalloc_page		= guest_s2_zalloc_page,
+		.phys_to_virt		= hyp_phys_to_virt,
+		.virt_to_phys		= hyp_virt_to_phys,
+		.page_count		= hyp_page_count,
+		.get_page		= guest_s2_get_page,
+		.put_page		= guest_s2_put_page,
+		.dcache_clean_inval_poc	= clean_dcache_guest_page,
+		.icache_inval_pou	= invalidate_icache_guest_page,
+	};
+
+	guest_lock_component(vm);
+	ret = __kvm_pgtable_stage2_init(mmu->pgt, mmu, &vm->mm_ops, 0,
+					guest_stage2_force_pte_cb);
+	guest_unlock_component(vm);
+	if (ret)
+		return ret;
+
+	vm->kvm.arch.mmu.pgd_phys = __hyp_pa(vm->pgt.pgd);
+
 	return 0;
 }
 
 void reclaim_guest_pages(struct kvm_shadow_vm *vm)
 {
-	unsigned long nr_pages;
+	unsigned long nr_pages, pfn;
 
 	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
-	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(vm->pgt.pgd), nr_pages));
+	pfn = hyp_virt_to_pfn(vm->pgt.pgd);
+
+	guest_lock_component(vm);
+	kvm_pgtable_stage2_destroy(&vm->pgt);
+	vm->kvm.arch.mmu.pgd_phys = 0ULL;
+	guest_unlock_component(vm);
+
+	WARN_ON(__pkvm_hyp_donate_host(pfn, nr_pages));
 }
 
 int __pkvm_prot_finalize(void)
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 19/24] KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

Rather than relying on the host to free the shadow VM pages explicitly
on teardown, introduce a dedicated teardown memcache which allows the
host to reclaim guest memory resources without having to keep track of
all of the allocations made by EL2.
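
In other words, the hypervisor drains every page it still holds
(stage-2 table pages and shadow metadata alike) onto a single teardown
list, and the host simply frees whatever it finds there afterwards. The
stand-alone sketch below shows that flow in plain C with invented
names; it is not kernel code.

  #include <stdio.h>
  #include <stdlib.h>

  struct page_node { struct page_node *next; };

  struct teardown_list {
          struct page_node *head;
          unsigned long nr;
  };

  /* Hypervisor side: push a page it no longer needs onto the list. */
  static void teardown_push(struct teardown_list *l, void *page)
  {
          struct page_node *n = page;

          n->next = l->head;
          l->head = n;
          l->nr++;
  }

  /* Host side: reclaim every page on the list, no bookkeeping needed. */
  static void teardown_reclaim(struct teardown_list *l)
  {
          while (l->head) {
                  struct page_node *n = l->head;

                  l->head = n->next;
                  l->nr--;
                  free(n);
          }
  }

  int main(void)
  {
          struct teardown_list list = { 0 };

          /* Stand-ins for stage-2 table pages and shadow metadata. */
          for (int i = 0; i < 5; i++)
                  teardown_push(&list, malloc(4096));

          printf("pages to reclaim: %lu\n", list.nr);
          teardown_reclaim(&list);
          printf("pages left: %lu\n", list.nr);
          return 0;
  }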

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h             |  6 +-----
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 17 +++++++++++------
 arch/arm64/kvm/hyp/nvhe/pkvm.c                |  8 +++++++-
 arch/arm64/kvm/pkvm.c                         | 12 +-----------
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 70a2db91665d..09481268c224 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -175,11 +175,7 @@ struct kvm_smccc_features {
 struct kvm_protected_vm {
 	unsigned int shadow_handle;
 	struct mutex shadow_lock;
-
-	struct {
-		void *pgd;
-		void *shadow;
-	} hyp_donations;
+	struct kvm_hyp_memcache teardown_mc;
 };
 
 struct kvm_arch {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 36eea31a1c5f..663019992b67 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -76,7 +76,7 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
-void reclaim_guest_pages(struct kvm_shadow_vm *vm);
+void reclaim_guest_pages(struct kvm_shadow_vm *vm, struct kvm_hyp_memcache *mc);
 int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
 		    struct kvm_hyp_memcache *host_mc);
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 5b22bba77e57..bcfdba1881c1 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -260,19 +260,24 @@ int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd)
 	return 0;
 }
 
-void reclaim_guest_pages(struct kvm_shadow_vm *vm)
+void reclaim_guest_pages(struct kvm_shadow_vm *vm, struct kvm_hyp_memcache *mc)
 {
-	unsigned long nr_pages, pfn;
-
-	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
-	pfn = hyp_virt_to_pfn(vm->pgt.pgd);
+	void *addr;
 
+	/* Dump all pgtable pages in the hyp_pool */
 	guest_lock_component(vm);
 	kvm_pgtable_stage2_destroy(&vm->pgt);
 	vm->kvm.arch.mmu.pgd_phys = 0ULL;
 	guest_unlock_component(vm);
 
-	WARN_ON(__pkvm_hyp_donate_host(pfn, nr_pages));
+	/* Drain the hyp_pool into the memcache */
+	addr = hyp_alloc_pages(&vm->pool, 0);
+	while (addr) {
+		memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
+		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
+		WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
+		addr = hyp_alloc_pages(&vm->pool, 0);
+	}
 }
 
 int __pkvm_prot_finalize(void)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 114c5565de7d..a4a518b2a43b 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -546,8 +546,10 @@ int __pkvm_init_shadow(struct kvm *kvm, unsigned long shadow_hva,
 
 int __pkvm_teardown_shadow(unsigned int shadow_handle)
 {
+	struct kvm_hyp_memcache *mc;
 	struct kvm_shadow_vm *vm;
 	size_t shadow_size;
+	void *addr;
 	int err;
 
 	/* Lookup then remove entry from the shadow table. */
@@ -569,7 +571,8 @@ int __pkvm_teardown_shadow(unsigned int shadow_handle)
 	hyp_spin_unlock(&shadow_lock);
 
 	/* Reclaim guest pages (including page-table pages) */
-	reclaim_guest_pages(vm);
+	mc = &vm->host_kvm->arch.pkvm.teardown_mc;
+	reclaim_guest_pages(vm, mc);
 	unpin_host_vcpus(vm->shadow_vcpu_states, vm->kvm.created_vcpus);
 
 	/* Push the metadata pages to the teardown memcache */
@@ -577,6 +580,9 @@ int __pkvm_teardown_shadow(unsigned int shadow_handle)
 	hyp_unpin_shared_mem(vm->host_kvm, vm->host_kvm + 1);
 
 	memset(vm, 0, shadow_size);
+	for (addr = vm; addr < (void *)vm + shadow_size; addr += PAGE_SIZE)
+		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
+
 	unmap_donated_memory_noclear(vm, shadow_size);
 	return 0;
 
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index b4466b31d7c8..b174d6dfde36 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -160,8 +160,6 @@ static int __kvm_shadow_create(struct kvm *kvm)
 
 	/* Store the shadow handle given by hyp for future call reference. */
 	kvm->arch.pkvm.shadow_handle = shadow_handle;
-	kvm->arch.pkvm.hyp_donations.pgd = pgd;
-	kvm->arch.pkvm.hyp_donations.shadow = shadow_addr;
 	return 0;
 
 free_shadow:
@@ -185,20 +183,12 @@ int kvm_shadow_create(struct kvm *kvm)
 
 void kvm_shadow_destroy(struct kvm *kvm)
 {
-	size_t pgd_sz, shadow_sz;
-
 	if (kvm->arch.pkvm.shadow_handle)
 		WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_shadow,
 					  kvm->arch.pkvm.shadow_handle));
 
 	kvm->arch.pkvm.shadow_handle = 0;
-
-	shadow_sz = PAGE_ALIGN(KVM_SHADOW_VM_SIZE +
-			       KVM_SHADOW_VCPU_STATE_SIZE * kvm->created_vcpus);
-	pgd_sz = kvm_pgtable_stage2_pgd_size(kvm->arch.vtcr);
-
-	free_pages_exact(kvm->arch.pkvm.hyp_donations.shadow, shadow_sz);
-	free_pages_exact(kvm->arch.pkvm.hyp_donations.pgd, pgd_sz);
+	free_hyp_memcache(&kvm->arch.pkvm.teardown_mc);
 }
 
 int kvm_init_pvm(struct kvm *kvm)
-- 
2.37.0.rc0.161.g10f37bed90-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 19/24] KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache
@ 2022-06-30 13:57   ` Will Deacon
  0 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Ard Biesheuvel, Sean Christopherson,
	Alexandru Elisei, Andy Lutomirski, Catalin Marinas, James Morse,
	Chao Peng, Quentin Perret, Suzuki K Poulose, Michael Roth,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Rather than relying on the host to free the shadow VM pages explicitly
on teardown, introduce a dedicated teardown memcache which allows the
host to reclaim guest memory resources without having to keep track of
all of the allocations made by EL2.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h             |  6 +-----
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 17 +++++++++++------
 arch/arm64/kvm/hyp/nvhe/pkvm.c                |  8 +++++++-
 arch/arm64/kvm/pkvm.c                         | 12 +-----------
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 70a2db91665d..09481268c224 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -175,11 +175,7 @@ struct kvm_smccc_features {
 struct kvm_protected_vm {
 	unsigned int shadow_handle;
 	struct mutex shadow_lock;
-
-	struct {
-		void *pgd;
-		void *shadow;
-	} hyp_donations;
+	struct kvm_hyp_memcache teardown_mc;
 };
 
 struct kvm_arch {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 36eea31a1c5f..663019992b67 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -76,7 +76,7 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
-void reclaim_guest_pages(struct kvm_shadow_vm *vm);
+void reclaim_guest_pages(struct kvm_shadow_vm *vm, struct kvm_hyp_memcache *mc);
 int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
 		    struct kvm_hyp_memcache *host_mc);
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 5b22bba77e57..bcfdba1881c1 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -260,19 +260,24 @@ int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd)
 	return 0;
 }
 
-void reclaim_guest_pages(struct kvm_shadow_vm *vm)
+void reclaim_guest_pages(struct kvm_shadow_vm *vm, struct kvm_hyp_memcache *mc)
 {
-	unsigned long nr_pages, pfn;
-
-	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
-	pfn = hyp_virt_to_pfn(vm->pgt.pgd);
+	void *addr;
 
+	/* Dump all pgtable pages in the hyp_pool */
 	guest_lock_component(vm);
 	kvm_pgtable_stage2_destroy(&vm->pgt);
 	vm->kvm.arch.mmu.pgd_phys = 0ULL;
 	guest_unlock_component(vm);
 
-	WARN_ON(__pkvm_hyp_donate_host(pfn, nr_pages));
+	/* Drain the hyp_pool into the memcache */
+	addr = hyp_alloc_pages(&vm->pool, 0);
+	while (addr) {
+		memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
+		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
+		WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
+		addr = hyp_alloc_pages(&vm->pool, 0);
+	}
 }
 
 int __pkvm_prot_finalize(void)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 114c5565de7d..a4a518b2a43b 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -546,8 +546,10 @@ int __pkvm_init_shadow(struct kvm *kvm, unsigned long shadow_hva,
 
 int __pkvm_teardown_shadow(unsigned int shadow_handle)
 {
+	struct kvm_hyp_memcache *mc;
 	struct kvm_shadow_vm *vm;
 	size_t shadow_size;
+	void *addr;
 	int err;
 
 	/* Lookup then remove entry from the shadow table. */
@@ -569,7 +571,8 @@ int __pkvm_teardown_shadow(unsigned int shadow_handle)
 	hyp_spin_unlock(&shadow_lock);
 
 	/* Reclaim guest pages (including page-table pages) */
-	reclaim_guest_pages(vm);
+	mc = &vm->host_kvm->arch.pkvm.teardown_mc;
+	reclaim_guest_pages(vm, mc);
 	unpin_host_vcpus(vm->shadow_vcpu_states, vm->kvm.created_vcpus);
 
 	/* Push the metadata pages to the teardown memcache */
@@ -577,6 +580,9 @@ int __pkvm_teardown_shadow(unsigned int shadow_handle)
 	hyp_unpin_shared_mem(vm->host_kvm, vm->host_kvm + 1);
 
 	memset(vm, 0, shadow_size);
+	for (addr = vm; addr < (void *)vm + shadow_size; addr += PAGE_SIZE)
+		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
+
 	unmap_donated_memory_noclear(vm, shadow_size);
 	return 0;
 
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index b4466b31d7c8..b174d6dfde36 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -160,8 +160,6 @@ static int __kvm_shadow_create(struct kvm *kvm)
 
 	/* Store the shadow handle given by hyp for future call reference. */
 	kvm->arch.pkvm.shadow_handle = shadow_handle;
-	kvm->arch.pkvm.hyp_donations.pgd = pgd;
-	kvm->arch.pkvm.hyp_donations.shadow = shadow_addr;
 	return 0;
 
 free_shadow:
@@ -185,20 +183,12 @@ int kvm_shadow_create(struct kvm *kvm)
 
 void kvm_shadow_destroy(struct kvm *kvm)
 {
-	size_t pgd_sz, shadow_sz;
-
 	if (kvm->arch.pkvm.shadow_handle)
 		WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_shadow,
 					  kvm->arch.pkvm.shadow_handle));
 
 	kvm->arch.pkvm.shadow_handle = 0;
-
-	shadow_sz = PAGE_ALIGN(KVM_SHADOW_VM_SIZE +
-			       KVM_SHADOW_VCPU_STATE_SIZE * kvm->created_vcpus);
-	pgd_sz = kvm_pgtable_stage2_pgd_size(kvm->arch.vtcr);
-
-	free_pages_exact(kvm->arch.pkvm.hyp_donations.shadow, shadow_sz);
-	free_pages_exact(kvm->arch.pkvm.hyp_donations.pgd, pgd_sz);
+	free_hyp_memcache(&kvm->arch.pkvm.teardown_mc);
 }
 
 int kvm_init_pvm(struct kvm *kvm)
-- 
2.37.0.rc0.161.g10f37bed90-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 19/24] KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache
@ 2022-06-30 13:57   ` Will Deacon
  0 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Ard Biesheuvel, Sean Christopherson,
	Alexandru Elisei, Andy Lutomirski, Catalin Marinas, James Morse,
	Chao Peng, Quentin Perret, Suzuki K Poulose, Michael Roth,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Rather than relying on the host to free the shadow VM pages explicitly
on teardown, introduce a dedicated teardown memcache which allows the
host to reclaim guest memory resources without having to keep track of
all of the allocations made by EL2.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h             |  6 +-----
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 17 +++++++++++------
 arch/arm64/kvm/hyp/nvhe/pkvm.c                |  8 +++++++-
 arch/arm64/kvm/pkvm.c                         | 12 +-----------
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 70a2db91665d..09481268c224 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -175,11 +175,7 @@ struct kvm_smccc_features {
 struct kvm_protected_vm {
 	unsigned int shadow_handle;
 	struct mutex shadow_lock;
-
-	struct {
-		void *pgd;
-		void *shadow;
-	} hyp_donations;
+	struct kvm_hyp_memcache teardown_mc;
 };
 
 struct kvm_arch {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 36eea31a1c5f..663019992b67 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -76,7 +76,7 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
-void reclaim_guest_pages(struct kvm_shadow_vm *vm);
+void reclaim_guest_pages(struct kvm_shadow_vm *vm, struct kvm_hyp_memcache *mc);
 int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
 		    struct kvm_hyp_memcache *host_mc);
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 5b22bba77e57..bcfdba1881c1 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -260,19 +260,24 @@ int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd)
 	return 0;
 }
 
-void reclaim_guest_pages(struct kvm_shadow_vm *vm)
+void reclaim_guest_pages(struct kvm_shadow_vm *vm, struct kvm_hyp_memcache *mc)
 {
-	unsigned long nr_pages, pfn;
-
-	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
-	pfn = hyp_virt_to_pfn(vm->pgt.pgd);
+	void *addr;
 
+	/* Dump all pgtable pages in the hyp_pool */
 	guest_lock_component(vm);
 	kvm_pgtable_stage2_destroy(&vm->pgt);
 	vm->kvm.arch.mmu.pgd_phys = 0ULL;
 	guest_unlock_component(vm);
 
-	WARN_ON(__pkvm_hyp_donate_host(pfn, nr_pages));
+	/* Drain the hyp_pool into the memcache */
+	addr = hyp_alloc_pages(&vm->pool, 0);
+	while (addr) {
+		memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
+		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
+		WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
+		addr = hyp_alloc_pages(&vm->pool, 0);
+	}
 }
 
 int __pkvm_prot_finalize(void)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 114c5565de7d..a4a518b2a43b 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -546,8 +546,10 @@ int __pkvm_init_shadow(struct kvm *kvm, unsigned long shadow_hva,
 
 int __pkvm_teardown_shadow(unsigned int shadow_handle)
 {
+	struct kvm_hyp_memcache *mc;
 	struct kvm_shadow_vm *vm;
 	size_t shadow_size;
+	void *addr;
 	int err;
 
 	/* Lookup then remove entry from the shadow table. */
@@ -569,7 +571,8 @@ int __pkvm_teardown_shadow(unsigned int shadow_handle)
 	hyp_spin_unlock(&shadow_lock);
 
 	/* Reclaim guest pages (including page-table pages) */
-	reclaim_guest_pages(vm);
+	mc = &vm->host_kvm->arch.pkvm.teardown_mc;
+	reclaim_guest_pages(vm, mc);
 	unpin_host_vcpus(vm->shadow_vcpu_states, vm->kvm.created_vcpus);
 
 	/* Push the metadata pages to the teardown memcache */
@@ -577,6 +580,9 @@ int __pkvm_teardown_shadow(unsigned int shadow_handle)
 	hyp_unpin_shared_mem(vm->host_kvm, vm->host_kvm + 1);
 
 	memset(vm, 0, shadow_size);
+	for (addr = vm; addr < (void *)vm + shadow_size; addr += PAGE_SIZE)
+		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
+
 	unmap_donated_memory_noclear(vm, shadow_size);
 	return 0;
 
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index b4466b31d7c8..b174d6dfde36 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -160,8 +160,6 @@ static int __kvm_shadow_create(struct kvm *kvm)
 
 	/* Store the shadow handle given by hyp for future call reference. */
 	kvm->arch.pkvm.shadow_handle = shadow_handle;
-	kvm->arch.pkvm.hyp_donations.pgd = pgd;
-	kvm->arch.pkvm.hyp_donations.shadow = shadow_addr;
 	return 0;
 
 free_shadow:
@@ -185,20 +183,12 @@ int kvm_shadow_create(struct kvm *kvm)
 
 void kvm_shadow_destroy(struct kvm *kvm)
 {
-	size_t pgd_sz, shadow_sz;
-
 	if (kvm->arch.pkvm.shadow_handle)
 		WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_shadow,
 					  kvm->arch.pkvm.shadow_handle));
 
 	kvm->arch.pkvm.shadow_handle = 0;
-
-	shadow_sz = PAGE_ALIGN(KVM_SHADOW_VM_SIZE +
-			       KVM_SHADOW_VCPU_STATE_SIZE * kvm->created_vcpus);
-	pgd_sz = kvm_pgtable_stage2_pgd_size(kvm->arch.vtcr);
-
-	free_pages_exact(kvm->arch.pkvm.hyp_donations.shadow, shadow_sz);
-	free_pages_exact(kvm->arch.pkvm.hyp_donations.pgd, pgd_sz);
+	free_hyp_memcache(&kvm->arch.pkvm.teardown_mc);
 }
 
 int kvm_init_pvm(struct kvm *kvm)
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 20/24] KVM: arm64: Unmap kvm_arm_hyp_percpu_base from the host
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

In pKVM mode, we can't trust the host not to mess with the hypervisor
per-cpu offsets, so let's move the array containing them to the nVHE
code.
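
For reference, the addressing scheme this array backs is simple: a
per-CPU symbol's address on CPU n is base[n] plus the symbol's offset
within the per-CPU region, so whoever owns base[] decides where every
lookup lands. The stand-alone sketch below shows the idea in plain C
with invented names; it is not kernel code.

  #include <stdio.h>
  #include <stdlib.h>
  #include <stdint.h>

  #define NR_CPUS      4
  #define REGION_SIZE  4096

  static unsigned char *percpu_base[NR_CPUS];     /* one region per CPU */
  static unsigned char  region_template[REGION_SIZE];
  static unsigned int  *counter_tmpl = (unsigned int *)&region_template[128];

  /* Address of the per-CPU variable backing 'tmpl' on a given CPU. */
  static void *per_cpu_ptr(void *tmpl, int cpu)
  {
          uintptr_t off = (uintptr_t)tmpl - (uintptr_t)region_template;

          return percpu_base[cpu] + off;
  }

  int main(void)
  {
          for (int cpu = 0; cpu < NR_CPUS; cpu++) {
                  percpu_base[cpu] = calloc(1, REGION_SIZE);
                  *(unsigned int *)per_cpu_ptr(counter_tmpl, cpu) = cpu * 10;
          }

          for (int cpu = 0; cpu < NR_CPUS; cpu++)
                  printf("cpu%d counter = %u\n", cpu,
                         *(unsigned int *)per_cpu_ptr(counter_tmpl, cpu));

          for (int cpu = 0; cpu < NR_CPUS; cpu++)
                  free(percpu_base[cpu]);
          return 0;
  }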

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h  | 4 ++--
 arch/arm64/kernel/image-vars.h    | 3 ---
 arch/arm64/kvm/arm.c              | 9 ++++-----
 arch/arm64/kvm/hyp/nvhe/hyp-smp.c | 2 ++
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index fac4ed699913..d92f2ccae74c 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -108,7 +108,7 @@ enum __kvm_host_smccc_func {
 #define per_cpu_ptr_nvhe_sym(sym, cpu)						\
 	({									\
 		unsigned long base, off;					\
-		base = kvm_arm_hyp_percpu_base[cpu];				\
+		base = kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu];		\
 		off = (unsigned long)&CHOOSE_NVHE_SYM(sym) -			\
 		      (unsigned long)&CHOOSE_NVHE_SYM(__per_cpu_start);		\
 		base ? (typeof(CHOOSE_NVHE_SYM(sym))*)(base + off) : NULL;	\
@@ -197,7 +197,7 @@ DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
 #define __kvm_hyp_init		CHOOSE_NVHE_SYM(__kvm_hyp_init)
 #define __kvm_hyp_vector	CHOOSE_HYP_SYM(__kvm_hyp_vector)
 
-extern unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
+extern unsigned long kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[];
 DECLARE_KVM_NVHE_SYM(__per_cpu_start);
 DECLARE_KVM_NVHE_SYM(__per_cpu_end);
 
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 4e3b6d618ac1..37a2d833851a 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -102,9 +102,6 @@ KVM_NVHE_ALIAS(gic_nonsecure_priorities);
 KVM_NVHE_ALIAS(__start___kvm_ex_table);
 KVM_NVHE_ALIAS(__stop___kvm_ex_table);
 
-/* Array containing bases of nVHE per-CPU memory regions. */
-KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
-
 /* PMU available static key */
 #ifdef CONFIG_HW_PERF_EVENTS
 KVM_NVHE_ALIAS(kvm_arm_pmu_available);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 90e0e7f38bb5..1934fcb2c2d3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -51,7 +51,6 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
 DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
 
 static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
-unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
 DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 static bool vgic_present;
@@ -1865,13 +1864,13 @@ static void teardown_hyp_mode(void)
 	free_hyp_pgds();
 	for_each_possible_cpu(cpu) {
 		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
-		free_pages(kvm_arm_hyp_percpu_base[cpu], nvhe_percpu_order());
+		free_pages(kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu], nvhe_percpu_order());
 	}
 }
 
 static int do_pkvm_init(u32 hyp_va_bits)
 {
-	void *per_cpu_base = kvm_ksym_ref(kvm_arm_hyp_percpu_base);
+	void *per_cpu_base = kvm_ksym_ref(kvm_nvhe_sym(kvm_arm_hyp_percpu_base));
 	int ret;
 
 	preempt_disable();
@@ -1975,7 +1974,7 @@ static int init_hyp_mode(void)
 
 		page_addr = page_address(page);
 		memcpy(page_addr, CHOOSE_NVHE_SYM(__per_cpu_start), nvhe_percpu_size());
-		kvm_arm_hyp_percpu_base[cpu] = (unsigned long)page_addr;
+		kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu] = (unsigned long)page_addr;
 	}
 
 	/*
@@ -2068,7 +2067,7 @@ static int init_hyp_mode(void)
 	}
 
 	for_each_possible_cpu(cpu) {
-		char *percpu_begin = (char *)kvm_arm_hyp_percpu_base[cpu];
+		char *percpu_begin = (char *)kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu];
 		char *percpu_end = percpu_begin + nvhe_percpu_size();
 
 		/* Map Hyp percpu pages */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
index 9f54833af400..04d194583f1e 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
@@ -23,6 +23,8 @@ u64 cpu_logical_map(unsigned int cpu)
 	return hyp_cpu_logical_map[cpu];
 }
 
+unsigned long __ro_after_init kvm_arm_hyp_percpu_base[NR_CPUS];
+
 unsigned long __hyp_per_cpu_offset(unsigned int cpu)
 {
 	unsigned long *cpu_base_array;
-- 
2.37.0.rc0.161.g10f37bed90-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 21/24] KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

Sharing 'kvm_arm_vmid_bits' between EL1 and EL2 allows the host to
modify the variable arbitrarily, potentially leading to all sorts of
shenanigans as this value is used to configure the VTTBR register for
the guest stage-2 translation.

In preparation for unmapping host sections entirely from EL2, maintain
a copy of 'kvm_arm_vmid_bits' and initialise it from the host value
while it is still trusted.
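
For illustration only (not part of the patch): the value is
security-sensitive because it sizes the VMID field that EL2 folds into
VTTBR_EL2 for the guest stage-2. A hedged sketch, where pkvm_vmid_field()
is a made-up helper name and VTTBR_VMID_SHIFT is the existing arm64
definition:

	static u64 pkvm_vmid_field(u64 vmid)
	{
		/*
		 * A host-controlled kvm_arm_vmid_bits would let the host
		 * widen or narrow this mask and corrupt the VMID that EL2
		 * programs for the guest.
		 */
		u64 mask = GENMASK(kvm_arm_vmid_bits - 1, 0);

		return (vmid & mask) << VTTBR_VMID_SHIFT;
	}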

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_hyp.h | 2 ++
 arch/arm64/kernel/image-vars.h   | 3 ---
 arch/arm64/kvm/arm.c             | 1 +
 arch/arm64/kvm/hyp/nvhe/pkvm.c   | 3 +++
 4 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index fd99cf09972d..6797eafe7890 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -124,4 +124,6 @@ extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val);
 
 extern unsigned long kvm_nvhe_sym(__icache_flags);
+extern unsigned int kvm_nvhe_sym(kvm_arm_vmid_bits);
+
 #endif /* __ARM64_KVM_HYP_H__ */
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 37a2d833851a..3e2489d23ff0 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -80,9 +80,6 @@ KVM_NVHE_ALIAS(nvhe_hyp_panic_handler);
 /* Vectors installed by hyp-init on reset HVC. */
 KVM_NVHE_ALIAS(__hyp_stub_vectors);
 
-/* VMID bits set by the KVM VMID allocator */
-KVM_NVHE_ALIAS(kvm_arm_vmid_bits);
-
 /* Kernel symbols needed for cpus_have_final/const_caps checks. */
 KVM_NVHE_ALIAS(arm64_const_caps_ready);
 KVM_NVHE_ALIAS(cpu_hwcap_keys);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 1934fcb2c2d3..fe249b584115 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1901,6 +1901,7 @@ static void kvm_hyp_init_symbols(void)
 	kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
 	kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);
 	kvm_nvhe_sym(__icache_flags) = __icache_flags;
+	kvm_nvhe_sym(kvm_arm_vmid_bits) = kvm_arm_vmid_bits;
 }
 
 static int kvm_hyp_init_protection(u32 hyp_va_bits)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index a4a518b2a43b..571334fd58ff 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -15,6 +15,9 @@
 /* Used by icache_is_vpipt(). */
 unsigned long __icache_flags;
 
+/* Used by kvm_get_vttbr(). */
+unsigned int kvm_arm_vmid_bits;
+
 /*
  * Set trap register values based on features in ID_AA64PFR0.
  */
-- 
2.37.0.rc0.161.g10f37bed90-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 22/24] KVM: arm64: Explicitly map kvm_vgic_global_state at EL2
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

The pkvm hypervisor may need to read the kvm_vgic_global_state variable
at EL2. Make sure to explicitly map it in its stage-1 page-table rather
than relying on mapping all of the host .rodata section.
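
For illustration only (not part of the patch): pkvm_create_mappings()
takes a [start, end) virtual-address range, so a single object is mapped
with the "&obj, &obj + 1" idiom and the range is assumed to be rounded
out to page granularity internally. A hedged sketch of a generic wrapper
(pkvm_map_object() is a made-up name):

	static int pkvm_map_object(void *obj, size_t size,
				   enum kvm_pgtable_prot prot)
	{
		char *start = obj;

		return pkvm_create_mappings(start, start + size, prot);
	}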

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/setup.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 3f689ffb2693..fa06828899e1 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -161,6 +161,11 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 	if (ret)
 		return ret;
 
+	ret = pkvm_create_mappings(&kvm_vgic_global_state,
+				   &kvm_vgic_global_state + 1, prot);
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
-- 
2.37.0.rc0.161.g10f37bed90-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [PATCH v2 23/24] KVM: arm64: Don't map host sections in pkvm
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

From: Quentin Perret <qperret@google.com>

We no longer need to map the host's .rodata and .bss sections in the
pkvm hypervisor, so let's remove those mappings. This will avoid
creating dependencies at EL2 on host-controlled data-structures.
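
For illustration only (not part of the patch): once these aliases are
removed, hyp code that still references a host-only section symbol is
expected to fail at link time rather than silently depend on host
memory, since nVHE symbols are prefixed and only aliased host symbols
resolve. A hedged example of the failure mode:

	/* hypothetical file in arch/arm64/kvm/hyp/nvhe/ */
	extern char __bss_start[];

	static void *peek_host_bss(void)
	{
		/* would now fail: undefined __kvm_nvhe___bss_start */
		return __bss_start;
	}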

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/image-vars.h  |  6 ------
 arch/arm64/kvm/hyp/nvhe/setup.c | 14 +++-----------
 2 files changed, 3 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 3e2489d23ff0..2d4d6836ff47 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -115,12 +115,6 @@ KVM_NVHE_ALIAS_HYP(__memcpy, __pi_memcpy);
 KVM_NVHE_ALIAS_HYP(__memset, __pi_memset);
 #endif
 
-/* Kernel memory sections */
-KVM_NVHE_ALIAS(__start_rodata);
-KVM_NVHE_ALIAS(__end_rodata);
-KVM_NVHE_ALIAS(__bss_start);
-KVM_NVHE_ALIAS(__bss_stop);
-
 /* Hyp memory sections */
 KVM_NVHE_ALIAS(__hyp_idmap_text_start);
 KVM_NVHE_ALIAS(__hyp_idmap_text_end);
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index fa06828899e1..1a2e06760c41 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -144,23 +144,15 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 	}
 
 	/*
-	 * Map the host's .bss and .rodata sections RO in the hypervisor, but
-	 * transfer the ownership from the host to the hypervisor itself to
-	 * make sure it can't be donated or shared with another entity.
+	 * Map the host sections RO in the hypervisor, but transfer the
+	 * ownership from the host to the hypervisor itself to make sure they
+	 * can't be donated or shared with another entity.
 	 *
 	 * The ownership transition requires matching changes in the host
 	 * stage-2. This will be done later (see finalize_host_mappings()) once
 	 * the hyp_vmemmap is addressable.
 	 */
 	prot = pkvm_mkstate(PAGE_HYP_RO, PKVM_PAGE_SHARED_OWNED);
-	ret = pkvm_create_mappings(__start_rodata, __end_rodata, prot);
-	if (ret)
-		return ret;
-
-	ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, prot);
-	if (ret)
-		return ret;
-
 	ret = pkvm_create_mappings(&kvm_vgic_global_state,
 				   &kvm_vgic_global_state + 1, prot);
 	if (ret)
-- 
2.37.0.rc0.161.g10f37bed90-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* [RFC PATCH v2 24/24] KVM: arm64: Use the shadow vCPU structure in handle___kvm_vcpu_run()
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-06-30 13:57   ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-06-30 13:57 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon

As a stepping stone towards deprivileging the host's access to the
guest's vCPU structures, introduce some naive flush/sync routines to
copy most of the host vCPU into the shadow vCPU on vCPU run and back
again on return to EL1.

This allows us to run using the shadow structure when KVM is initialised
in protected mode.
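
For illustration only (not part of the patch): every EL2 access to the
shadow state is bracketed by the new load/put helpers so the shadow VM's
backing pages cannot be torn down while in use; the handle and vcpu_idx
come from host-controlled structures and are validated inside the load
helper. A hedged usage fragment (declarations omitted):

	shadow_state = pkvm_load_shadow_vcpu_state(handle, vcpu_idx);
	if (!shadow_state)
		return -EINVAL;	/* host passed a bogus handle or index */

	flush_shadow_state(shadow_state);	/* host vCPU -> shadow vCPU */
	ret = __kvm_vcpu_run(&shadow_state->shadow_vcpu);
	sync_shadow_state(shadow_state);	/* shadow vCPU -> host vCPU */

	pkvm_put_shadow_vcpu_state(shadow_state);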

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  4 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c     | 84 +++++++++++++++++++++++++-
 arch/arm64/kvm/hyp/nvhe/pkvm.c         | 28 +++++++++
 3 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index c0e32a750b6e..0edb3faa4067 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -63,4 +63,8 @@ int __pkvm_init_shadow(struct kvm *kvm, unsigned long shadow_hva,
 		       size_t shadow_size, unsigned long pgd_hva);
 int __pkvm_teardown_shadow(unsigned int shadow_handle);
 
+struct kvm_shadow_vcpu_state *
+pkvm_load_shadow_vcpu_state(unsigned int shadow_handle, unsigned int vcpu_idx);
+void pkvm_put_shadow_vcpu_state(struct kvm_shadow_vcpu_state *shadow_state);
+
 #endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index a1fbd11c8041..39d66c7b0560 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -22,11 +22,91 @@ DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
 
+static void flush_shadow_state(struct kvm_shadow_vcpu_state *shadow_state)
+{
+	struct kvm_vcpu *shadow_vcpu = &shadow_state->shadow_vcpu;
+	struct kvm_vcpu *host_vcpu = shadow_state->host_vcpu;
+
+	shadow_vcpu->arch.ctxt		= host_vcpu->arch.ctxt;
+
+	shadow_vcpu->arch.sve_state	= kern_hyp_va(host_vcpu->arch.sve_state);
+	shadow_vcpu->arch.sve_max_vl	= host_vcpu->arch.sve_max_vl;
+
+	shadow_vcpu->arch.hw_mmu	= host_vcpu->arch.hw_mmu;
+
+	shadow_vcpu->arch.hcr_el2	= host_vcpu->arch.hcr_el2;
+	shadow_vcpu->arch.mdcr_el2	= host_vcpu->arch.mdcr_el2;
+	shadow_vcpu->arch.cptr_el2	= host_vcpu->arch.cptr_el2;
+
+	shadow_vcpu->arch.iflags	= host_vcpu->arch.iflags;
+	shadow_vcpu->arch.fp_state	= host_vcpu->arch.fp_state;
+
+	shadow_vcpu->arch.debug_ptr	= kern_hyp_va(host_vcpu->arch.debug_ptr);
+	shadow_vcpu->arch.host_fpsimd_state = host_vcpu->arch.host_fpsimd_state;
+
+	shadow_vcpu->arch.vsesr_el2	= host_vcpu->arch.vsesr_el2;
+
+	shadow_vcpu->arch.vgic_cpu.vgic_v3 = host_vcpu->arch.vgic_cpu.vgic_v3;
+}
+
+static void sync_shadow_state(struct kvm_shadow_vcpu_state *shadow_state)
+{
+	struct kvm_vcpu *shadow_vcpu = &shadow_state->shadow_vcpu;
+	struct kvm_vcpu *host_vcpu = shadow_state->host_vcpu;
+	struct vgic_v3_cpu_if *shadow_cpu_if = &shadow_vcpu->arch.vgic_cpu.vgic_v3;
+	struct vgic_v3_cpu_if *host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
+	unsigned int i;
+
+	host_vcpu->arch.ctxt		= shadow_vcpu->arch.ctxt;
+
+	host_vcpu->arch.hcr_el2		= shadow_vcpu->arch.hcr_el2;
+	host_vcpu->arch.cptr_el2	= shadow_vcpu->arch.cptr_el2;
+
+	host_vcpu->arch.fault		= shadow_vcpu->arch.fault;
+
+	host_vcpu->arch.iflags		= shadow_vcpu->arch.iflags;
+	host_vcpu->arch.fp_state	= shadow_vcpu->arch.fp_state;
+
+	host_cpu_if->vgic_hcr		= shadow_cpu_if->vgic_hcr;
+	for (i = 0; i < shadow_cpu_if->used_lrs; ++i)
+		host_cpu_if->vgic_lr[i] = shadow_cpu_if->vgic_lr[i];
+}
+
 static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 {
-	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
+	DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
+	int ret;
+
+	host_vcpu = kern_hyp_va(host_vcpu);
+
+	if (unlikely(is_protected_kvm_enabled())) {
+		struct kvm_shadow_vcpu_state *shadow_state;
+		struct kvm_vcpu *shadow_vcpu;
+		struct kvm *host_kvm;
+		unsigned int handle;
+
+		host_kvm = kern_hyp_va(host_vcpu->kvm);
+		handle = host_kvm->arch.pkvm.shadow_handle;
+		shadow_state = pkvm_load_shadow_vcpu_state(handle,
+							   host_vcpu->vcpu_idx);
+		if (!shadow_state) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		shadow_vcpu = &shadow_state->shadow_vcpu;
+		flush_shadow_state(shadow_state);
+
+		ret = __kvm_vcpu_run(shadow_vcpu);
+
+		sync_shadow_state(shadow_state);
+		pkvm_put_shadow_vcpu_state(shadow_state);
+	} else {
+		ret = __kvm_vcpu_run(host_vcpu);
+	}
 
-	cpu_reg(host_ctxt, 1) =  __kvm_vcpu_run(kern_hyp_va(vcpu));
+out:
+	cpu_reg(host_ctxt, 1) =  ret;
 }
 
 static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 571334fd58ff..bf92f4443c92 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -247,6 +247,33 @@ static struct kvm_shadow_vm *find_shadow_by_handle(unsigned int shadow_handle)
 	return shadow_table[shadow_idx];
 }
 
+struct kvm_shadow_vcpu_state *
+pkvm_load_shadow_vcpu_state(unsigned int shadow_handle, unsigned int vcpu_idx)
+{
+	struct kvm_shadow_vcpu_state *shadow_state = NULL;
+	struct kvm_shadow_vm *vm;
+
+	hyp_spin_lock(&shadow_lock);
+	vm = find_shadow_by_handle(shadow_handle);
+	if (!vm || vm->kvm.created_vcpus <= vcpu_idx)
+		goto unlock;
+
+	shadow_state = &vm->shadow_vcpu_states[vcpu_idx];
+	hyp_page_ref_inc(hyp_virt_to_page(vm));
+unlock:
+	hyp_spin_unlock(&shadow_lock);
+	return shadow_state;
+}
+
+void pkvm_put_shadow_vcpu_state(struct kvm_shadow_vcpu_state *shadow_state)
+{
+	struct kvm_shadow_vm *vm = shadow_state->shadow_vm;
+
+	hyp_spin_lock(&shadow_lock);
+	hyp_page_ref_dec(hyp_virt_to_page(vm));
+	hyp_spin_unlock(&shadow_lock);
+}
+
 static void unpin_host_vcpus(struct kvm_shadow_vcpu_state *shadow_vcpu_states,
 			     unsigned int nr_vcpus)
 {
@@ -304,6 +331,7 @@ static int init_shadow_structs(struct kvm *kvm, struct kvm_shadow_vm *vm,
 		shadow_vcpu->vcpu_idx = i;
 
 		shadow_vcpu->arch.hw_mmu = &vm->kvm.arch.mmu;
+		shadow_vcpu->arch.cflags = READ_ONCE(host_vcpu->arch.cflags);
 	}
 
 	return 0;
-- 
2.37.0.rc0.161.g10f37bed90-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 00/24] KVM: arm64: Introduce pKVM shadow state at EL2
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-07-06 19:17   ` Sean Christopherson
  -1 siblings, 0 replies; 135+ messages in thread
From: Sean Christopherson @ 2022-07-06 19:17 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Ard Biesheuvel, Alexandru Elisei, Andy Lutomirski,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Michael Roth, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

On Thu, Jun 30, 2022, Will Deacon wrote:
> Hi everyone,
> 
> This series has been extracted from the pKVM base support series (aka
> "pKVM mega-patch") previously posted here:
> 
>   https://lore.kernel.org/kvmarm/20220519134204.5379-1-will@kernel.org/
> 
> Unlike that more comprehensive series, this one is fairly fundamental
> and does not introduce any new ABI commitments, leaving questions
> involving the management of guest private memory and the creation of
> protected VMs for future work. Instead, this series extends the pKVM EL2
> code so that it can dynamically instantiate and manage VM shadow
> structures without the host being able to access them directly. These
> shadow structures consist of a shadow VM, a set of shadow vCPUs and the
> stage-2 page-table and the pages used to hold them are returned to the
> host when the VM is destroyed.
> 
> The last patch is marked as RFC because, although it plumbs in the
> shadow state, it is woefully inefficient and copies to/from the host
> state on every vCPU run. Without the last patch, the new structures are
> unused but we move considerably closer to isolating guests from the
> host.

...

>  arch/arm64/include/asm/kvm_asm.h              |   6 +-
>  arch/arm64/include/asm/kvm_host.h             |  65 +++
>  arch/arm64/include/asm/kvm_hyp.h              |   3 +
>  arch/arm64/include/asm/kvm_pgtable.h          |   8 +
>  arch/arm64/include/asm/kvm_pkvm.h             |  38 ++
>  arch/arm64/kernel/image-vars.h                |  15 -
>  arch/arm64/kvm/arm.c                          |  40 +-
>  arch/arm64/kvm/hyp/hyp-constants.c            |   3 +
>  arch/arm64/kvm/hyp/include/nvhe/gfp.h         |   6 +-
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  19 +-
>  arch/arm64/kvm/hyp/include/nvhe/memory.h      |  26 +-
>  arch/arm64/kvm/hyp/include/nvhe/mm.h          |  18 +-
>  arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  70 +++
>  arch/arm64/kvm/hyp/include/nvhe/spinlock.h    |  10 +-
>  arch/arm64/kvm/hyp/nvhe/cache.S               |  11 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 105 +++-
>  arch/arm64/kvm/hyp/nvhe/hyp-smp.c             |   2 +
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 456 +++++++++++++++++-
>  arch/arm64/kvm/hyp/nvhe/mm.c                  | 136 +++++-
>  arch/arm64/kvm/hyp/nvhe/page_alloc.c          |  42 +-
>  arch/arm64/kvm/hyp/nvhe/pkvm.c                | 438 +++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/setup.c               |  96 ++--
>  arch/arm64/kvm/hyp/pgtable.c                  |   9 +
>  arch/arm64/kvm/mmu.c                          |  26 +
>  arch/arm64/kvm/pkvm.c                         | 121 ++++-
>  25 files changed, 1625 insertions(+), 144 deletions(-)
>  create mode 100644 arch/arm64/kvm/hyp/include/nvhe/pkvm.h

The lack of documentation and the rather terse changelogs make this really hard
to review for folks that aren't intimately familiar with pKVM.  I have a decent
idea of the end goal of "shadowing", but that's mostly because of my involvement in
similar x86 projects.  Nothing in the changelogs ever explains _why_ pKVM uses
shadows.

I put "shadowing" in quotes because if the unstrusted host is aware that the VM
and vCPU it is manipulating aren't the "real" VMs/vCPUs, and there is an explicit API
between the untrusted host and pKVM for creating/destroying VMs/vCPUs, then I would
argue that it's not truly shadowing, especially if pKVM uses data/values verbatim
and only verifies correctness/safety.  It's definitely a nit, but for future readers
I think overloading "shadowing" could be confusing.

And beyond the basics, IMO pKVM needs a more formal definition of exactly what
guest state is protected/hidden from the untrusted host.  Peeking at the mega series,
there are a huge pile of patches that result in "gradual reduction of EL2 trust in
host data", but I couldn't any documentation that defines what that end result is.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 00/24] KVM: arm64: Introduce pKVM shadow state at EL2
  2022-07-06 19:17   ` Sean Christopherson
  (?)
@ 2022-07-08 16:23     ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-07-08 16:23 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Zyngier, kernel-team, kvm, Catalin Marinas, Oliver Upton,
	Andy Lutomirski, linux-arm-kernel, Michael Roth, Chao Peng,
	kvmarm

Hi Sean,

Thanks for having a look.

On Wed, Jul 06, 2022 at 07:17:29PM +0000, Sean Christopherson wrote:
> On Thu, Jun 30, 2022, Will Deacon wrote:
> > This series has been extracted from the pKVM base support series (aka
> > "pKVM mega-patch") previously posted here:
> > 
> >   https://lore.kernel.org/kvmarm/20220519134204.5379-1-will@kernel.org/
> > 
> > Unlike that more comprehensive series, this one is fairly fundamental
> > and does not introduce any new ABI commitments, leaving questions
> > involving the management of guest private memory and the creation of
> > protected VMs for future work. Instead, this series extends the pKVM EL2
> > code so that it can dynamically instantiate and manage VM shadow
> > structures without the host being able to access them directly. These
> > shadow structures consist of a shadow VM, a set of shadow vCPUs and the
> > stage-2 page-table and the pages used to hold them are returned to the
> > host when the VM is destroyed.
> > 
> > The last patch is marked as RFC because, although it plumbs in the
> > shadow state, it is woefully inefficient and copies to/from the host
> > state on every vCPU run. Without the last patch, the new structures are
> > unused but we move considerably closer to isolating guests from the
> > host.
> 
> ...
> 
> >  arch/arm64/include/asm/kvm_asm.h              |   6 +-
> >  arch/arm64/include/asm/kvm_host.h             |  65 +++
> >  arch/arm64/include/asm/kvm_hyp.h              |   3 +
> >  arch/arm64/include/asm/kvm_pgtable.h          |   8 +
> >  arch/arm64/include/asm/kvm_pkvm.h             |  38 ++
> >  arch/arm64/kernel/image-vars.h                |  15 -
> >  arch/arm64/kvm/arm.c                          |  40 +-
> >  arch/arm64/kvm/hyp/hyp-constants.c            |   3 +
> >  arch/arm64/kvm/hyp/include/nvhe/gfp.h         |   6 +-
> >  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  19 +-
> >  arch/arm64/kvm/hyp/include/nvhe/memory.h      |  26 +-
> >  arch/arm64/kvm/hyp/include/nvhe/mm.h          |  18 +-
> >  arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  70 +++
> >  arch/arm64/kvm/hyp/include/nvhe/spinlock.h    |  10 +-
> >  arch/arm64/kvm/hyp/nvhe/cache.S               |  11 +
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 105 +++-
> >  arch/arm64/kvm/hyp/nvhe/hyp-smp.c             |   2 +
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 456 +++++++++++++++++-
> >  arch/arm64/kvm/hyp/nvhe/mm.c                  | 136 +++++-
> >  arch/arm64/kvm/hyp/nvhe/page_alloc.c          |  42 +-
> >  arch/arm64/kvm/hyp/nvhe/pkvm.c                | 438 +++++++++++++++++
> >  arch/arm64/kvm/hyp/nvhe/setup.c               |  96 ++--
> >  arch/arm64/kvm/hyp/pgtable.c                  |   9 +
> >  arch/arm64/kvm/mmu.c                          |  26 +
> >  arch/arm64/kvm/pkvm.c                         | 121 ++++-
> >  25 files changed, 1625 insertions(+), 144 deletions(-)
> >  create mode 100644 arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> 
> The lack of documentation and the rather terse changelogs make this really hard
> to review for folks that aren't intimately familiar with pKVM.  I have a decent
> idea of the end goal of "shadowing", but that's mostly because of my involvement in
> similar x86 projects.  Nothing in the changelogs ever explains _why_ pKVM uses
> shadows.

That's understandable, but thanks for persevering; this series is pretty far
down in the murky depths of the arm64 architecture and EL2 code, so it
doesn't really map to the KVM code most folks are familiar with. It's fair
to say we're assuming a lot of niche prior knowledge (which is quite common
for arch code in my experience), but I wanted to inherit the broader cc list
so you were aware of this break-away series. Sadly, I don't think beefing up
the commit messages would get us to a point where somebody not already
familiar with the EL2 code could give a constructive review, but we can try
to expand them a bit if you genuinely think it would help.

On the more positive side, we'll be speaking at KVM forum about what we've
done here, so that will be a great place to discuss it further and then we
can also link back to the recordings in later postings of the mega-series.

> I put "shadowing" in quotes because if the unstrusted host is aware that the VM
> and vCPU it is manipulating aren't the "real" VMs/vCPUs, and there is an explicit API
> between the untrusted host and pKVM for creating/destroying VMs/vCPUs, then I would
> argue that it's not truly shadowing, especially if pKVM uses data/values verbatim
> and only verifies correctness/safety.  It's definitely a nit, but for future readers
> I think overloading "shadowing" could be confusing.

Ah, this is really interesting and nicely puts the ball back in my court as
I'm not well versed with x86's use of "shadowing". We should probably use
another name (ideas?), but our "shadow" is very much explicit -- rather than
the host using its 'struct kvm's and 'struct kvm_vcpu's directly, it instead
passes those data structures to the hypervisor which allocates its own
structures (in some cases reusing the host definitions directly, e.g.
'struct kvm_vcpu') and returns a handle to the host as a reference. The
host can then issue hypercalls with this handle to load/put vCPUs of that
VM, run them once they are loaded and synchronise aspects of the vCPU state
between the host and the hypervisor copies for things like emulation traps
or interrupt injection. The main thing is that the pages containing the
hypervisor structures are not accessible by the host until the corresponding
VM is destroyed.

The advantage of doing it this way is that we don't need to change very
much of the host KVM code at all, and we can even reuse some of it directly
in the hypervisor (e.g. inline functions and macros).
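
To make that flow a little more concrete, here is a rough host-side sketch.
It is illustrative only: the __pkvm_init_shadow/__pkvm_teardown_shadow
hypercalls and the shadow_handle field come from this series, but the helper
names are made up, the usual kvm_call_hyp_nvhe() plumbing is assumed, and
the allocation/donation of the shadow and pgd pages is elided:

	/* Hypothetical host-side helpers, not part of the series as posted. */
	static int create_el2_shadow(struct kvm *kvm, void *shadow_va,
				     size_t shadow_size, void *pgd)
	{
		int handle;

		/* Ask EL2 to instantiate its own copy of the VM state. */
		handle = kvm_call_hyp_nvhe(__pkvm_init_shadow, kvm,
					   (unsigned long)shadow_va,
					   shadow_size, (unsigned long)pgd);
		if (handle < 0)
			return handle;

		/* All later hypercalls refer to the VM by this opaque handle. */
		kvm->arch.pkvm.shadow_handle = handle;
		return 0;
	}

	static void destroy_el2_shadow(struct kvm *kvm)
	{
		/* EL2 tears down its copies and returns the pages to the host. */
		WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_shadow,
					  kvm->arch.pkvm.shadow_handle));
	}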

Perhaps we should s/shadow/hyp/ to make this a little clearer?

> And beyond the basics, IMO pKVM needs a more formal definition of exactly what
> guest state is protected/hidden from the untrusted host.  Peeking at the mega series,
> there are a huge pile of patches that result in "gradual reduction of EL2 trust in
> host data", but I couldn't any documentation that defines what that end result is.

That's fair; I'll work to extend the documentation in the next iteration of
the mega series to cover this in more detail. Roughly speaking, the end
result is that the vCPU register and memory state is inaccessible to the
host except in cases where the guest has done something to expose it, such as
MMIO or a memory-sharing hypercall. Given the complexity of the register
state (GPRs, floating point, SIMD, debug, etc.), the mega-series elevates
portions of the state from the host to the hypervisor as separate patches
to structure things a bit better (that's where the gradual reduction comes
in).

Does that help at all?

Cheers,

Will

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 03/24] KVM: arm64: Add flags to struct hyp_page
  2022-06-30 13:57   ` Will Deacon
  (?)
@ 2022-07-18 10:54     ` Vincent Donnefort
  -1 siblings, 0 replies; 135+ messages in thread
From: Vincent Donnefort @ 2022-07-18 10:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Ard Biesheuvel, Sean Christopherson, Alexandru Elisei,
	Andy Lutomirski, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Michael Roth, Mark Rutland,
	Fuad Tabba, Oliver Upton, Marc Zyngier, kernel-team, kvm,
	linux-arm-kernel

On Thu, Jun 30, 2022 at 02:57:26PM +0100, Will Deacon wrote:
> From: Quentin Perret <qperret@google.com>
> 
> Add a 'flags' field to struct hyp_page, and reduce the size of the order
> field to u8 to avoid growing the struct size.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kvm/hyp/include/nvhe/gfp.h    |  6 +++---
>  arch/arm64/kvm/hyp/include/nvhe/memory.h |  3 ++-
>  arch/arm64/kvm/hyp/nvhe/page_alloc.c     | 14 +++++++-------
>  3 files changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> index 0a048dc06a7d..9330b13075f8 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> @@ -7,7 +7,7 @@
>  #include <nvhe/memory.h>
>  #include <nvhe/spinlock.h>
>  
> -#define HYP_NO_ORDER	USHRT_MAX
> +#define HYP_NO_ORDER	0xff

The BUG_ON() in hyp_page_ref_inc() might now need to test for 0xff/HYP_NO_ORDER
instead of USHRT_MAX.
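
For context, a minimal sketch of why the sentinel shrinks along with the
field. The layout below is assumed from the commit message rather than
copied from the patch, and uses the kernel's u8/u16 types:

	/* Assumed layout; the real definition lives in nvhe/memory.h. */
	struct hyp_page {
		u16 refcount;
		u8 order;		/* was a 16-bit field before this patch */
		u8 flags;		/* new field, same overall struct size */
	};

	#define HYP_NO_ORDER	0xff	/* largest value a u8 order can hold */

	/*
	 * Any leftover comparison of 'order' against USHRT_MAX (65535) can
	 * never be true once the field is a u8, so such checks would need to
	 * use HYP_NO_ORDER instead.
	 */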

[...]

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 03/24] KVM: arm64: Add flags to struct hyp_page
  2022-07-18 10:54     ` Vincent Donnefort
  (?)
@ 2022-07-18 10:57       ` Vincent Donnefort
  -1 siblings, 0 replies; 135+ messages in thread
From: Vincent Donnefort @ 2022-07-18 10:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Ard Biesheuvel, Sean Christopherson, Alexandru Elisei,
	Andy Lutomirski, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Michael Roth, Mark Rutland,
	Fuad Tabba, Oliver Upton, Marc Zyngier, kernel-team, kvm,
	linux-arm-kernel

On Mon, Jul 18, 2022 at 11:54:24AM +0100, Vincent Donnefort wrote:
> On Thu, Jun 30, 2022 at 02:57:26PM +0100, Will Deacon wrote:
> > From: Quentin Perret <qperret@google.com>
> > 
> > Add a 'flags' field to struct hyp_page, and reduce the size of the order
> > field to u8 to avoid growing the struct size.
> > 
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kvm/hyp/include/nvhe/gfp.h    |  6 +++---
> >  arch/arm64/kvm/hyp/include/nvhe/memory.h |  3 ++-
> >  arch/arm64/kvm/hyp/nvhe/page_alloc.c     | 14 +++++++-------
> >  3 files changed, 12 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> > index 0a048dc06a7d..9330b13075f8 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
> > @@ -7,7 +7,7 @@
> >  #include <nvhe/memory.h>
> >  #include <nvhe/spinlock.h>
> >  
> > -#define HYP_NO_ORDER	USHRT_MAX
> > +#define HYP_NO_ORDER	0xff
> 
> BUG_ON in hyp_page_ref_inc() might now need to test for 0xff/HYP_NO_ORDER
> instead of USHRT_MAX.

My bad, read too quickly, refcount/order...

> 
> [...]

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 12/24] KVM: arm64: Introduce shadow VM state at EL2
  2022-06-30 13:57   ` Will Deacon
  (?)
@ 2022-07-18 18:40     ` Vincent Donnefort
  -1 siblings, 0 replies; 135+ messages in thread
From: Vincent Donnefort @ 2022-07-18 18:40 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Ard Biesheuvel, Sean Christopherson, Alexandru Elisei,
	Andy Lutomirski, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Michael Roth, Mark Rutland,
	Fuad Tabba, Oliver Upton, Marc Zyngier, kernel-team, kvm,
	linux-arm-kernel

[...]

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 9f339dffbc1a..2d6b5058f7d3 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -288,6 +288,14 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
>   */
>  u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift);
>  
> +/*

/** ?

> + * kvm_pgtable_stage2_pgd_size() - Helper to compute size of a stage-2 PGD
> + * @vtcr:	Content of the VTCR register.
> + *
> + * Return: the size (in bytes) of the stage-2 PGD
> + */
> +size_t kvm_pgtable_stage2_pgd_size(u64 vtcr);
> +
>  /**
>   * __kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
>   * @pgt:	Uninitialised page-table structure to initialise.
> diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
> index 8f7b8a2314bb..11526e89fe5c 100644
> --- a/arch/arm64/include/asm/kvm_pkvm.h
> +++ b/arch/arm64/include/asm/kvm_pkvm.h
> @@ -9,6 +9,9 @@
>  #include <linux/memblock.h>
>  #include <asm/kvm_pgtable.h>
>  
> +/* Maximum number of protected VMs that can be created. */
> +#define KVM_MAX_PVMS 255
> +
>  #define HYP_MEMBLOCK_REGIONS 128
>  
>  extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
> @@ -40,6 +43,11 @@ static inline unsigned long hyp_vmemmap_pages(size_t vmemmap_entry_size)
>  	return res >> PAGE_SHIFT;
>  }
>  
> +static inline unsigned long hyp_shadow_table_pages(void)
> +{
> +	return PAGE_ALIGN(KVM_MAX_PVMS * sizeof(void *)) >> PAGE_SHIFT;
> +}
> +
>  static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
>  {
>  	unsigned long total = 0, i;
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 3bea816296dc..3a0817b5c739 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -11,6 +11,7 @@
>  #include <asm/kvm_mmu.h>
>  #include <asm/kvm_pgtable.h>
>  #include <asm/virt.h>
> +#include <nvhe/pkvm.h>
>  #include <nvhe/spinlock.h>
>  
>  /*
> @@ -68,10 +69,12 @@ bool addr_is_memory(phys_addr_t phys);
>  int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
>  int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
>  int kvm_host_prepare_stage2(void *pgt_pool_base);
> +int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd);
>  void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
>  
>  int hyp_pin_shared_mem(void *from, void *to);
>  void hyp_unpin_shared_mem(void *from, void *to);
> +void reclaim_guest_pages(struct kvm_shadow_vm *vm);
>  
>  static __always_inline void __load_host_stage2(void)
>  {
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> new file mode 100644
> index 000000000000..1d0a33f70879
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2021 Google LLC
> + * Author: Fuad Tabba <tabba@google.com>
> + */
> +
> +#ifndef __ARM64_KVM_NVHE_PKVM_H__
> +#define __ARM64_KVM_NVHE_PKVM_H__
> +
> +#include <asm/kvm_pkvm.h>
> +
> +/*
> + * Holds the relevant data for maintaining the vcpu state completely at hyp.
> + */
> +struct kvm_shadow_vcpu_state {
> +	/* The data for the shadow vcpu. */
> +	struct kvm_vcpu shadow_vcpu;
> +
> +	/* A pointer to the host's vcpu. */
> +	struct kvm_vcpu *host_vcpu;
> +
> +	/* A pointer to the shadow vm. */
> +	struct kvm_shadow_vm *shadow_vm;

IMHO, those declarations are already self-explanatory. The comments above don't
bring much.

> +};
> +
> +/*
> + * Holds the relevant data for running a protected vm.
> + */
> +struct kvm_shadow_vm {
> +	/* The data for the shadow kvm. */
> +	struct kvm kvm;
> +
> +	/* The host's kvm structure. */
> +	struct kvm *host_kvm;
> +
> +	/* The total size of the donated shadow area. */
> +	size_t shadow_area_size;
> +
> +	struct kvm_pgtable pgt;
> +
> +	/* Array of the shadow state per vcpu. */
> +	struct kvm_shadow_vcpu_state shadow_vcpu_states[0];
> +};
> +
> +static inline struct kvm_shadow_vcpu_state *get_shadow_state(struct kvm_vcpu *shadow_vcpu)
> +{
> +	return container_of(shadow_vcpu, struct kvm_shadow_vcpu_state, shadow_vcpu);
> +}
> +
> +static inline struct kvm_shadow_vm *get_shadow_vm(struct kvm_vcpu *shadow_vcpu)
> +{
> +	return get_shadow_state(shadow_vcpu)->shadow_vm;
> +}
> +
> +void hyp_shadow_table_init(void *tbl);
> +int __pkvm_init_shadow(struct kvm *kvm, unsigned long shadow_hva,
> +		       size_t shadow_size, unsigned long pgd_hva);
> +int __pkvm_teardown_shadow(unsigned int shadow_handle);
> +
> +#endif /* __ARM64_KVM_NVHE_PKVM_H__ */
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 3cea4b6ac23e..a1fbd11c8041 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -15,6 +15,7 @@
>  
>  #include <nvhe/mem_protect.h>
>  #include <nvhe/mm.h>
> +#include <nvhe/pkvm.h>
>  #include <nvhe/trap_handler.h>
>  
>  DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
> @@ -191,6 +192,24 @@ static void handle___pkvm_vcpu_init_traps(struct kvm_cpu_context *host_ctxt)
>  	__pkvm_vcpu_init_traps(kern_hyp_va(vcpu));
>  }
>  
> +static void handle___pkvm_init_shadow(struct kvm_cpu_context *host_ctxt)
> +{
> +	DECLARE_REG(struct kvm *, host_kvm, host_ctxt, 1);
> +	DECLARE_REG(unsigned long, host_shadow_va, host_ctxt, 2);
> +	DECLARE_REG(size_t, shadow_size, host_ctxt, 3);
> +	DECLARE_REG(unsigned long, pgd, host_ctxt, 4);
> +
> +	cpu_reg(host_ctxt, 1) = __pkvm_init_shadow(host_kvm, host_shadow_va,
> +						   shadow_size, pgd);
> +}
> +
> +static void handle___pkvm_teardown_shadow(struct kvm_cpu_context *host_ctxt)
> +{
> +	DECLARE_REG(unsigned int, shadow_handle, host_ctxt, 1);
> +
> +	cpu_reg(host_ctxt, 1) = __pkvm_teardown_shadow(shadow_handle);
> +}
> +
>  typedef void (*hcall_t)(struct kvm_cpu_context *);
>  
>  #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
> @@ -220,6 +239,8 @@ static const hcall_t host_hcall[] = {
>  	HANDLE_FUNC(__vgic_v3_save_aprs),
>  	HANDLE_FUNC(__vgic_v3_restore_aprs),
>  	HANDLE_FUNC(__pkvm_vcpu_init_traps),
> +	HANDLE_FUNC(__pkvm_init_shadow),
> +	HANDLE_FUNC(__pkvm_teardown_shadow),
>  };
>  
>  static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index e2e3b30b072e..9baf731736be 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -141,6 +141,20 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
>  	return 0;
>  }
>  
> +int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd)
> +{
> +	vm->pgt.pgd = pgd;
> +	return 0;
> +}
> +
> +void reclaim_guest_pages(struct kvm_shadow_vm *vm)
> +{
> +	unsigned long nr_pages;
> +
> +	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
> +	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(vm->pgt.pgd), nr_pages));
> +}
> +
>  int __pkvm_prot_finalize(void)
>  {
>  	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
> diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> index 99c8d8b73e70..77aeb787670b 100644
> --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> @@ -7,6 +7,9 @@
>  #include <linux/kvm_host.h>
>  #include <linux/mm.h>
>  #include <nvhe/fixed_config.h>
> +#include <nvhe/mem_protect.h>
> +#include <nvhe/memory.h>

I don't think this one is necessary, it is already included in mm.h.

> +#include <nvhe/pkvm.h>
>  #include <nvhe/trap_handler.h>
>  
>  /*
> @@ -183,3 +186,398 @@ void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
>  	pvm_init_traps_aa64mmfr0(vcpu);
>  	pvm_init_traps_aa64mmfr1(vcpu);
>  }
> +
> +/*
> + * Start the shadow table handle at the offset defined instead of at 0.
> + * Mainly for sanity checking and debugging.
> + */
> +#define HANDLE_OFFSET 0x1000
> +
> +static unsigned int shadow_handle_to_idx(unsigned int shadow_handle)
> +{
> +	return shadow_handle - HANDLE_OFFSET;
> +}
> +
> +static unsigned int idx_to_shadow_handle(unsigned int idx)
> +{
> +	return idx + HANDLE_OFFSET;
> +}
> +
> +/*
> + * Spinlock for protecting the shadow table related state.
> + * Protects writes to shadow_table, nr_shadow_entries, and next_shadow_alloc,
> + * as well as reads and writes to last_shadow_vcpu_lookup.
> + */
> +static DEFINE_HYP_SPINLOCK(shadow_lock);
> +
> +/*
> + * The table of shadow entries for protected VMs in hyp.
> + * Allocated at hyp initialization and setup.
> + */
> +static struct kvm_shadow_vm **shadow_table;
> +
> +/* Current number of vms in the shadow table. */
> +static unsigned int nr_shadow_entries;
> +
> +/* The next entry index to try to allocate from. */
> +static unsigned int next_shadow_alloc;
> +
> +void hyp_shadow_table_init(void *tbl)
> +{
> +	WARN_ON(shadow_table);
> +	shadow_table = tbl;
> +}
> +
> +/*
> + * Return the shadow vm corresponding to the handle.
> + */
> +static struct kvm_shadow_vm *find_shadow_by_handle(unsigned int shadow_handle)
> +{
> +	unsigned int shadow_idx = shadow_handle_to_idx(shadow_handle);
> +
> +	if (unlikely(shadow_idx >= KVM_MAX_PVMS))
> +		return NULL;
> +
> +	return shadow_table[shadow_idx];
> +}
> +
> +static void unpin_host_vcpus(struct kvm_shadow_vcpu_state *shadow_vcpu_states,
> +			     unsigned int nr_vcpus)
> +{
> +	int i;
> +
> +	for (i = 0; i < nr_vcpus; i++) {
> +		struct kvm_vcpu *host_vcpu = shadow_vcpu_states[i].host_vcpu;

IIRC, checkpatch likes an empty line after declarations.

> +		hyp_unpin_shared_mem(host_vcpu, host_vcpu + 1);
> +	}
> +}
> +
> +static int set_host_vcpus(struct kvm_shadow_vcpu_state *shadow_vcpu_states,
> +			  unsigned int nr_vcpus,
> +			  struct kvm_vcpu **vcpu_array,
> +			  size_t vcpu_array_size)
> +{
> +	int i;
> +
> +	if (vcpu_array_size < sizeof(*vcpu_array) * nr_vcpus)
> +		return -EINVAL;
> +
> +	for (i = 0; i < nr_vcpus; i++) {
> +		struct kvm_vcpu *host_vcpu = kern_hyp_va(vcpu_array[i]);
> +
> +		if (hyp_pin_shared_mem(host_vcpu, host_vcpu + 1)) {
> +			unpin_host_vcpus(shadow_vcpu_states, i);
> +			return -EBUSY;
> +		}
> +
> +		shadow_vcpu_states[i].host_vcpu = host_vcpu;
> +	}
> +
> +	return 0;
> +}
> +
> +static int init_shadow_structs(struct kvm *kvm, struct kvm_shadow_vm *vm,
> +			       struct kvm_vcpu **vcpu_array,
> +			       unsigned int nr_vcpus)
> +{
> +	int i;
> +
> +	vm->host_kvm = kvm;
> +	vm->kvm.created_vcpus = nr_vcpus;
> +	vm->kvm.arch.vtcr = host_kvm.arch.vtcr;
> +
> +	for (i = 0; i < nr_vcpus; i++) {
> +		struct kvm_shadow_vcpu_state *shadow_vcpu_state = &vm->shadow_vcpu_states[i];
> +		struct kvm_vcpu *shadow_vcpu = &shadow_vcpu_state->shadow_vcpu;
> +		struct kvm_vcpu *host_vcpu = shadow_vcpu_state->host_vcpu;
> +
> +		shadow_vcpu_state->shadow_vm = vm;
> +
> +		shadow_vcpu->kvm = &vm->kvm;
> +		shadow_vcpu->vcpu_id = READ_ONCE(host_vcpu->vcpu_id);
> +		shadow_vcpu->vcpu_idx = i;
> +
> +		shadow_vcpu->arch.hw_mmu = &vm->kvm.arch.mmu;

In the end, we don't seem to use much from the struct kvm_vcpu. Is it for
convenience that a smaller struct kvm_shadow_vcpu hasn't been created, or do
we anticipate wider usage later?

> +	}
> +
> +	return 0;
> +}
> +
> +static bool __exists_shadow(struct kvm *host_kvm)
> +{
> +	int i;
> +	unsigned int nr_checked = 0;
> +
> +	for (i = 0; i < KVM_MAX_PVMS && nr_checked < nr_shadow_entries; i++) {
> +		if (!shadow_table[i])
> +			continue;
> +
> +		if (unlikely(shadow_table[i]->host_kvm == host_kvm))
> +			return true;
> +
> +		nr_checked++;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * Allocate a shadow table entry and insert a pointer to the shadow vm.
> + *
> + * Return a unique handle to the protected VM on success,
> + * negative error code on failure.
> + */
> +static unsigned int insert_shadow_table(struct kvm *kvm,
> +					struct kvm_shadow_vm *vm,
> +					size_t shadow_size)
> +{
> +	struct kvm_s2_mmu *mmu = &vm->kvm.arch.mmu;
> +	unsigned int shadow_handle;
> +	unsigned int vmid;
> +
> +	hyp_assert_lock_held(&shadow_lock);
> +
> +	if (unlikely(nr_shadow_entries >= KVM_MAX_PVMS))
> +		return -ENOMEM;
> +
> +	/*
> +	 * Initializing protected state might have failed, yet a malicious host
> +	 * could trigger this function. Thus, ensure that shadow_table exists.
> +	 */
> +	if (unlikely(!shadow_table))
> +		return -EINVAL;
> +
> +	/* Check that a shadow hasn't been created before for this host KVM. */
> +	if (unlikely(__exists_shadow(kvm)))
> +		return -EEXIST;
> +
> +	/* Find the next free entry in the shadow table. */
> +	while (shadow_table[next_shadow_alloc])
> +		next_shadow_alloc = (next_shadow_alloc + 1) % KVM_MAX_PVMS;

Couldn't it be merged with __exists_shadow(), which already walks the table
and could record the first free
shadow_table idx?
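
Something along these lines, as a rough sketch only (it reuses shadow_table,
KVM_MAX_PVMS, shadow_lock and the host_kvm field from the patch, but ignores
the round-robin behaviour of next_shadow_alloc and is untested):

	static int find_free_shadow_idx(struct kvm *host_kvm)
	{
		int free_idx = -ENOMEM;
		unsigned int i;

		hyp_assert_lock_held(&shadow_lock);

		for (i = 0; i < KVM_MAX_PVMS; i++) {
			if (!shadow_table[i]) {
				/* Remember the first free slot we walk past. */
				if (free_idx < 0)
					free_idx = i;
				continue;
			}

			/* A shadow already exists for this host kvm. */
			if (shadow_table[i]->host_kvm == host_kvm)
				return -EEXIST;
		}

		return free_idx;
	}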

> +	shadow_handle = idx_to_shadow_handle(next_shadow_alloc);
> +
> +	vm->kvm.arch.pkvm.shadow_handle = shadow_handle;
> +	vm->shadow_area_size = shadow_size;
> +
> +	/* VMID 0 is reserved for the host */
> +	vmid = next_shadow_alloc + 1;
> +	if (vmid > 0xff)

Couldn't the 0xff be found with get_vmid_bits() or even from host_kvm.arch.vtcr?
Or does that depend on something completely different?

Also, apologies if this has been discussed already and I missed it, but maybe
KVM_MAX_PVMS could be changed to that value - 1. Unless we think that archs
supporting 16-bit VMIDs would waste way too much memory for that?

> +		return -ENOMEM;
> +
> +	atomic64_set(&mmu->vmid.id, vmid);
> +	mmu->arch = &vm->kvm.arch;
> +	mmu->pgt = &vm->pgt;
> +
> +	shadow_table[next_shadow_alloc] = vm;
> +	next_shadow_alloc = (next_shadow_alloc + 1) % KVM_MAX_PVMS;
> +	nr_shadow_entries++;
> +
> +	return shadow_handle;
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 12/24] KVM: arm64: Introduce shadow VM state at EL2
  2022-07-18 18:40     ` Vincent Donnefort
  (?)
@ 2022-07-19  9:41       ` Marc Zyngier
  -1 siblings, 0 replies; 135+ messages in thread
From: Marc Zyngier @ 2022-07-19  9:41 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Will Deacon, kvmarm, Ard Biesheuvel, Sean Christopherson,
	Alexandru Elisei, Andy Lutomirski, Catalin Marinas, James Morse,
	Chao Peng, Quentin Perret, Suzuki K Poulose, Michael Roth,
	Mark Rutland, Fuad Tabba, Oliver Upton, kernel-team, kvm,
	linux-arm-kernel

On Mon, 18 Jul 2022 19:40:05 +0100,
Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> [...]
> 
> In the end, we don't seem to use much from struct kvm_vcpu. Is it for
> convenience that a smaller struct kvm_shadow_vcpu hasn't been created, or do we
> anticipate wider usage later?

The alternative would be to repaint the whole of the core KVM/arm64
code to be able to take the new structure in parallel with the
standard one. There is very little to gain from such a move.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 14/24] KVM: arm64: Add pcpu fixmap infrastructure at EL2
  2022-06-30 13:57   ` Will Deacon
  (?)
@ 2022-07-19 13:30     ` Vincent Donnefort
  -1 siblings, 0 replies; 135+ messages in thread
From: Vincent Donnefort @ 2022-07-19 13:30 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, kernel-team, kvm, Oliver Upton, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	kvmarm

>  static struct hyp_pool host_s2_pool;
> diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> index d3a3b47181de..17d689483ec4 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> @@ -14,6 +14,7 @@
>  #include <nvhe/early_alloc.h>
>  #include <nvhe/gfp.h>
>  #include <nvhe/memory.h>
> +#include <nvhe/mem_protect.h>
>  #include <nvhe/mm.h>
>  #include <nvhe/spinlock.h>
>  
> @@ -24,6 +25,7 @@ struct memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
>  unsigned int hyp_memblock_nr;
>  
>  static u64 __io_map_base;
> +static DEFINE_PER_CPU(void *, hyp_fixmap_base);
>  
>  static int __pkvm_create_mappings(unsigned long start, unsigned long size,
>  				  unsigned long phys, enum kvm_pgtable_prot prot)
> @@ -212,6 +214,76 @@ int hyp_map_vectors(void)
>  	return 0;
>  }
>  
> +void *hyp_fixmap_map(phys_addr_t phys)
> +{
> +	void *addr = *this_cpu_ptr(&hyp_fixmap_base);
> +	int ret = kvm_pgtable_hyp_map(&pkvm_pgtable, (u64)addr, PAGE_SIZE,
> +				      phys, PAGE_HYP);
> +	return ret ? NULL : addr;
> +}
> +
> +int hyp_fixmap_unmap(void)
> +{
> +	void *addr = *this_cpu_ptr(&hyp_fixmap_base);
> +	int ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, (u64)addr, PAGE_SIZE);
> +
> +	return (ret != PAGE_SIZE) ? -EINVAL : 0;
> +}
> +

I probably missed something, but as the page-table pages for this mapping are
pinned, it seems impossible (currently) for this call to fail. Maybe a WARN_ON
would be more appropriate, especially since the callers in the subsequent patches
do not seem to check this function's return value?
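
Something along these lines, perhaps (sketch only; making the function void
since no caller could do anything useful with the error anyway):

void hyp_fixmap_unmap(void)
{
	void *addr = *this_cpu_ptr(&hyp_fixmap_base);

	/* The fixmap page-table pages are pinned, so this shouldn't fail. */
	WARN_ON(kvm_pgtable_hyp_unmap(&pkvm_pgtable, (u64)addr, PAGE_SIZE) != PAGE_SIZE);
}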

[...]

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 18/24] KVM: arm64: Instantiate guest stage-2 page-tables at EL2
  2022-06-30 13:57   ` Will Deacon
  (?)
@ 2022-07-19 13:32     ` Vincent Donnefort
  -1 siblings, 0 replies; 135+ messages in thread
From: Vincent Donnefort @ 2022-07-19 13:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, kernel-team, kvm, Oliver Upton, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	kvmarm

[...]

>  }
>  
>  void reclaim_guest_pages(struct kvm_shadow_vm *vm)
>  {
> -	unsigned long nr_pages;
> +	unsigned long nr_pages, pfn;
>  
>  	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
> -	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(vm->pgt.pgd), nr_pages));
> +	pfn = hyp_virt_to_pfn(vm->pgt.pgd);
> +
> +	guest_lock_component(vm);
> +	kvm_pgtable_stage2_destroy(&vm->pgt);
> +	vm->kvm.arch.mmu.pgd_phys = 0ULL;
> +	guest_unlock_component(vm);
> +
> +	WARN_ON(__pkvm_hyp_donate_host(pfn, nr_pages));
>  }

The pfn variable introduced here is removed again in a subsequent patch, so this
is probably unnecessary noise.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 14/24] KVM: arm64: Add pcpu fixmap infrastructure at EL2
  2022-07-19 13:30     ` Vincent Donnefort
  (?)
@ 2022-07-19 14:09       ` Quentin Perret
  -1 siblings, 0 replies; 135+ messages in thread
From: Quentin Perret @ 2022-07-19 14:09 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Marc Zyngier, kernel-team, kvm, Oliver Upton, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon, kvmarm

On Tue, Jul 19, 2022 at 3:30 PM Vincent Donnefort <vdonnefort@google.com> wrote:
>
> >  static struct hyp_pool host_s2_pool;
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> > index d3a3b47181de..17d689483ec4 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> > @@ -14,6 +14,7 @@
> >  #include <nvhe/early_alloc.h>
> >  #include <nvhe/gfp.h>
> >  #include <nvhe/memory.h>
> > +#include <nvhe/mem_protect.h>
> >  #include <nvhe/mm.h>
> >  #include <nvhe/spinlock.h>
> >
> > @@ -24,6 +25,7 @@ struct memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
> >  unsigned int hyp_memblock_nr;
> >
> >  static u64 __io_map_base;
> > +static DEFINE_PER_CPU(void *, hyp_fixmap_base);
> >
> >  static int __pkvm_create_mappings(unsigned long start, unsigned long size,
> >                                 unsigned long phys, enum kvm_pgtable_prot prot)
> > @@ -212,6 +214,76 @@ int hyp_map_vectors(void)
> >       return 0;
> >  }
> >
> > +void *hyp_fixmap_map(phys_addr_t phys)
> > +{
> > +     void *addr = *this_cpu_ptr(&hyp_fixmap_base);
> > +     int ret = kvm_pgtable_hyp_map(&pkvm_pgtable, (u64)addr, PAGE_SIZE,
> > +                                   phys, PAGE_HYP);
> > +     return ret ? NULL : addr;
> > +}
> > +
> > +int hyp_fixmap_unmap(void)
> > +{
> > +     void *addr = *this_cpu_ptr(&hyp_fixmap_base);
> > +     int ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, (u64)addr, PAGE_SIZE);
> > +
> > +     return (ret != PAGE_SIZE) ? -EINVAL : 0;
> > +}
> > +
>
> I probably missed something, but as the page-table pages for this mapping are
> pinned, it seems impossible (currently) for this call to fail. Maybe a WARN_ON
> would be more appropriate, especially since the callers in the subsequent patches
> do not seem to check this function's return value?

Right, I think that wouldn't hurt. And while looking at this, I
actually think we could get rid of these calls to the map/unmap
functions entirely by keeping the pointers to the reserved PTEs
directly in addition to the VA to which they correspond in the percpu
table. That way we could manipulate the PTEs directly and avoid
unnecessary pgtable walks. Bits [63:1] can probably remain untouched,
and {un}mapping is then only a matter of flipping bit 0 in the PTE
(and TLBI on the unmap path). I'll have a go at it.
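
Roughly what I have in mind (completely untested; the hyp_fixmap_slot
struct and field names are made up, and the PA mask below ignores 52-bit
PAs for simplicity):

struct hyp_fixmap_slot {
	u64 addr;
	kvm_pte_t *ptep;
};

static DEFINE_PER_CPU(struct hyp_fixmap_slot, fixmap_slots);

#define HYP_FIXMAP_PA_MASK	GENMASK_ULL(47, 12)

void *hyp_fixmap_map(phys_addr_t phys)
{
	struct hyp_fixmap_slot *slot = this_cpu_ptr(&fixmap_slots);
	kvm_pte_t pte = *slot->ptep;

	/* Swap in the new output address and set the valid bit. */
	pte &= ~HYP_FIXMAP_PA_MASK;
	pte |= (phys & HYP_FIXMAP_PA_MASK) | KVM_PTE_VALID;
	WRITE_ONCE(*slot->ptep, pte);
	dsb(ishst);

	return (void *)slot->addr;
}

void hyp_fixmap_unmap(void)
{
	struct hyp_fixmap_slot *slot = this_cpu_ptr(&fixmap_slots);

	/* Clear the valid bit and invalidate this CPU's fixmap VA. */
	WRITE_ONCE(*slot->ptep, *slot->ptep & ~KVM_PTE_VALID);
	dsb(ishst);
	__tlbi(vale2is, __TLBI_VADDR(slot->addr, 0));
	dsb(ish);
	isb();
}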

Cheers,
Quentin

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 14/24] KVM: arm64: Add pcpu fixmap infrastructure at EL2
  2022-07-19 14:09       ` Quentin Perret
  (?)
@ 2022-07-19 14:10         ` Quentin Perret
  -1 siblings, 0 replies; 135+ messages in thread
From: Quentin Perret @ 2022-07-19 14:10 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Marc Zyngier, kernel-team, kvm, Oliver Upton, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	Will Deacon, kvmarm

On Tue, Jul 19, 2022 at 4:09 PM Quentin Perret <qperret@google.com> wrote:
>
> On Tue, Jul 19, 2022 at 3:30 PM Vincent Donnefort <vdonnefort@google.com> wrote:
> >
> > >  static struct hyp_pool host_s2_pool;
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> > > index d3a3b47181de..17d689483ec4 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> > > @@ -14,6 +14,7 @@
> > >  #include <nvhe/early_alloc.h>
> > >  #include <nvhe/gfp.h>
> > >  #include <nvhe/memory.h>
> > > +#include <nvhe/mem_protect.h>
> > >  #include <nvhe/mm.h>
> > >  #include <nvhe/spinlock.h>
> > >
> > > @@ -24,6 +25,7 @@ struct memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
> > >  unsigned int hyp_memblock_nr;
> > >
> > >  static u64 __io_map_base;
> > > +static DEFINE_PER_CPU(void *, hyp_fixmap_base);
> > >
> > >  static int __pkvm_create_mappings(unsigned long start, unsigned long size,
> > >                                 unsigned long phys, enum kvm_pgtable_prot prot)
> > > @@ -212,6 +214,76 @@ int hyp_map_vectors(void)
> > >       return 0;
> > >  }
> > >
> > > +void *hyp_fixmap_map(phys_addr_t phys)
> > > +{
> > > +     void *addr = *this_cpu_ptr(&hyp_fixmap_base);
> > > +     int ret = kvm_pgtable_hyp_map(&pkvm_pgtable, (u64)addr, PAGE_SIZE,
> > > +                                   phys, PAGE_HYP);
> > > +     return ret ? NULL : addr;
> > > +}
> > > +
> > > +int hyp_fixmap_unmap(void)
> > > +{
> > > +     void *addr = *this_cpu_ptr(&hyp_fixmap_base);
> > > +     int ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, (u64)addr, PAGE_SIZE);
> > > +
> > > +     return (ret != PAGE_SIZE) ? -EINVAL : 0;
> > > +}
> > > +
> >
> > I probably missed something, but as the page-table pages for this mapping are
> > pinned, it seems impossible (currently) for this call to fail. Maybe a WARN_ON
> > would be more appropriate, especially since the callers in the subsequent patches
> > do not seem to check this function's return value?
>
> Right, I think that wouldn't hurt. And while looking at this, I
> actually think we could get rid of these calls to the map/unmap
> functions entirely by keeping the pointers to the reserved PTEs
> directly in addition to the VA to which they correspond in the percpu
> table. That way we could manipulate the PTEs directly and avoid
> unnecessary pgtable walks. Bits [63:1] can probably remain untouched,

 Well, the address bits obviously need to change too, but the rest can stay.

> and {un}mapping is then only a matter of flipping bit 0 in the PTE
> (and TLBI on the unmap path). I'll have a go at it.
>
> Cheers,
> Quentin

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 14/24] KVM: arm64: Add pcpu fixmap infrastructure at EL2
@ 2022-07-19 14:10         ` Quentin Perret
  0 siblings, 0 replies; 135+ messages in thread
From: Quentin Perret @ 2022-07-19 14:10 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Will Deacon, kvmarm, Ard Biesheuvel, Sean Christopherson,
	Alexandru Elisei, Andy Lutomirski, Catalin Marinas, James Morse,
	Chao Peng, Suzuki K Poulose, Michael Roth, Mark Rutland,
	Fuad Tabba, Oliver Upton, Marc Zyngier, kernel-team, kvm,
	linux-arm-kernel

On Tue, Jul 19, 2022 at 4:09 PM Quentin Perret <qperret@google.com> wrote:
>
> On Tue, Jul 19, 2022 at 3:30 PM Vincent Donnefort <vdonnefort@google.com> wrote:
> >
> > >  static struct hyp_pool host_s2_pool;
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> > > index d3a3b47181de..17d689483ec4 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> > > @@ -14,6 +14,7 @@
> > >  #include <nvhe/early_alloc.h>
> > >  #include <nvhe/gfp.h>
> > >  #include <nvhe/memory.h>
> > > +#include <nvhe/mem_protect.h>
> > >  #include <nvhe/mm.h>
> > >  #include <nvhe/spinlock.h>
> > >
> > > @@ -24,6 +25,7 @@ struct memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
> > >  unsigned int hyp_memblock_nr;
> > >
> > >  static u64 __io_map_base;
> > > +static DEFINE_PER_CPU(void *, hyp_fixmap_base);
> > >
> > >  static int __pkvm_create_mappings(unsigned long start, unsigned long size,
> > >                                 unsigned long phys, enum kvm_pgtable_prot prot)
> > > @@ -212,6 +214,76 @@ int hyp_map_vectors(void)
> > >       return 0;
> > >  }
> > >
> > > +void *hyp_fixmap_map(phys_addr_t phys)
> > > +{
> > > +     void *addr = *this_cpu_ptr(&hyp_fixmap_base);
> > > +     int ret = kvm_pgtable_hyp_map(&pkvm_pgtable, (u64)addr, PAGE_SIZE,
> > > +                                   phys, PAGE_HYP);
> > > +     return ret ? NULL : addr;
> > > +}
> > > +
> > > +int hyp_fixmap_unmap(void)
> > > +{
> > > +     void *addr = *this_cpu_ptr(&hyp_fixmap_base);
> > > +     int ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, (u64)addr, PAGE_SIZE);
> > > +
> > > +     return (ret != PAGE_SIZE) ? -EINVAL : 0;
> > > +}
> > > +
> >
> > I probably missed something but as the pagetable pages for this mapping are
> > pined, it seems impossible (currently) for this call to fail. Maybe a WARN_ON
> > would be more appropriate, especially the callers in the subsequent patches do
> > not seem to check for this function return value?
>
> Right, I think that wouldn't hurt. And while looking at this, I
> actually think we could get rid of these calls to the map/unmap
> functions entirely by keeping the pointers to the reserved PTEs
> directly in addition to the VA to which they correspond in the percpu
> table. That way we could manipulate the PTEs directly and avoid
> unnecessary pgtable walks. Bits [63:1] can probably remain untouched,

 Well, the address bits need to change too obviously, but rest can stay.

> and {un}mapping is then only a matter of flipping bit 0 in the PTE
> (and TLBI on the unmap path). I'll have a go at it.
>
> Cheers,
> Quentin

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 00/24] KVM: arm64: Introduce pKVM shadow state at EL2
  2022-06-30 13:57 ` Will Deacon
  (?)
@ 2022-07-19 14:24   ` Vincent Donnefort
  -1 siblings, 0 replies; 135+ messages in thread
From: Vincent Donnefort @ 2022-07-19 14:24 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, kernel-team, kvm, Oliver Upton, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	kvmarm

On Thu, Jun 30, 2022 at 02:57:23PM +0100, Will Deacon wrote:
> Hi everyone,
> 
> This series has been extracted from the pKVM base support series (aka
> "pKVM mega-patch") previously posted here:
> 
>   https://lore.kernel.org/kvmarm/20220519134204.5379-1-will@kernel.org/
> 
> Unlike that more comprehensive series, this one is fairly fundamental
> and does not introduce any new ABI commitments, leaving questions
> involving the management of guest private memory and the creation of
> protected VMs for future work. Instead, this series extends the pKVM EL2
> code so that it can dynamically instantiate and manage VM shadow
> structures without the host being able to access them directly. These
> shadow structures consist of a shadow VM, a set of shadow vCPUs and the
> stage-2 page-table and the pages used to hold them are returned to the
> host when the VM is destroyed.
> 
> The last patch is marked as RFC because, although it plumbs in the
> shadow state, it is woefully inefficient and copies to/from the host
> state on every vCPU run. Without the last patch, the new structures are
> unused but we move considerably closer to isolating guests from the
> host.
> 
> The series is based on Marc's rework of the flags
> (kvm-arm64/burn-the-flags).
> 
> Feedback welcome.
> 
> Cheers,

Only had a few nitpicks.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>

Also, I've been using this patchset for quite a while now.

Tested-by: Vincent Donnefort <vdonnefort@google.com>

[...]

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 00/24] KVM: arm64: Introduce pKVM shadow state at EL2
  2022-07-08 16:23     ` Will Deacon
  (?)
@ 2022-07-19 16:11       ` Sean Christopherson
  -1 siblings, 0 replies; 135+ messages in thread
From: Sean Christopherson @ 2022-07-19 16:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Ard Biesheuvel, Alexandru Elisei, Andy Lutomirski,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Michael Roth, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

Apologies for the slow reply.

On Fri, Jul 08, 2022, Will Deacon wrote:
> On Wed, Jul 06, 2022 at 07:17:29PM +0000, Sean Christopherson wrote:
> > On Thu, Jun 30, 2022, Will Deacon wrote:
> > The lack of documentation and the rather terse changelogs make this really hard
> > to review for folks that aren't intimately familiar with pKVM.  I have a decent
> > idea of the end goal of "shadowing", but that's mostly because of my involvement in
> > similar x86 projects.  Nothing in the changelogs ever explains _why_ pKVM uses
> > shadows.
> 
> That's understandable, but thanks for persevering; this series is pretty
> down in the murky depths of the arm64 architecture and EL2 code so it
> doesn't really map to the KVM code most folks are familiar with. It's fair
> to say we're assuming a lot of niche prior knowledge (which is quite common
> for arch code in my experience),

Assuming prior knowledge is fine so long as that prior knowledge is something
that can be gained by future readers through public documentation.  E.g. arch code
is always littered with acronyms, but someone can almost always decipher the
acronyms by reading specs or just searching the interwebs.

My objection to the changelogs is that they talk about "shadow" VMs, vCPUs, state,
structures, etc., without ever explaining what a shadow is, how it will be used,
or what its end purpose is.

A big part of the problem is that "shadow" is not unique terminology _and_ isn't
inherently tied to "protected kvm", e.g. a reader can't intuit that a "shadow vm"
is the trusted hypervisor's instance of the VM.  And I can search for pKVM and
get relevant, helpful search results.  But if I search for "shadow vm", I get
unrelated results, and "pkvm shadow vm" just leads me back to this series.

> but I wanted to inherit the broader cc list so you were aware of this
> break-away series. Sadly, I don't think beefing up the commit messages would
> get us to a point where somebody not already familiar with the EL2 code could
> give a constructive review, but we can try to expand them a bit if you
> genuinely think it would help.

I'm not looking at it just from a review point of view, but also from a future
reader's perspective.  E.g. someone who looks at this changelog in isolation is going to
have no idea what a "shadow VM" is:

  KVM: arm64: Introduce pKVM shadow VM state at EL2

  Introduce a table of shadow VM structures at EL2 and provide hypercalls
  to the host for creating and destroying shadow VMs.

Obviously there will be some context available in surrounding patches, but if you
avoid the "shadow" terminology and provide a bit more context, then it yields
something like:

  KVM: arm64: Add infrastructure to create and track pKVM instances at EL2

  Introduce a global table (and lock) to track pKVM instances at EL2, and
  provide hypercalls that can be used by the untrusted host to create and
  destroy pKVM VMs.  pKVM VM/vCPU state is directly accessible only by the
  trusted hypervisor (EL2).  

  Each pKVM VM is directly associated with an untrusted host KVM instance,
  and is referenced by the host using an opaque handle.  Future patches will
  provide hypercalls to allow the host to initialize/set/get pKVM VM/vCPU
  state using the opaque handle.
   
> On the more positive side, we'll be speaking at KVM forum about what we've
> done here, so that will be a great place to discuss it further and then we
> can also link back to the recordings in later postings of the mega-series.
> 
> > I put "shadowing" in quotes because if the unstrusted host is aware that the VM
> > and vCPU it is manipulating aren't the "real" VMs/vCPUs, and there is an explicit API
> > between the untrusted host and pKVM for creating/destroying VMs/vCPUs, then I would
> > argue that it's not truly shadowing, especially if pKVM uses data/values verbatim
> > and only verifies correctness/safety.  It's definitely a nit, but for future readers
> > I think overloading "shadowing" could be confusing.
> 
> Ah, this is really interesting and nicely puts the ball back in my court as
> I'm not well versed with x86's use of "shadowing".

It's not just an x86 thing, e.g. see https://en.wikipedia.org/wiki/Shadow_table.  The
use in pKVM is _really_ close in that what pKVM calls the shadow is the "real"
data that's used, but pKVM inverts the typical virtualization usage, which is
why I find it confusing.  I.e. instead of shadowing state being written by the
guest, pKVM is "shadowing" state written by the host.  If there ever comes a need
to actually shadow guest state, e.g. for nested virtualization, then using shadow
to refer to the protected state is going to create a conundrum.

Honestly, I think pKVM is simply being too cute in picking names.  And not just
for "shadow", e.g. IMO the flush/sync terminology in patch 24 is also unnecessarily
cute.  Instead of coming up with clever names, just be explicit in what the code
is doing.  E.g. something like:

  flush_shadow_state() => sync_host_to_pkvm_vcpu()
  sync_shadow_state()  => sync_pkvm_to_host_vcpu()

Then readers know the two functions are pairs, and will have a decent idea of
what the functions do even if they don't fully understand pKVM vs. host.

"shadow_area_size" is another case where it's unnecessarily cryptic, e.g. just
call it "donated_memory_size".

> Perhaps we should s/shadow/hyp/ to make this a little clearer?

Or maybe just "pkvm"?  I think that's especially viable if you do away with
kvm_shadow_vcpu_state.  As of this series at least, kvm_shadow_vcpu_state is
completely unnecessary.  kvm_vcpu.kvm can be used to get at the VM, and thus pKVM
state via container_of().  Then the host_vcpu can be retrieved by using the
vcpu_idx, e.g.

	struct pkvm_vm *pkvm_vm = to_pkvm_vm(pkvm_vcpu->kvm);
	struct kvm_vcpu *host_vcpu;

	host_vcpu = kvm_get_vcpu(pkvm_vm->host_vm, pkvm_vcpu->vcpu_idx);
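
where to_pkvm_vm() would just be the usual container_of() helper (the name is
made up for the snippet above, so treat it as illustrative):

	static inline struct pkvm_vm *to_pkvm_vm(struct kvm *kvm)
	{
		return container_of(kvm, struct pkvm_vm, kvm);
	}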

Even better is to not have to do that in the first place.  AFAICT, there's no need
to do such a lookup in handle___kvm_vcpu_run() since pKVM already has pointers to
both the host vCPU and the pKVM vCPU.

E.g. I believe you can make the code look like this:

struct kvm_arch {
	...

	/*
	 * For an untrusted host VM, pkvm_handle is used to look up the
	 * associated pKVM instance.
	 */
	pkvm_handle_t pkvm_handle;
};

struct pkvm_vm {
	struct kvm kvm;

	/* Backpointer to the host's (untrusted) KVM instance. */
	struct kvm *host_kvm;

	size_t donated_memory_size;

	struct kvm_pgtable pgt;
};

static struct kvm *pkvm_get_vm(pkvm_handle_t handle)
{
	unsigned int idx = pkvm_handle_to_idx(handle);

	if (unlikely(idx >= KVM_MAX_PVMS))
		return NULL;

	return pkvm_vm_table[idx];
}

struct kvm_vcpu *pkvm_vcpu_load(pkvm_handle_t handle, unsigned int vcpu_idx)
{
	struct kvm_vcpu *pkvm_vcpu = NULL;
	struct kvm *vm;

	hyp_spin_lock(&pkvm_global_lock);
	vm = pkvm_get_vm(handle);
	if (!vm || atomic_read(&vm->online_vcpus) <= vcpu_idx)
		goto unlock;

	pkvm_vcpu = kvm_get_vcpu(vm, vcpu_idx);
	hyp_page_ref_inc(hyp_virt_to_page(vm));
unlock:
	hyp_spin_unlock(&pkvm_global_lock);
	return pkvm_vcpu;
}

void pkvm_vcpu_put(struct kvm_vcpu *pkvm_vcpu)
{
	hyp_spin_lock(&pkvm_global_lock);
	hyp_page_ref_dec(hyp_virt_to_page(pkvm_vcpu->kvm));
	hyp_spin_unlock(&pkvm_global_lock);
}

static void sync_host_to_pkvm_vcpu(struct kvm_vcpu *pkvm_vcpu, struct kvm_vcpu *host_vcpu)
{
	pkvm_vcpu->arch.ctxt		= host_vcpu->arch.ctxt;

	pkvm_vcpu->arch.sve_state	= kern_hyp_va(host_vcpu->arch.sve_state);
	pkvm_vcpu->arch.sve_max_vl	= host_vcpu->arch.sve_max_vl;

	pkvm_vcpu->arch.hw_mmu		= host_vcpu->arch.hw_mmu;

	pkvm_vcpu->arch.hcr_el2		= host_vcpu->arch.hcr_el2;
	pkvm_vcpu->arch.mdcr_el2	= host_vcpu->arch.mdcr_el2;
	pkvm_vcpu->arch.cptr_el2	= host_vcpu->arch.cptr_el2;

	pkvm_vcpu->arch.iflags		= host_vcpu->arch.iflags;
	pkvm_vcpu->arch.fp_state	= host_vcpu->arch.fp_state;

	pkvm_vcpu->arch.debug_ptr	= kern_hyp_va(host_vcpu->arch.debug_ptr);
	pkvm_vcpu->arch.host_fpsimd_state = host_vcpu->arch.host_fpsimd_state;

	pkvm_vcpu->arch.vsesr_el2	= host_vcpu->arch.vsesr_el2;

	pkvm_vcpu->arch.vgic_cpu.vgic_v3 = host_vcpu->arch.vgic_cpu.vgic_v3;
}

static void sync_pkvm_to_host_vcpu(struct kvm_vcpu *host_vcpu, struct kvm_vcpu *pkvm_vcpu)
{
	struct vgic_v3_cpu_if *pkvm_cpu_if = &pkvm_vcpu->arch.vgic_cpu.vgic_v3;
	struct vgic_v3_cpu_if *host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
	unsigned int i;

	host_vcpu->arch.ctxt		= pkvm_vcpu->arch.ctxt;

	host_vcpu->arch.hcr_el2		= pkvm_vcpu->arch.hcr_el2;
	host_vcpu->arch.cptr_el2	= pkvm_vcpu->arch.cptr_el2;

	host_vcpu->arch.fault		= pkvm_vcpu->arch.fault;

	host_vcpu->arch.iflags		= pkvm_vcpu->arch.iflags;
	host_vcpu->arch.fp_state	= pkvm_vcpu->arch.fp_state;

	host_cpu_if->vgic_hcr		= pkvm_cpu_if->vgic_hcr;
	for (i = 0; i < pkvm_cpu_if->used_lrs; ++i)
		host_cpu_if->vgic_lr[i] = pkvm_cpu_if->vgic_lr[i];
}

static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
{
	DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
	int ret;

	host_vcpu = kern_hyp_va(host_vcpu);

	if (unlikely(is_protected_kvm_enabled())) {
		struct kvm *host_kvm = kern_hyp_va(host_vcpu->kvm);
		struct kvm_vcpu *pkvm_vcpu;

		pkvm_vcpu = pkvm_vcpu_load(host_kvm->arch.pkvm_handle, host_vcpu->vcpu_idx);
		if (!pkvm_vcpu) {
			ret = -EINVAL;
			goto out;
		}

		sync_host_to_pkvm_vcpu(pkvm_vcpu, host_vcpu);

		ret = __kvm_vcpu_run(pkvm_vcpu);

		sync_pkvm_to_host_vcpu(host_vcpu, pkvm_vcpu);

		pkvm_vcpu_put(pkvm_vcpu);
	} else {
		ret = __kvm_vcpu_run(host_vcpu);
	}

out:
	cpu_reg(host_ctxt, 1) = ret;
}

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 00/24] KVM: arm64: Introduce pKVM shadow state at EL2
  2022-07-19 16:11       ` Sean Christopherson
  (?)
@ 2022-07-20  9:25         ` Marc Zyngier
  -1 siblings, 0 replies; 135+ messages in thread
From: Marc Zyngier @ 2022-07-20  9:25 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Will Deacon, kvmarm, Ard Biesheuvel, Alexandru Elisei,
	Andy Lutomirski, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Michael Roth, Mark Rutland,
	Fuad Tabba, Oliver Upton, kernel-team, kvm, linux-arm-kernel

On Tue, 19 Jul 2022 17:11:32 +0100,
Sean Christopherson <seanjc@google.com> wrote:

> Honestly, I think pKVM is simply being too cute in picking names.

I don't know what you mean by "cute" here, but I assume this is not
exactly a flattering qualifier.

> And not just for "shadow", e.g. IMO the flush/sync terminology in
> patch 24 is also unnecessarily cute.  Instead of coming up with
> clever names, just be explicit in what the code is doing.
> E.g. something like:
> 
>   flush_shadow_state() => sync_host_to_pkvm_vcpu()
>   sync_shadow_state()  => sync_pkvm_to_host_vcpu()

As much as I like bikeshedding, this isn't going to happen. We have had
the sync/flush duality since day one, we have a lot of code based
around this naming, and departing from it seems counterproductive.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 06/24] KVM: arm64: Unify identifiers used to distinguish host and hypervisor
  2022-06-30 13:57   ` Will Deacon
  (?)
@ 2022-07-20 15:11     ` Oliver Upton
  -1 siblings, 0 replies; 135+ messages in thread
From: Oliver Upton @ 2022-07-20 15:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Ard Biesheuvel, Sean Christopherson, Alexandru Elisei,
	Andy Lutomirski, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Michael Roth, Mark Rutland,
	Fuad Tabba, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

Hi Will,

On Thu, Jun 30, 2022 at 02:57:29PM +0100, Will Deacon wrote:
> The 'pkvm_component_id' enum type provides constants to refer to the
> host and the hypervisor, yet this information is duplicated by the
> 'pkvm_hyp_id' constant.
> 
> Remove the definition of 'pkvm_hyp_id' and move the 'pkvm_component_id'
> type definition to 'mem_protect.h' so that it can be used outside of
> the memory protection code.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 6 +++++-
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 8 --------
>  arch/arm64/kvm/hyp/nvhe/setup.c               | 2 +-
>  3 files changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 80e99836eac7..f5705a1e972f 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -51,7 +51,11 @@ struct host_kvm {
>  };
>  extern struct host_kvm host_kvm;
>  
> -extern const u8 pkvm_hyp_id;
> +/* This corresponds to page-table locking order */
> +enum pkvm_component_id {
> +	PKVM_ID_HOST,
> +	PKVM_ID_HYP,
> +};

Since we have the concept of PTE ownership in pgtable.c, WDYT about
moving the owner ID enumeration there? KVM_MAX_OWNER_ID should be
incorporated in the enum too.
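
i.e. something along the lines of the below (just a sketch; where exactly it
lands and whether the PKVM_ID_* names survive is obviously up for discussion):

/* Owner IDs understood by the page-table code. */
enum pkvm_component_id {
	PKVM_ID_HOST,
	PKVM_ID_HYP,

	/* Owner IDs are stored in a limited field of invalid PTEs. */
	KVM_MAX_OWNER_ID,
};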

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 06/24] KVM: arm64: Unify identifiers used to distinguish host and hypervisor
  2022-07-20 15:11     ` Oliver Upton
  (?)
@ 2022-07-20 18:14       ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-07-20 18:14 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, kernel-team, kvm, Andy Lutomirski,
	linux-arm-kernel, Michael Roth, Catalin Marinas, Chao Peng,
	kvmarm

Hi Oliver,

Thanks for having a look.

On Wed, Jul 20, 2022 at 03:11:04PM +0000, Oliver Upton wrote:
> On Thu, Jun 30, 2022 at 02:57:29PM +0100, Will Deacon wrote:
> > The 'pkvm_component_id' enum type provides constants to refer to the
> > host and the hypervisor, yet this information is duplicated by the
> > 'pkvm_hyp_id' constant.
> > 
> > Remove the definition of 'pkvm_hyp_id' and move the 'pkvm_component_id'
> > type definition to 'mem_protect.h' so that it can be used outside of
> > the memory protection code.
> > 
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 6 +++++-
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 8 --------
> >  arch/arm64/kvm/hyp/nvhe/setup.c               | 2 +-
> >  3 files changed, 6 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index 80e99836eac7..f5705a1e972f 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -51,7 +51,11 @@ struct host_kvm {
> >  };
> >  extern struct host_kvm host_kvm;
> >  
> > -extern const u8 pkvm_hyp_id;
> > +/* This corresponds to page-table locking order */
> > +enum pkvm_component_id {
> > +	PKVM_ID_HOST,
> > +	PKVM_ID_HYP,
> > +};
> 
> Since we have the concept of PTE ownership in pgtable.c, WDYT about
> moving the owner ID enumeration there? KVM_MAX_OWNER_ID should be
> incorporated in the enum too.

Interesting idea... I think we need the definition in a header file so that
it can be used by mem_protect.c, so I'm not entirely sure where you'd like
to see it moved.

The main worry I have is that if we ever need to distinguish e.g. one guest
instance from another, which is likely needed for sharing of memory
between more than just two components, then the pgtable code really cares
about the number of instances ("which guest is it?") whilst the mem_protect
code cares about the component type ("is it a guest?").

Finally, the pgtable code is also used outside of pKVM so, although the
concept of ownership doesn't yet apply elsewhere, keeping the concept
available without dictating the different types of owners makes sense to
me.
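
To illustrate: at the pgtable level ownership is just an opaque ID, e.g.
(prototype quoted from memory, so treat it as a sketch):

int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
				 void *mc, u8 owner_id);

whereas mem_protect is the place that decides which component (host, hyp, or
eventually a given guest) a particular owner ID stands for.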

Does that make sense?

Will

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 06/24] KVM: arm64: Unify identifiers used to distinguish host and hypervisor
@ 2022-07-20 18:14       ` Will Deacon
  0 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-07-20 18:14 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Ard Biesheuvel, Sean Christopherson, Alexandru Elisei,
	Andy Lutomirski, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Michael Roth, Mark Rutland,
	Fuad Tabba, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

Hi Oliver,

Thanks for having a look.

On Wed, Jul 20, 2022 at 03:11:04PM +0000, Oliver Upton wrote:
> On Thu, Jun 30, 2022 at 02:57:29PM +0100, Will Deacon wrote:
> > The 'pkvm_component_id' enum type provides constants to refer to the
> > host and the hypervisor, yet this information is duplicated by the
> > 'pkvm_hyp_id' constant.
> > 
> > Remove the definition of 'pkvm_hyp_id' and move the 'pkvm_component_id'
> > type definition to 'mem_protect.h' so that it can be used outside of
> > the memory protection code.
> > 
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 6 +++++-
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 8 --------
> >  arch/arm64/kvm/hyp/nvhe/setup.c               | 2 +-
> >  3 files changed, 6 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index 80e99836eac7..f5705a1e972f 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -51,7 +51,11 @@ struct host_kvm {
> >  };
> >  extern struct host_kvm host_kvm;
> >  
> > -extern const u8 pkvm_hyp_id;
> > +/* This corresponds to page-table locking order */
> > +enum pkvm_component_id {
> > +	PKVM_ID_HOST,
> > +	PKVM_ID_HYP,
> > +};
> 
> Since we have the concept of PTE ownership in pgtable.c, WDYT about
> moving the owner ID enumeration there? KVM_MAX_OWNER_ID should be
> incorporated in the enum too.

Interesting idea... I think we need the definition in a header file so that
it can be used by mem_protect.c, so I'm not entirely sure where you'd like
to see it moved.

The main worry I have is that if we ever need to distinguish e.g. one guest
instance from another, which is likely needed for sharing of memory
between more than just two components, then the pgtable code really cares
about the number of instances ("which guest is it?") whilst the mem_protect
cares about the component type ("is it a guest?").

Finally, the pgtable code is also used outside of pKVM so, although the
concept of ownership doesn't yet apply elsewhere, keeping the concept
available without dictacting the different types of owners makes sense to
me.

Does that make sense?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 12/24] KVM: arm64: Introduce shadow VM state at EL2
  2022-07-18 18:40     ` Vincent Donnefort
@ 2022-07-20 18:20       ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-07-20 18:20 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: kvmarm, Ard Biesheuvel, Sean Christopherson, Alexandru Elisei,
	Andy Lutomirski, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Michael Roth, Mark Rutland,
	Fuad Tabba, Oliver Upton, Marc Zyngier, kernel-team, kvm,
	linux-arm-kernel

Hi Vincent,

Thanks for going through this.

On Mon, Jul 18, 2022 at 07:40:05PM +0100, Vincent Donnefort wrote:
> [...]
> 
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index 9f339dffbc1a..2d6b5058f7d3 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -288,6 +288,14 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
> >   */
> >  u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift);
> >  
> > +/*
> 
> /** ?
> 
> > + * kvm_pgtable_stage2_pgd_size() - Helper to compute size of a stage-2 PGD
> > + * @vtcr:	Content of the VTCR register.
> > + *
> > + * Return: the size (in bytes) of the stage-2 PGD
> > + */

I'll also check this is valid kernel-doc before adding the new comment
syntax!
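
For reference, the kernel-doc form of that comment would just use the
double-star opener (a sketch, not yet run through scripts/kernel-doc):

/**
 * kvm_pgtable_stage2_pgd_size() - Helper to compute size of a stage-2 PGD
 * @vtcr:	Content of the VTCR register.
 *
 * Return: the size (in bytes) of the stage-2 PGD
 */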

> > +/*
> > + * Holds the relevant data for maintaining the vcpu state completely at hyp.
> > + */
> > +struct kvm_shadow_vcpu_state {
> > +	/* The data for the shadow vcpu. */
> > +	struct kvm_vcpu shadow_vcpu;
> > +
> > +	/* A pointer to the host's vcpu. */
> > +	struct kvm_vcpu *host_vcpu;
> > +
> > +	/* A pointer to the shadow vm. */
> > +	struct kvm_shadow_vm *shadow_vm;
> 
> IMHO, those declarations are already self-explanatory. The comments above don't
> bring much.

Agreed, and Sean has ideas to rework bits of this as well. I'll drop the
comments.

> > diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > index 99c8d8b73e70..77aeb787670b 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > @@ -7,6 +7,9 @@
> >  #include <linux/kvm_host.h>
> >  #include <linux/mm.h>
> >  #include <nvhe/fixed_config.h>
> > +#include <nvhe/mem_protect.h>
> > +#include <nvhe/memory.h>
> 
> I don't think this one is necessary, it is already included in mm.h.

I thought it was generally bad form to rely on transitive includes, as it
makes header rework even more painful than it already is.

> > +static void unpin_host_vcpus(struct kvm_shadow_vcpu_state *shadow_vcpu_states,
> > +			     unsigned int nr_vcpus)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < nr_vcpus; i++) {
> > +		struct kvm_vcpu *host_vcpu = shadow_vcpu_states[i].host_vcpu;
> 
> IIRC, checkpatch likes an empty line after declarations.

We can fix that!
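
i.e. just adding the blank line checkpatch wants (sketch; loop body elided):

	for (i = 0; i < nr_vcpus; i++) {
		struct kvm_vcpu *host_vcpu = shadow_vcpu_states[i].host_vcpu;

		/* ... unpin host_vcpu as before ... */
	}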

> > +static unsigned int insert_shadow_table(struct kvm *kvm,
> > +					struct kvm_shadow_vm *vm,
> > +					size_t shadow_size)
> > +{
> > +	struct kvm_s2_mmu *mmu = &vm->kvm.arch.mmu;
> > +	unsigned int shadow_handle;
> > +	unsigned int vmid;
> > +
> > +	hyp_assert_lock_held(&shadow_lock);
> > +
> > +	if (unlikely(nr_shadow_entries >= KVM_MAX_PVMS))
> > +		return -ENOMEM;
> > +
> > +	/*
> > +	 * Initializing protected state might have failed, yet a malicious host
> > +	 * could trigger this function. Thus, ensure that shadow_table exists.
> > +	 */
> > +	if (unlikely(!shadow_table))
> > +		return -EINVAL;
> > +
> > +	/* Check that a shadow hasn't been created before for this host KVM. */
> > +	if (unlikely(__exists_shadow(kvm)))
> > +		return -EEXIST;
> > +
> > +	/* Find the next free entry in the shadow table. */
> > +	while (shadow_table[next_shadow_alloc])
> > +		next_shadow_alloc = (next_shadow_alloc + 1) % KVM_MAX_PVMS;
> 
> Couldn't it be merged with __exists_shadow which already knows the first free
> shadow_table idx?

Good idea, that would save us going through it twice.
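
Something along these lines, perhaps (untested sketch; the helper name is
made up and it assumes the shadow VM keeps a host_kvm back-pointer):

/*
 * Walk the table once: fail if 'kvm' already has a shadow, otherwise
 * return the first free index.
 */
static int find_free_shadow_slot(struct kvm *kvm)
{
	int free_idx = -ENOMEM;
	unsigned int i;

	hyp_assert_lock_held(&shadow_lock);

	for (i = 0; i < KVM_MAX_PVMS; i++) {
		struct kvm_shadow_vm *vm = shadow_table[i];

		if (!vm) {
			if (free_idx < 0)
				free_idx = i;
			continue;
		}

		if (vm->host_kvm == kvm)
			return -EEXIST;
	}

	return free_idx;
}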

> 
> > +	shadow_handle = idx_to_shadow_handle(next_shadow_alloc);
> > +
> > +	vm->kvm.arch.pkvm.shadow_handle = shadow_handle;
> > +	vm->shadow_area_size = shadow_size;
> > +
> > +	/* VMID 0 is reserved for the host */
> > +	vmid = next_shadow_alloc + 1;
> > +	if (vmid > 0xff)
> 
> Couldn't the 0xff be found with get_vmid_bits() or even from host_kvm.arch.vtcr?
> Or does that depends on something completely different?
> 
> Also, apologies if this has been discussed already and I missed it, maybe
> KVM_MAX_PVMS could be changed to that value - 1. Unless we think that archs
> supporting 16-bit VMIDs would waste way too much memory for that?

We should probably clamp the VMID based on KVM_MAX_PVMS here, as although
some CPUs support 16-bit VMIDs, we don't currently support that with pKVM.
I'll make that change to avoid hard-coding the constant here.
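
One possible reading of "clamp" is a compile-time check that the table can
never produce an out-of-range 8-bit VMID, so the runtime comparison goes
away entirely (sketch only):

	/*
	 * pKVM only uses 8-bit VMIDs for now, and VMID 0 is reserved
	 * for the host.
	 */
	BUILD_BUG_ON(KVM_MAX_PVMS > 0xff);

	vmid = next_shadow_alloc + 1;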

Thanks!

Will

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 18/24] KVM: arm64: Instantiate guest stage-2 page-tables at EL2
  2022-07-19 13:32     ` Vincent Donnefort
@ 2022-07-20 18:26       ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-07-20 18:26 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: kvmarm, Ard Biesheuvel, Sean Christopherson, Alexandru Elisei,
	Andy Lutomirski, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Michael Roth, Mark Rutland,
	Fuad Tabba, Oliver Upton, Marc Zyngier, kernel-team, kvm,
	linux-arm-kernel

On Tue, Jul 19, 2022 at 02:32:18PM +0100, Vincent Donnefort wrote:
> [...]
> 
> >  }
> >  
> >  void reclaim_guest_pages(struct kvm_shadow_vm *vm)
> >  {
> > -	unsigned long nr_pages;
> > +	unsigned long nr_pages, pfn;
> >  
> >  	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
> > -	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(vm->pgt.pgd), nr_pages));
> > +	pfn = hyp_virt_to_pfn(vm->pgt.pgd);
> > +
> > +	guest_lock_component(vm);
> > +	kvm_pgtable_stage2_destroy(&vm->pgt);
> > +	vm->kvm.arch.mmu.pgd_phys = 0ULL;
> > +	guest_unlock_component(vm);
> > +
> > +	WARN_ON(__pkvm_hyp_donate_host(pfn, nr_pages));
> >  }
> 
> The pfn introduction being removed in a subsequent patch, this is probably
> unnecessary noise.

Quite right, that should be left as-is. Will fix.

Will

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 00/24] KVM: arm64: Introduce pKVM shadow state at EL2
  2022-07-19 16:11       ` Sean Christopherson
@ 2022-07-20 18:48         ` Will Deacon
  -1 siblings, 0 replies; 135+ messages in thread
From: Will Deacon @ 2022-07-20 18:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvmarm, Ard Biesheuvel, Alexandru Elisei, Andy Lutomirski,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Michael Roth, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

Hi Sean,

On Tue, Jul 19, 2022 at 04:11:32PM +0000, Sean Christopherson wrote:
> Apologies for the slow reply.

No problem; you've provided a tonne of insightful feedback here, so it was
worth the wait. Thanks!

> On Fri, Jul 08, 2022, Will Deacon wrote:
> > but I wanted to inherit the broader cc list so you were aware of this
> > break-away series. Sadly, I don't think beefing up the commit messages would
> > get us to a point where somebody unfamiliar with the EL2 code already could
> > give a constructive review, but we can try to expand them a bit if you
> > genuinely think it would help.
> 
> I'm not looking at it just from a review point of view, but also from a future
> reader's perspective.  E.g. someone that looks at this changelog in isolation is going to
> have no idea what a "shadow VM" is:
> 
>   KVM: arm64: Introduce pKVM shadow VM state at EL2
> 
>   Introduce a table of shadow VM structures at EL2 and provide hypercalls
>   to the host for creating and destroying shadow VMs.
> 
> Obviously there will be some context available in surrounding patches, but if you
> avoid the "shadow" terminology and provide a bit more context, then it yields
> something like:
> 
>   KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
> 
>   Introduce a global table (and lock) to track pKVM instances at EL2, and
>   provide hypercalls that can be used by the untrusted host to create and
>   destroy pKVM VMs.  pKVM VM/vCPU state is directly accessible only by the
>   trusted hypervisor (EL2).  
> 
>   Each pKVM VM is directly associated with an untrusted host KVM instance,
>   and is referenced by the host using an opaque handle.  Future patches will
>   provide hypercalls to allow the host to initialize/set/get pKVM VM/vCPU
>   state using the opaque handle.

Thanks, that's much better. I'll have to summon up the energy to go through
the others as well...

> > Perhaps we should s/shadow/hyp/ to make this a little clearer?
> 
> Or maybe just "pkvm"?

I think the "hyp" part is useful to distinguish the pkvm code running at EL2
from the pkvm code running at EL1. For example, we have a 'pkvm' member in
'struct kvm_arch' which is used by the _host_ at EL1.

So I'd say either "pkvm_hyp" or "hyp" instead of "shadow". The latter is
nice and short...

> I think that's especially viable if you do away with
> kvm_shadow_vcpu_state.  As of this series at least, kvm_shadow_vcpu_state is
> completely unnecessary.  kvm_vcpu.kvm can be used to get at the VM, and thus pKVM
> state via container_of().  Then the host_vcpu can be retrieved by using the
> vcpu_idx, e.g.
> 
> 	struct pkvm_vm *pkvm_vm = to_pkvm_vm(pkvm_vcpu->vm);
> 	struct kvm_vcpu *host_vcpu;
> 
> 	host_vcpu = kvm_get_vcpu(pkvm_vm->host_vm, pkvm_vcpu->vcpu_idx);

Using container_of() here is neat; we can definitely go ahead with that
change. However, looking at this in more detail with Fuad, removing
'struct kvm_shadow_vcpu_state' entirely isn't going to work:

> E.g. I believe you can make the code look like this:
> 
> struct kvm_arch {
> 	...
> 
> 	/*
> 	 * For an untrusted host VM, pkvm_handle is used to look up the
> 	 * associated pKVM instance.
> 	 */
> 	pkvm_handle_t pkvm_handle;
> };
> 
> struct pkvm_vm {
> 	struct kvm kvm;
> 
> 	/* Backpointer to the host's (untrusted) KVM instance. */
> 	struct kvm *host_kvm;
> 
> 	size_t donated_memory_size;
> 
> 	struct kvm_pgtable pgt;
> };
> 
> static struct kvm *pkvm_get_vm(pkvm_handle_t handle)
> {
> 	unsigned int idx = pkvm_handle_to_idx(handle);
> 
> 	if (unlikely(idx >= KVM_MAX_PVMS))
> 		return NULL;
> 
> 	return pkvm_vm_table[idx];
> }
> 
> struct kvm_vcpu *pkvm_vcpu_load(pkvm_handle_t handle, unsigned int vcpu_idx)
> {
> 	struct kvm_vcpu *pkvm_vcpu = NULL;
> 	struct kvm *vm;
> 
> 	hyp_spin_lock(&pkvm_global_lock);
> 	vm = pkvm_get_vm(handle);
> 	if (!vm || atomic_read(&vm->online_vcpus) <= vcpu_idx)
> 		goto unlock;
> 
> 	pkvm_vcpu = kvm_get_vcpu(vm, vcpu_idx);

kvm_get_vcpu() makes use of an xarray to hold the vCPU pointers and this is
really something which we cannot support at EL2 where, amongst other things,
we do not have support for RCU. Consequently, we do need to keep our own
mapping from the shad^H^H^H^Hhyp vCPU to the host vCPU.
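
To spell out what that leaves us with (sketch; helper names made up, field
names as in this series):

/* The VM side works fine with container_of(), as you suggest... */
static struct kvm_shadow_vm *shadow_vcpu_to_vm(struct kvm_vcpu *shadow_vcpu)
{
	/* Pure pointer arithmetic; no xarray/RCU involved, so fine at EL2. */
	return container_of(shadow_vcpu->kvm, struct kvm_shadow_vm, kvm);
}

/* ...but the host vCPU still needs an explicit back-pointer at EL2. */
static struct kvm_vcpu *shadow_state_to_host_vcpu(struct kvm_shadow_vcpu_state *state)
{
	return state->host_vcpu;
}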

We also end up expanding the 'struct kvm_shadow_vcpu_state' structure later
to track additional vCPU state in the hypervisor, for example in the
mega-series:

https://lore.kernel.org/kvmarm/20220519134204.5379-78-will@kernel.org/#Z31arch:arm64:kvm:hyp:include:nvhe:pkvm.h

Cheers,

Will

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 00/24] KVM: arm64: Introduce pKVM shadow state at EL2
  2022-07-20 18:48         ` Will Deacon
@ 2022-07-20 21:17           ` Sean Christopherson
  -1 siblings, 0 replies; 135+ messages in thread
From: Sean Christopherson @ 2022-07-20 21:17 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Ard Biesheuvel, Alexandru Elisei, Andy Lutomirski,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Michael Roth, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

On Wed, Jul 20, 2022, Will Deacon wrote:
> Hi Sean,
> 
> On Tue, Jul 19, 2022 at 04:11:32PM +0000, Sean Christopherson wrote:
> > Or maybe just "pkvm"?
> 
> I think the "hyp" part is useful to distinguish the pkvm code running at EL2
> from the pkvm code running at EL1. For example, we have a 'pkvm' member in
> 'struct kvm_arch' which is used by the _host_ at EL1.

Right, my suggestion was to rename that to pkvm_handle to avoid a direct conflict,
and then that naturally yields the "pkvm_handle => pkvm_vm" association.  Or are
you expecting to shove more stuff into that "pkvm" struct?
 
> So I'd say either "pkvm_hyp" or "hyp" instead of "shadow". The latter is
> nice and short...

I 100% agree that differentiating between EL1 and EL2 is important for functions,
structs and global variables, but I would argue it's not so important for fields
and local variables where the "owning" struct/function provides that context.  But
that's actually a partial argument for just using "hyp".

My concern with just using e.g. "kvm_hyp" is that, because non-pKVM nVHE also has
the host vs. hyp split, it could lead people to believe that "kvm_hyp" is also
used for the non-pKVM case.

So, what about a blend?  E.g. "struct pkvm_hyp_vcpu *hyp_vcpu".  That provides
the context that the struct is specific to the EL2 side of pKVM, most usage is
nice and short, and the "hyp" prefix avoids the ambiguity that a bare "pkvm" would
suffer for EL1 vs. EL2.

Doesn't look awful?

static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
{
	DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
	int ret;

	host_vcpu = kern_hyp_va(host_vcpu);

	if (unlikely(is_protected_kvm_enabled())) {
		struct pkvm_hyp_vcpu *hyp_vcpu;
		struct kvm *host_kvm;

		host_kvm = kern_hyp_va(host_vcpu->kvm);

		hyp_vcpu = pkvm_load_hyp_vcpu(host_kvm->arch.pkvm.handle,
					      host_vcpu->vcpu_idx);
		if (!hyp_vcpu) {
			ret = -EINVAL;
			goto out;
		}

		flush_pkvm_guest_state(hyp_vcpu);

		ret = __kvm_vcpu_run(&hyp_vcpu->vcpu);

		sync_pkvm_guest_state(hyp_vcpu);

		pkvm_put_hyp_vcpu(hyp_vcpu);
	} else {
		/* The host is fully trusted, run its vCPU directly. */
		ret = __kvm_vcpu_run(host_vcpu);
	}

out:
	cpu_reg(host_ctxt, 1) =  ret;
}

	
 
> > I think that's especially viable if you do away with
> > kvm_shadow_vcpu_state.  As of this series at least, kvm_shadow_vcpu_state is
> > completely unnecessary.  kvm_vcpu.kvm can be used to get at the VM, and thus pKVM
> > state via container_of().  Then the host_vcpu can be retrieved by using the
> > vcpu_idx, e.g.
> > 
> > 	struct pkvm_vm *pkvm_vm = to_pkvm_vm(pkvm_vcpu->vm);
> > 	struct kvm_vcpu *host_vcpu;
> > 
> > 	host_vcpu = kvm_get_vcpu(pkvm_vm->host_vm, pkvm_vcpu->vcpu_idx);
> 
> Using container_of() here is neat; we can definitely go ahead with that
> change. However, looking at this in more detail with Fuad, removing
> 'struct kvm_shadow_vcpu_state' entirely isn't going to work:

> > struct kvm_vcpu *pkvm_vcpu_load(pkvm_handle_t handle, unsigned int vcpu_idx)
> > {
> > 	struct kvm_vpcu *pkvm_vcpu = NULL;
> > 	struct kvm *vm;
> > 
> > 	hyp_spin_lock(&pkvm_global_lock);
> > 	vm = pkvm_get_vm(handle);
> > 	if (!vm || atomic_read(&vm->online_vcpus) <= vcpu_idx)
> > 		goto unlock;
> > 
> > 	pkvm_vcpu = kvm_get_vcpu(vm, vcpu_idx);
> 
> kvm_get_vcpu() makes use of an xarray to hold the vCPUs pointers and this is
> really something which we cannot support at EL2 where, amongst other things,
> we do not have support for RCU. Consequently, we do need to keep our own
> mapping from the shad^H^H^H^Hhyp vCPU to the host vCPU.

Hmm, are there guardrails in place to prevent using "unsafe" fields from "struct kvm"
and "struct kvm_vcpu" at EL2?  If not, it seems like embedding the common structs
in the hyp/pkvm-specific structs is going to bite us in the rear at some point.

Mostly out of curiosity, I assume the EL2 restriction only applies to nVHE mode?

And waaaay off topic, has anyone explored adding macro magic to generate wrappers
to (un)marshall registers to parameters/returns for the hyp functions?  E.g. it'd
be neat if you could make the code look like this without having to add a wrapper
for every function:

static int handle___kvm_vcpu_run(unsigned long __host_vcpu)
{
	struct kvm_vcpu *host_vcpu = kern_hyp_va(__host_vcpu);
	int ret;

	if (unlikely(is_protected_kvm_enabled())) {
		struct pkvm_hyp_vcpu *hyp_vcpu;
		struct kvm *host_kvm;

		host_kvm = kern_hyp_va(host_vcpu->kvm);

		hyp_vcpu = pkvm_load_hyp_vcpu(host_kvm->arch.pkvm.handle,
					      host_vcpu->vcpu_idx);
		if (!hyp_vcpu)
			return -EINVAL;

		flush_hypervisor_state(hyp_vcpu);

		ret = __kvm_vcpu_run(&hyp_vcpu->vcpu);

		sync_hypervisor_state(hyp_vcpu);
		pkvm_put_hyp_vcpu(hyp_vcpu);
	} else {
		/* The host is fully trusted, run its vCPU directly. */
		ret = __kvm_vcpu_run(host_vcpu);
	}
	return ret;
}
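
A very rough sketch of the sort of macro I have in mind, reusing the
existing DECLARE_REG()/cpu_reg() helpers (the DEFINE_HYP_HANDLER_1 name and
the do_ prefix are invented, single-argument case only):

#define DEFINE_HYP_HANDLER_1(fn, type1)					\
	static void handle_##fn(struct kvm_cpu_context *host_ctxt)	\
	{								\
		DECLARE_REG(type1, arg1, host_ctxt, 1);			\
									\
		cpu_reg(host_ctxt, 1) = do_##fn(arg1);			\
	}

/* The "real" handler then takes plain arguments, as in the example above: */
static int do___kvm_vcpu_run(unsigned long host_vcpu);
DEFINE_HYP_HANDLER_1(__kvm_vcpu_run, unsigned long);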

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 06/24] KVM: arm64: Unify identifiers used to distinguish host and hypervisor
  2022-07-20 18:14       ` Will Deacon
@ 2022-07-29 19:28         ` Oliver Upton
  -1 siblings, 0 replies; 135+ messages in thread
From: Oliver Upton @ 2022-07-29 19:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Ard Biesheuvel, Sean Christopherson, Alexandru Elisei,
	Andy Lutomirski, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Michael Roth, Mark Rutland,
	Fuad Tabba, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

Hi Will,

Sorry, I didn't see your reply until now.

On Wed, Jul 20, 2022 at 07:14:07PM +0100, Will Deacon wrote:
> Hi Oliver,
> 
> Thanks for having a look.
> 
> On Wed, Jul 20, 2022 at 03:11:04PM +0000, Oliver Upton wrote:
> > On Thu, Jun 30, 2022 at 02:57:29PM +0100, Will Deacon wrote:
> > > The 'pkvm_component_id' enum type provides constants to refer to the
> > > host and the hypervisor, yet this information is duplicated by the
> > > 'pkvm_hyp_id' constant.
> > > 
> > > Remove the definition of 'pkvm_hyp_id' and move the 'pkvm_component_id'
> > > type definition to 'mem_protect.h' so that it can be used outside of
> > > the memory protection code.
> > > 
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 6 +++++-
> > >  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 8 --------
> > >  arch/arm64/kvm/hyp/nvhe/setup.c               | 2 +-
> > >  3 files changed, 6 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > > index 80e99836eac7..f5705a1e972f 100644
> > > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > > @@ -51,7 +51,11 @@ struct host_kvm {
> > >  };
> > >  extern struct host_kvm host_kvm;
> > >  
> > > -extern const u8 pkvm_hyp_id;
> > > +/* This corresponds to page-table locking order */
> > > +enum pkvm_component_id {
> > > +	PKVM_ID_HOST,
> > > +	PKVM_ID_HYP,
> > > +};
> > 
> > Since we have the concept of PTE ownership in pgtable.c, WDYT about
> > moving the owner ID enumeration there? KVM_MAX_OWNER_ID should be
> > incorporated in the enum too.
> 
> Interesting idea... I think we need the definition in a header file so that
> it can be used by mem_protect.c, so I'm not entirely sure where you'd like
> to see it moved.
> 
> The main worry I have is that if we ever need to distinguish e.g. one guest
> instance from another, which is likely needed for sharing of memory
> between more than just two components, then the pgtable code really cares
> about the number of instances ("which guest is it?") whilst the mem_protect
> cares about the component type ("is it a guest?").
> 
> Finally, the pgtable code is also used outside of pKVM so, although the
> concept of ownership doesn't yet apply elsewhere, keeping the concept
> available without dictating the different types of owners makes sense to
> me.

Sorry, it was a silly suggestion to wedge the enum there. I don't think
it matters too much where it winds up, but something like:

  enum kvm_pgtable_owner_id {
  	OWNER_ID_PKVM_HOST,
  	OWNER_ID_PKVM_HYP,
  	NR_PGTABLE_OWNER_IDS,
  };

And put it somewhere that both pgtable.c and mem_protect.c can get at
it. That way bounds checks (like in kvm_pgtable_stage2_set_owner())
organically work as new IDs are added.
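
[ e.g., a sketch of that check, assuming the (hypothetical) enum above and a
  made-up helper name:

	static bool owner_id_is_valid(enum kvm_pgtable_owner_id id)
	{
		return id < NR_PGTABLE_OWNER_IDS;
	}

  so kvm_pgtable_stage2_set_owner() could reject out-of-range IDs without a
  separately maintained KVM_MAX_OWNER_ID constant. ]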

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 135+ messages in thread

end of thread, other threads:[~2022-07-29 19:29 UTC | newest]

Thread overview: 135+ messages
2022-06-30 13:57 [PATCH v2 00/24] KVM: arm64: Introduce pKVM shadow state at EL2 Will Deacon
2022-06-30 13:57 ` Will Deacon
2022-06-30 13:57 ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 01/24] KVM: arm64: Move hyp refcount manipulation helpers Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 02/24] KVM: arm64: Allow non-coalescable pages in a hyp_pool Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 03/24] KVM: arm64: Add flags to struct hyp_page Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-07-18 10:54   ` Vincent Donnefort
2022-07-18 10:54     ` Vincent Donnefort
2022-07-18 10:54     ` Vincent Donnefort
2022-07-18 10:57     ` Vincent Donnefort
2022-07-18 10:57       ` Vincent Donnefort
2022-07-18 10:57       ` Vincent Donnefort
2022-06-30 13:57 ` [PATCH v2 04/24] KVM: arm64: Back hyp_vmemmap for all of memory Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 05/24] KVM: arm64: Make hyp stage-1 refcnt correct on the whole range Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 06/24] KVM: arm64: Unify identifiers used to distinguish host and hypervisor Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-07-20 15:11   ` Oliver Upton
2022-07-20 15:11     ` Oliver Upton
2022-07-20 15:11     ` Oliver Upton
2022-07-20 18:14     ` Will Deacon
2022-07-20 18:14       ` Will Deacon
2022-07-20 18:14       ` Will Deacon
2022-07-29 19:28       ` Oliver Upton
2022-07-29 19:28         ` Oliver Upton
2022-07-29 19:28         ` Oliver Upton
2022-06-30 13:57 ` [PATCH v2 07/24] KVM: arm64: Implement do_donate() helper for donating memory Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 08/24] KVM: arm64: Prevent the donation of no-map pages Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 09/24] KVM: arm64: Add helpers to pin memory shared with hyp Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 10/24] KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 11/24] KVM: arm64: Add hyp_spinlock_t static initializer Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 12/24] KVM: arm64: Introduce shadow VM state at EL2 Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-07-18 18:40   ` Vincent Donnefort
2022-07-18 18:40     ` Vincent Donnefort
2022-07-18 18:40     ` Vincent Donnefort
2022-07-19  9:41     ` Marc Zyngier
2022-07-19  9:41       ` Marc Zyngier
2022-07-19  9:41       ` Marc Zyngier
2022-07-20 18:20     ` Will Deacon
2022-07-20 18:20       ` Will Deacon
2022-07-20 18:20       ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 13/24] KVM: arm64: Instantiate VM shadow data from EL1 Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 14/24] KVM: arm64: Add pcpu fixmap infrastructure at EL2 Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-07-19 13:30   ` Vincent Donnefort
2022-07-19 13:30     ` Vincent Donnefort
2022-07-19 13:30     ` Vincent Donnefort
2022-07-19 14:09     ` Quentin Perret
2022-07-19 14:09       ` Quentin Perret
2022-07-19 14:09       ` Quentin Perret
2022-07-19 14:10       ` Quentin Perret
2022-07-19 14:10         ` Quentin Perret
2022-07-19 14:10         ` Quentin Perret
2022-06-30 13:57 ` [PATCH v2 15/24] KVM: arm64: Initialise hyp symbols regardless of pKVM Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 16/24] KVM: arm64: Provide I-cache invalidation by VA at EL2 Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 17/24] KVM: arm64: Add generic hyp_memcache helpers Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 18/24] KVM: arm64: Instantiate guest stage-2 page-tables at EL2 Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-07-19 13:32   ` Vincent Donnefort
2022-07-19 13:32     ` Vincent Donnefort
2022-07-19 13:32     ` Vincent Donnefort
2022-07-20 18:26     ` Will Deacon
2022-07-20 18:26       ` Will Deacon
2022-07-20 18:26       ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 19/24] KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 20/24] KVM: arm64: Unmap kvm_arm_hyp_percpu_base from the host Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 21/24] KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2 Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 22/24] KVM: arm64: Explicitly map kvm_vgic_global_state " Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [PATCH v2 23/24] KVM: arm64: Don't map host sections in pkvm Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57 ` [RFC PATCH v2 24/24] KVM: arm64: Use the shadow vCPU structure in handle___kvm_vcpu_run() Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-06-30 13:57   ` Will Deacon
2022-07-06 19:17 ` [PATCH v2 00/24] KVM: arm64: Introduce pKVM shadow state at EL2 Sean Christopherson
2022-07-06 19:17   ` Sean Christopherson
2022-07-06 19:17   ` Sean Christopherson
2022-07-08 16:23   ` Will Deacon
2022-07-08 16:23     ` Will Deacon
2022-07-08 16:23     ` Will Deacon
2022-07-19 16:11     ` Sean Christopherson
2022-07-19 16:11       ` Sean Christopherson
2022-07-19 16:11       ` Sean Christopherson
2022-07-20  9:25       ` Marc Zyngier
2022-07-20  9:25         ` Marc Zyngier
2022-07-20  9:25         ` Marc Zyngier
2022-07-20 18:48       ` Will Deacon
2022-07-20 18:48         ` Will Deacon
2022-07-20 18:48         ` Will Deacon
2022-07-20 21:17         ` Sean Christopherson
2022-07-20 21:17           ` Sean Christopherson
2022-07-20 21:17           ` Sean Christopherson
2022-07-19 14:24 ` Vincent Donnefort
2022-07-19 14:24   ` Vincent Donnefort
2022-07-19 14:24   ` Vincent Donnefort
