* [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code
@ 2022-12-08 19:38 ` David Matlack
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

[ mm folks: You are being cc'd since this series includes a mm patch
  ("mm: Introduce architecture-neutral PG_LEVEL macros"), but general
  feedback is also welcome. I imagine there are a lot of lessons KVM can
  learn from mm about sharing page table code across architectures. ]

Hello,

This series refactors the KVM/x86 "TDP MMU" into common code. This is
the first step toward sharing TDP (aka Stage-2) page table management
code across architectures that support KVM. For more background on this
effort please see my talk from KVM Forum 2022 "Exploring an
architecture-neutral MMU":

  https://youtu.be/IBhW34fCFi0

By the end of this series, 90% of the TDP MMU code is in common directories
(virt/kvm/mmu/ and include/kvm/). The only pieces that remain in
arch/x86 are the code that deals with constructing/inspecting/modifying
PTEs and the arch hooks that implement NX Huge Pages (a mitigation for
an Intel-specific vulnerability).

Before:

  180 arch/x86/kvm/mmu/tdp_iter.c
  118 arch/x86/kvm/mmu/tdp_iter.h
 1917 arch/x86/kvm/mmu/tdp_mmu.c
   98 arch/x86/kvm/mmu/tdp_mmu.h
 ----
 2313 total

After:

  178 virt/kvm/mmu/tdp_iter.c
 1867 virt/kvm/mmu/tdp_mmu.c
  117 include/kvm/tdp_iter.h
   78 include/kvm/tdp_mmu.h
   39 include/kvm/tdp_pgtable.h
 ----
  184 arch/x86/kvm/mmu/tdp_pgtable.c
   76 arch/x86/include/asm/kvm/tdp_pgtable.h
 ----
 2539 total

This series is very much an RFC, but it does build (I tested x86_64 and
ARM64) and pass basic testing (KVM selftests and kvm-unit-tests on
x86_64), so it is entirely functional aside from any bugs.

The main areas where I would like feedback are:

 - NX Huge Pages support in the TDP MMU requires 5 arch hooks in
   the common code, which IMO makes the NX Huge Pages implementation
   harder to read. The alternative is to move the NX Huge Pages
   implementation into common code, including the fields in struct
   kvm_mmu_page and kvm_page_fault, which would increase memory usage
   a tiny bit (for non-x86 architectures) and pollute the common code
   with an x86-specific security mitigation. Ideas on better ways to
   handle this would be appreciated. (A rough sketch of the hook and
   struct split under discussion follows this list.)

 - struct kvm_mmu_page increased by 64 bytes because separating the
   arch and common state eliminated the ability to use unions to
   optimize the size of the struct. There are two things we can do to
   reduce the size of the struct back down: (1) dynamically allocate
   the root-specific fields only for root page tables and (2) dynamically
   allocate the Shadow MMU state in kvm_mmu_page_arch only for Shadow MMU
   pages. This should actually be a net *reduction* in the size of
   kvm_mmu_page relative to today for most pages, but I have not
   implemented it.

   Note that an alternative approach I implemented avoided this problem
   by creating an entirely separate struct for the common TDP MMU (e.g.
   struct tdp_mmu_page). However, that approach had enough downsides
   that I don't think it is a good solution. Notably, it complicated a
   ton of existing code in arch/x86/kvm/mmu/mmu.c (e.g. anything that
   touches vcpu->arch.mmu->root and kvm_recover_nx_huge_pages()) and
   created a new runtime failure mode in to_shadow_page().

 - Naming. This series does not change the names of any existing code.
   So all the KVM/x86 Shadow MMU-style terminology like
   "shadow_page"/"sp"/"spte" persists. Should we keep that style in
   common code or move toward something less shadow-paging-specific?
   e.g. "page_table"/"pt"/"pte". Also do we want to keep "TDP" or switch
   to something more familiar across architectures (e.g. ARM and RISC-V
   both use "Stage-2")?
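
For the first two bullets, here is a rough sketch of the kind of
common/arch split being discussed. This is illustrative only: the field,
struct, and hook names below are assumptions, not necessarily the names
used in this series.

  /*
   * Common TDP MMU state stays in struct kvm_mmu_page; x86-only state
   * (NX Huge Pages tracking, Shadow MMU bookkeeping) moves into
   * struct kvm_mmu_page_arch, which common code treats as opaque.
   */
  struct kvm_mmu_page_arch {
          /* NX Huge Pages mitigation state (x86-specific). */
          struct list_head possible_nx_huge_page_link;
          bool nx_huge_page_disallowed;

          /* Idea (2): allocate Shadow MMU state only for Shadow MMU pages. */
          struct kvm_mmu_page_shadow_state *shadow;
  };

  struct kvm_mmu_page {
          union kvm_mmu_page_role role;
          u64 *spt;
          gfn_t gfn;

          /* Idea (1): allocate root-only state only for root page tables. */
          struct kvm_mmu_page_root_state *root_state;

          struct kvm_mmu_page_arch arch;
  };

  /*
   * Shape of an NX Huge Pages arch hook: the common fault handler calls
   * out to x86 to decide whether a huge mapping must be disallowed.
   */
  void tdp_mmu_arch_adjust_mapping_level(struct kvm_vcpu *vcpu,
                                         struct kvm_page_fault *fault);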

Additionally, there are some warts to be aware of. I think these can be
addressed in future series, since they only really matter once we are
ready to enable the common TDP MMU on a non-x86 architecture.

 - Tracepoints. For now the common MMU continues to use the x86
   tracepoint code, which is simply stubbed out (no-ops) for other
   architectures. (A sketch of the stubbing follows this list.)

 - tdp_mmu_max_mapping_level() and tdp_mmu_max_gfn_exclusive() are
   currently arch hooks but they can probably be made common code at
   some point.
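
To make the tracepoint stubbing concrete, here is a minimal sketch of
what the no-op side could look like; the macro names are assumptions
modeled on the existing x86 mmutrace tracepoints and may not match the
series exactly:

  /*
   * On architectures without the x86 mmutrace machinery, the common
   * code's trace calls compile away to nothing.
   */
  #ifndef CONFIG_X86_64
  #define trace_kvm_mmu_spte_requested(fault)        do { } while (0)
  #define trace_kvm_mmu_set_spte(level, gfn, sptep)  do { } while (0)
  #endif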

Lastly, I still need to verify that there are no negative performance
impacts from the changes in this series. My main concern is that the new
tdp_pte_*() functions add overhead because they can no longer be
inlined, as sketched below.
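
As an assumed example of that concern (the function name is modeled on
the tdp_pte_*() naming and may differ from what the series uses):

  /*
   * arch/x86/kvm/mmu/tdp_pgtable.c: what used to be a static inline in
   * spte.h becomes an out-of-line call from the common iterator, so each
   * PTE check pays a function call unless cross-unit inlining (e.g. LTO)
   * recovers it.
   */
  bool tdp_pte_present(u64 pte)
  {
          return is_shadow_present_pte(pte);
  }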

This series applies on top of kvm/queue commit 89b239585965 ("KVM:
x86/mmu: Pivot on "TDP MMU enabled" when handling direct page faults"),
since there are several recent series in kvm/queue that affect this
refactor. A revert of 0c2a04128f50 ("KVM: x86: remove unnecessary
exports") is also needed since it breaks the build on x86 (unrelated to
this refactor).

Thanks.

P.S. Looking to the future... This is just the first step toward
building a common TDP MMU for KVM. After this, we are looking at adding
KUnit testing to the common TDP MMU as a way to offset the risk of
sharing more code across architectures, and then targeting RISC-V as the
first non-x86 architecture to use the common TDP MMU. If any RISC-V
developer is interested in working on the port, please reach out.

David Matlack (36):
  KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  KVM: MMU: Move struct kvm_mmu_page_role into common code
  KVM: MMU: Move tdp_ptep_t into common code
  KVM: x86/mmu: Invert sp->tdp_mmu_page to sp->shadow_mmu_page
  KVM: x86/mmu: Unify TDP MMU and Shadow MMU root refcounts
  KVM: MMU: Move struct kvm_mmu_page to common code
  mm: Introduce architecture-neutral PG_LEVEL macros
  KVM: Move page size stats into common code
  KVM: MMU: Move struct kvm_page_fault to common code
  KVM: MMU: Move RET_PF_* into common code
  KVM: x86/mmu: Use PG_LEVEL_{PTE,PMD,PUD} in the TDP MMU
  KVM: MMU: Move sptep_to_sp() to common code
  KVM: MMU: Introduce common macros for TDP page tables
  KVM: x86/mmu: Add a common API for inspecting/modifying TDP PTEs
  KVM: x86/mmu: Abstract away TDP MMU root lookup
  KVM: Move struct kvm_gfn_range to kvm_types.h
  KVM: x86/mmu: Add common API for creating TDP PTEs
  KVM: x86/mmu: Add arch hooks for NX Huge Pages
  KVM: x86/mmu: Abstract away computing the max mapping level
  KVM: Introduce CONFIG_HAVE_TDP_MMU
  KVM: x86: Select HAVE_TDP_MMU if X86_64
  KVM: MMU: Move VM-level TDP MMU state to struct kvm
  KVM: x86/mmu: Move kvm_mmu_hugepage_adjust() up to fault handler
  KVM: x86/mmu: Pass root role to kvm_tdp_mmu_get_vcpu_root_hpa()
  KVM: Move page table cache to struct kvm_vcpu
  KVM: MMU: Move mmu_page_header_cache to common code
  KVM: MMU: Stub out tracepoints on non-x86 architectures
  KVM: x86/mmu: Collapse kvm_flush_remote_tlbs_with_{range,address}()
    together
  KVM: x86/mmu: Rename kvm_flush_remote_tlbs_with_address()
  KVM: x86/MMU: Use gfn_t in kvm_flush_remote_tlbs_range()
  KVM: Allow range-based TLB invalidation from common code
  KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code
  KVM: MMU: Move the TDP iterator to common code
  KVM: x86/mmu: Move tdp_mmu_max_gfn_exclusive() to tdp_pgtable.c
  KVM: x86/mmu: Move is_tdp_mmu_page() to mmu_internal.h
  KVM: MMU: Move the TDP MMU to common code

Jing Zhang (1):
  KVM: selftests: Stop assuming stats are contiguous in
    kvm_binary_stats_test

 MAINTAINERS                                   |   6 +-
 arch/arm64/include/asm/kvm_host.h             |   3 -
 arch/arm64/kvm/arm.c                          |  10 +-
 arch/arm64/kvm/mmu.c                          |   2 +-
 arch/mips/include/asm/kvm_host.h              |   3 -
 arch/mips/kvm/mips.c                          |  10 +-
 arch/mips/kvm/mmu.c                           |   4 +-
 arch/riscv/include/asm/kvm_host.h             |   3 -
 arch/riscv/kvm/mmu.c                          |   8 +-
 arch/riscv/kvm/vcpu.c                         |   4 +-
 arch/x86/include/asm/kvm/mmu_types.h          | 138 ++++++
 arch/x86/include/asm/kvm/tdp_pgtable.h        |  73 ++++
 arch/x86/include/asm/kvm_host.h               | 122 +-----
 arch/x86/include/asm/pgtable_types.h          |  12 +-
 arch/x86/kvm/Kconfig                          |   1 +
 arch/x86/kvm/Makefile                         |   2 +-
 arch/x86/kvm/mmu.h                            |   5 -
 arch/x86/kvm/mmu/mmu.c                        | 409 +++++++++--------
 arch/x86/kvm/mmu/mmu_internal.h               | 221 ++--------
 arch/x86/kvm/mmu/mmutrace.h                   |  20 +-
 arch/x86/kvm/mmu/paging_tmpl.h                |  48 +-
 arch/x86/kvm/mmu/spte.c                       |   7 +-
 arch/x86/kvm/mmu/spte.h                       |  38 +-
 arch/x86/kvm/mmu/tdp_pgtable.c                | 183 ++++++++
 arch/x86/kvm/x86.c                            |  16 +-
 include/kvm/mmu.h                             |  21 +
 include/kvm/mmu_types.h                       | 179 ++++++++
 include/kvm/mmutrace.h                        |  17 +
 {arch/x86/kvm/mmu => include/kvm}/tdp_iter.h  |  19 +-
 {arch/x86/kvm/mmu => include/kvm}/tdp_mmu.h   |  17 +-
 include/kvm/tdp_pgtable.h                     |  39 ++
 include/linux/kvm_host.h                      |  36 +-
 include/linux/kvm_types.h                     |  17 +
 include/linux/mm_types.h                      |   9 +
 .../selftests/kvm/kvm_binary_stats_test.c     |  11 +-
 virt/kvm/Kconfig                              |   3 +
 virt/kvm/Makefile.kvm                         |   3 +
 virt/kvm/kvm_main.c                           |  27 +-
 {arch/x86 => virt}/kvm/mmu/tdp_iter.c         |  24 +-
 {arch/x86 => virt}/kvm/mmu/tdp_mmu.c          | 412 ++++++++----------
 40 files changed, 1245 insertions(+), 937 deletions(-)
 create mode 100644 arch/x86/include/asm/kvm/mmu_types.h
 create mode 100644 arch/x86/include/asm/kvm/tdp_pgtable.h
 create mode 100644 arch/x86/kvm/mmu/tdp_pgtable.c
 create mode 100644 include/kvm/mmu.h
 create mode 100644 include/kvm/mmu_types.h
 create mode 100644 include/kvm/mmutrace.h
 rename {arch/x86/kvm/mmu => include/kvm}/tdp_iter.h (90%)
 rename {arch/x86/kvm/mmu => include/kvm}/tdp_mmu.h (87%)
 create mode 100644 include/kvm/tdp_pgtable.h
 rename {arch/x86 => virt}/kvm/mmu/tdp_iter.c (89%)
 rename {arch/x86 => virt}/kvm/mmu/tdp_mmu.c (84%)


base-commit: 89b2395859651113375101bb07cd6340b1ba3637
prerequisite-patch-id: 19dc9f47392d43a9a86c0e4f3ce1f13b62004eeb
-- 
2.39.0.rc1.256.g54fd8350bd-goog


* [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
@ 2022-12-08 19:38   ` David Matlack
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Rename kvm_mmu_page_role.smm to kvm_mmu_page_role.as_id and use it
directly as the address space ID throughout the KVM MMU code. This
eliminates a needless level of indirection, kvm_mmu_role_as_id(), and
prepares for making kvm_mmu_page_role architecture-neutral.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/mmu/mmu.c          |  6 +++---
 arch/x86/kvm/mmu/mmu_internal.h | 10 ----------
 arch/x86/kvm/mmu/tdp_iter.c     |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      | 12 ++++++------
 5 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa4eb8cfcd7e..0a819d40131a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -348,7 +348,7 @@ union kvm_mmu_page_role {
 		 * simple shift.  While there is room, give it a whole
 		 * byte so it is also faster to load it from memory.
 		 */
-		unsigned smm:8;
+		unsigned as_id:8;
 	};
 };
 
@@ -2056,7 +2056,7 @@ enum {
 # define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
 # define KVM_ADDRESS_SPACE_NUM 2
 # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
-# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
+# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).as_id)
 #else
 # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4d188f056933..f375b719f565 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
 	union kvm_cpu_role role = {0};
 
 	role.base.access = ACC_ALL;
-	role.base.smm = is_smm(vcpu);
+	role.base.as_id = is_smm(vcpu);
 	role.base.guest_mode = is_guest_mode(vcpu);
 	role.ext.valid = 1;
 
@@ -5112,7 +5112,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
 	role.access = ACC_ALL;
 	role.cr0_wp = true;
 	role.efer_nx = true;
-	role.smm = cpu_role.base.smm;
+	role.as_id = cpu_role.base.as_id;
 	role.guest_mode = cpu_role.base.guest_mode;
 	role.ad_disabled = !kvm_ad_enabled();
 	role.level = kvm_mmu_get_tdp_level(vcpu);
@@ -5233,7 +5233,7 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
 
 	/*
 	 * KVM does not support SMM transfer monitors, and consequently does not
-	 * support the "entry to SMM" control either.  role.base.smm is always 0.
+	 * support the "entry to SMM" control either.  role.base.as_id is always 0.
 	 */
 	WARN_ON_ONCE(is_smm(vcpu));
 	role.base.level = level;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ac00bfbf32f6..5427f65117b4 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -133,16 +133,6 @@ struct kvm_mmu_page {
 
 extern struct kmem_cache *mmu_page_header_cache;
 
-static inline int kvm_mmu_role_as_id(union kvm_mmu_page_role role)
-{
-	return role.smm ? 1 : 0;
-}
-
-static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
-{
-	return kvm_mmu_role_as_id(sp->role);
-}
-
 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 {
 	/*
diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index 39b48e7d7d1a..4a7d58bf81c4 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -52,7 +52,7 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
 	iter->root_level = root_level;
 	iter->min_level = min_level;
 	iter->pt_path[iter->root_level - 1] = (tdp_ptep_t)root->spt;
-	iter->as_id = kvm_mmu_page_as_id(root);
+	iter->as_id = root->role.as_id;
 
 	tdp_iter_restart(iter);
 }
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 764f7c87286f..7ccac1aa8df6 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -237,7 +237,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 	     _root;								\
 	     _root = tdp_mmu_next_root(_kvm, _root, _shared, _only_valid))	\
 		if (kvm_lockdep_assert_mmu_lock_held(_kvm, _shared) &&		\
-		    kvm_mmu_page_as_id(_root) != _as_id) {			\
+		    _root->role.as_id != _as_id) {			\
 		} else
 
 #define for_each_valid_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared)	\
@@ -256,7 +256,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 #define for_each_tdp_mmu_root(_kvm, _root, _as_id)			\
 	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)	\
 		if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) &&	\
-		    kvm_mmu_page_as_id(_root) != _as_id) {		\
+		    _root->role.as_id != _as_id) {		\
 		} else
 
 static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
@@ -310,7 +310,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 	 * Check for an existing root before allocating a new one.  Note, the
 	 * role check prevents consuming an invalid root.
 	 */
-	for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) {
+	for_each_tdp_mmu_root(kvm, root, role.as_id) {
 		if (root->role.word == role.word &&
 		    kvm_tdp_mmu_get_root(root))
 			goto out;
@@ -496,8 +496,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte,
 							  REMOVED_SPTE, level);
 		}
-		handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
-				    old_spte, REMOVED_SPTE, level, shared);
+		handle_changed_spte(kvm, sp->role.as_id, gfn, old_spte,
+				    REMOVED_SPTE, level, shared);
 	}
 
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
@@ -923,7 +923,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 	if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte)))
 		return false;
 
-	__tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
+	__tdp_mmu_set_spte(kvm, sp->role.as_id, sp->ptep, old_spte, 0,
 			   sp->gfn, sp->role.level + 1, true, true);
 
 	return true;
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Rename kvm_mmu_page_role.smm with kvm_mmu_page_role.as_id and use it
directly as the address space ID throughout the KVM MMU code. This
eliminates a needless level of indirection, kvm_mmu_role_as_id(), and
prepares for making kvm_mmu_page_role architecture-neutral.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/mmu/mmu.c          |  6 +++---
 arch/x86/kvm/mmu/mmu_internal.h | 10 ----------
 arch/x86/kvm/mmu/tdp_iter.c     |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      | 12 ++++++------
 5 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa4eb8cfcd7e..0a819d40131a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -348,7 +348,7 @@ union kvm_mmu_page_role {
 		 * simple shift.  While there is room, give it a whole
 		 * byte so it is also faster to load it from memory.
 		 */
-		unsigned smm:8;
+		unsigned as_id:8;
 	};
 };
 
@@ -2056,7 +2056,7 @@ enum {
 # define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
 # define KVM_ADDRESS_SPACE_NUM 2
 # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
-# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
+# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).as_id)
 #else
 # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4d188f056933..f375b719f565 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
 	union kvm_cpu_role role = {0};
 
 	role.base.access = ACC_ALL;
-	role.base.smm = is_smm(vcpu);
+	role.base.as_id = is_smm(vcpu);
 	role.base.guest_mode = is_guest_mode(vcpu);
 	role.ext.valid = 1;
 
@@ -5112,7 +5112,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
 	role.access = ACC_ALL;
 	role.cr0_wp = true;
 	role.efer_nx = true;
-	role.smm = cpu_role.base.smm;
+	role.as_id = cpu_role.base.as_id;
 	role.guest_mode = cpu_role.base.guest_mode;
 	role.ad_disabled = !kvm_ad_enabled();
 	role.level = kvm_mmu_get_tdp_level(vcpu);
@@ -5233,7 +5233,7 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
 
 	/*
 	 * KVM does not support SMM transfer monitors, and consequently does not
-	 * support the "entry to SMM" control either.  role.base.smm is always 0.
+	 * support the "entry to SMM" control either.  role.base.as_id is always 0.
 	 */
 	WARN_ON_ONCE(is_smm(vcpu));
 	role.base.level = level;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ac00bfbf32f6..5427f65117b4 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -133,16 +133,6 @@ struct kvm_mmu_page {
 
 extern struct kmem_cache *mmu_page_header_cache;
 
-static inline int kvm_mmu_role_as_id(union kvm_mmu_page_role role)
-{
-	return role.smm ? 1 : 0;
-}
-
-static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
-{
-	return kvm_mmu_role_as_id(sp->role);
-}
-
 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 {
 	/*
diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index 39b48e7d7d1a..4a7d58bf81c4 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -52,7 +52,7 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
 	iter->root_level = root_level;
 	iter->min_level = min_level;
 	iter->pt_path[iter->root_level - 1] = (tdp_ptep_t)root->spt;
-	iter->as_id = kvm_mmu_page_as_id(root);
+	iter->as_id = root->role.as_id;
 
 	tdp_iter_restart(iter);
 }
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 764f7c87286f..7ccac1aa8df6 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -237,7 +237,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 	     _root;								\
 	     _root = tdp_mmu_next_root(_kvm, _root, _shared, _only_valid))	\
 		if (kvm_lockdep_assert_mmu_lock_held(_kvm, _shared) &&		\
-		    kvm_mmu_page_as_id(_root) != _as_id) {			\
+		    _root->role.as_id != _as_id) {			\
 		} else
 
 #define for_each_valid_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _shared)	\
@@ -256,7 +256,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 #define for_each_tdp_mmu_root(_kvm, _root, _as_id)			\
 	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)	\
 		if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) &&	\
-		    kvm_mmu_page_as_id(_root) != _as_id) {		\
+		    _root->role.as_id != _as_id) {		\
 		} else
 
 static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
@@ -310,7 +310,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 	 * Check for an existing root before allocating a new one.  Note, the
 	 * role check prevents consuming an invalid root.
 	 */
-	for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) {
+	for_each_tdp_mmu_root(kvm, root, role.as_id) {
 		if (root->role.word == role.word &&
 		    kvm_tdp_mmu_get_root(root))
 			goto out;
@@ -496,8 +496,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte,
 							  REMOVED_SPTE, level);
 		}
-		handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
-				    old_spte, REMOVED_SPTE, level, shared);
+		handle_changed_spte(kvm, sp->role.as_id, gfn, old_spte,
+				    REMOVED_SPTE, level, shared);
 	}
 
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
@@ -923,7 +923,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 	if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte)))
 		return false;
 
-	__tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
+	__tdp_mmu_set_spte(kvm, sp->role.as_id, sp->ptep, old_spte, 0,
 			   sp->gfn, sp->role.level + 1, true, true);
 
 	return true;
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 02/37] KVM: MMU: Move struct kvm_mmu_page_role into common code
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move struct kvm_mmu_page_role into common code, and move all
x86-specific fields into a separate sub-struct within the role,
kvm_mmu_page_role_arch.
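
As an illustration only (not part of this patch), a minimal sketch of the
resulting access pattern: the field names follow the hunks below, while the
two helper functions are hypothetical.

  #include <kvm/mmu_types.h>

  /* Architecture-neutral bits stay at the top level of the role. */
  static inline int example_role_as_id(union kvm_mmu_page_role role)
  {
          return role.as_id;
  }

  /* x86-specific bits are now reached through the arch sub-struct. */
  static inline bool example_role_is_direct(union kvm_mmu_page_role role)
  {
          return role.arch.direct;
  }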

Signed-off-by: David Matlack <dmatlack@google.com>
---
 MAINTAINERS                          |   4 +-
 arch/x86/include/asm/kvm/mmu_types.h |  56 ++++++++++
 arch/x86/include/asm/kvm_host.h      |  68 +-----------
 arch/x86/kvm/mmu/mmu.c               | 156 +++++++++++++--------------
 arch/x86/kvm/mmu/mmu_internal.h      |   4 +-
 arch/x86/kvm/mmu/mmutrace.h          |  12 +--
 arch/x86/kvm/mmu/paging_tmpl.h       |  20 ++--
 arch/x86/kvm/mmu/spte.c              |   4 +-
 arch/x86/kvm/mmu/spte.h              |   2 +-
 arch/x86/kvm/x86.c                   |   8 +-
 include/kvm/mmu_types.h              |  37 +++++++
 11 files changed, 202 insertions(+), 169 deletions(-)
 create mode 100644 arch/x86/include/asm/kvm/mmu_types.h
 create mode 100644 include/kvm/mmu_types.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 89672a59c0c3..7e586d7ba78c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11198,7 +11198,8 @@ W:	http://www.linux-kvm.org
 T:	git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
 F:	Documentation/virt/kvm/
 F:	include/asm-generic/kvm*
-F:	include/kvm/iodev.h
+F:	include/kvm/
+X:	include/kvm/arm_*
 F:	include/linux/kvm*
 F:	include/trace/events/kvm.h
 F:	include/uapi/asm-generic/kvm*
@@ -11285,6 +11286,7 @@ L:	kvm@vger.kernel.org
 S:	Supported
 T:	git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
 F:	arch/x86/include/asm/kvm*
+F:	arch/x86/include/asm/kvm/
 F:	arch/x86/include/asm/svm.h
 F:	arch/x86/include/asm/vmx*.h
 F:	arch/x86/include/uapi/asm/kvm*
diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
new file mode 100644
index 000000000000..35f893ebab5a
--- /dev/null
+++ b/arch/x86/include/asm/kvm/mmu_types.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KVM_MMU_TYPES_H
+#define __ASM_KVM_MMU_TYPES_H
+
+#include <linux/types.h>
+
+/*
+ * This is a subset of the overall kvm_cpu_role to minimize the size of
+ * kvm_memory_slot.arch.gfn_track, i.e. allows allocating 2 bytes per gfn
+ * instead of 4 bytes per gfn.
+ *
+ * Upper-level shadow pages having gptes are tracked for write-protection via
+ * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
+ * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
+ * gfn_track will overflow and explosions will ensure.
+ *
+ * A unique shadow page (SP) for a gfn is created if and only if an existing SP
+ * cannot be reused.  The ability to reuse a SP is tracked by its role, which
+ * incorporates various mode bits and properties of the SP.  Roughly speaking,
+ * the number of unique SPs that can theoretically be created is 2^n, where n
+ * is the number of bits that are used to compute the role.
+ *
+ * Note, not all combinations of modes and flags below are possible:
+ *
+ *   - invalid shadow pages are not accounted, so the bits are effectively 18
+ *
+ *   - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
+ *     execonly and ad_disabled are only used for nested EPT which has
+ *     has_4_byte_gpte=0.  Therefore, 2 bits are always unused.
+ *
+ *   - the 4 bits of level are effectively limited to the values 2/3/4/5,
+ *     as 4k SPs are not tracked (allowed to go unsync).  In addition non-PAE
+ *     paging has exactly one upper level, making level completely redundant
+ *     when has_4_byte_gpte=1.
+ *
+ *   - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
+ *     cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
+ *
+ * Therefore, the maximum number of possible upper-level shadow pages for a
+ * single gfn is a bit less than 2^13.
+ */
+struct kvm_mmu_page_role_arch {
+	u16 has_4_byte_gpte:1;
+	u16 quadrant:2;
+	u16 direct:1;
+	u16 access:3;
+	u16 efer_nx:1;
+	u16 cr0_wp:1;
+	u16 smep_andnot_wp:1;
+	u16 smap_andnot_wp:1;
+	u16 ad_disabled:1;
+	u16 guest_mode:1;
+	u16 passthrough:1;
+};
+
+#endif /* !__ASM_KVM_MMU_TYPES_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0a819d40131a..ebcd7a0dabef 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -37,6 +37,8 @@
 #include <asm/kvm_vcpu_regs.h>
 #include <asm/hyperv-tlfs.h>
 
+#include <kvm/mmu_types.h>
+
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
 #define KVM_MAX_VCPUS 1024
@@ -286,72 +288,6 @@ enum x86_intercept_stage;
 
 struct kvm_kernel_irq_routing_entry;
 
-/*
- * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
- * also includes TDP pages) to determine whether or not a page can be used in
- * the given MMU context.  This is a subset of the overall kvm_cpu_role to
- * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
- * 2 bytes per gfn instead of 4 bytes per gfn.
- *
- * Upper-level shadow pages having gptes are tracked for write-protection via
- * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
- * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
- * gfn_track will overflow and explosions will ensure.
- *
- * A unique shadow page (SP) for a gfn is created if and only if an existing SP
- * cannot be reused.  The ability to reuse a SP is tracked by its role, which
- * incorporates various mode bits and properties of the SP.  Roughly speaking,
- * the number of unique SPs that can theoretically be created is 2^n, where n
- * is the number of bits that are used to compute the role.
- *
- * But, even though there are 19 bits in the mask below, not all combinations
- * of modes and flags are possible:
- *
- *   - invalid shadow pages are not accounted, so the bits are effectively 18
- *
- *   - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
- *     execonly and ad_disabled are only used for nested EPT which has
- *     has_4_byte_gpte=0.  Therefore, 2 bits are always unused.
- *
- *   - the 4 bits of level are effectively limited to the values 2/3/4/5,
- *     as 4k SPs are not tracked (allowed to go unsync).  In addition non-PAE
- *     paging has exactly one upper level, making level completely redundant
- *     when has_4_byte_gpte=1.
- *
- *   - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
- *     cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
- *
- * Therefore, the maximum number of possible upper-level shadow pages for a
- * single gfn is a bit less than 2^13.
- */
-union kvm_mmu_page_role {
-	u32 word;
-	struct {
-		unsigned level:4;
-		unsigned has_4_byte_gpte:1;
-		unsigned quadrant:2;
-		unsigned direct:1;
-		unsigned access:3;
-		unsigned invalid:1;
-		unsigned efer_nx:1;
-		unsigned cr0_wp:1;
-		unsigned smep_andnot_wp:1;
-		unsigned smap_andnot_wp:1;
-		unsigned ad_disabled:1;
-		unsigned guest_mode:1;
-		unsigned passthrough:1;
-		unsigned :5;
-
-		/*
-		 * This is left at the top of the word so that
-		 * kvm_memslots_for_spte_role can extract it with a
-		 * simple shift.  While there is room, give it a whole
-		 * byte so it is also faster to load it from memory.
-		 */
-		unsigned as_id:8;
-	};
-};
-
 /*
  * kvm_mmu_extended_role complements kvm_mmu_page_role, tracking properties
  * relevant to the current MMU configuration.   When loading CR0, CR4, or EFER,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f375b719f565..355548603960 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -210,13 +210,13 @@ static inline bool __maybe_unused is_##reg##_##name(struct kvm_mmu *mmu)	\
 {								\
 	return !!(mmu->cpu_role. base_or_ext . reg##_##name);	\
 }
-BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
+BUILD_MMU_ROLE_ACCESSOR(base.arch, cr0, wp);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pse);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smep);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smap);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pke);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, la57);
-BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
+BUILD_MMU_ROLE_ACCESSOR(base.arch, efer, nx);
 BUILD_MMU_ROLE_ACCESSOR(ext,  efer, lma);
 
 static inline bool is_cr0_pg(struct kvm_mmu *mmu)
@@ -226,7 +226,7 @@ static inline bool is_cr0_pg(struct kvm_mmu *mmu)
 
 static inline bool is_cr4_pae(struct kvm_mmu *mmu)
 {
-        return !mmu->cpu_role.base.has_4_byte_gpte;
+	return !mmu->cpu_role.base.arch.has_4_byte_gpte;
 }
 
 static struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
@@ -618,7 +618,7 @@ static bool mmu_spte_age(u64 *sptep)
 
 static inline bool is_tdp_mmu_active(struct kvm_vcpu *vcpu)
 {
-	return tdp_mmu_enabled && vcpu->arch.mmu->root_role.direct;
+	return tdp_mmu_enabled && vcpu->arch.mmu->root_role.arch.direct;
 }
 
 static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
@@ -695,10 +695,10 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp);
 
 static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
 {
-	if (sp->role.passthrough)
+	if (sp->role.arch.passthrough)
 		return sp->gfn;
 
-	if (!sp->role.direct)
+	if (!sp->role.arch.direct)
 		return sp->shadowed_translation[index] >> PAGE_SHIFT;
 
 	return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS));
@@ -727,7 +727,7 @@ static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
 	 *
 	 * In both cases, sp->role.access contains the correct access bits.
 	 */
-	return sp->role.access;
+	return sp->role.arch.access;
 }
 
 static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
@@ -739,14 +739,14 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
 	}
 
 	WARN_ONCE(access != kvm_mmu_page_get_access(sp, index),
-	          "access mismatch under %s page %llx (expected %u, got %u)\n",
-	          sp->role.passthrough ? "passthrough" : "direct",
-	          sp->gfn, kvm_mmu_page_get_access(sp, index), access);
+		  "access mismatch under %s page %llx (expected %u, got %u)\n",
+		  sp->role.arch.passthrough ? "passthrough" : "direct",
+		  sp->gfn, kvm_mmu_page_get_access(sp, index), access);
 
 	WARN_ONCE(gfn != kvm_mmu_page_get_gfn(sp, index),
-	          "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
-	          sp->role.passthrough ? "passthrough" : "direct",
-	          sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
+		  "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
+		  sp->role.arch.passthrough ? "passthrough" : "direct",
+		  sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
 }
 
 static void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
@@ -1723,7 +1723,7 @@ static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
 	hlist_del(&sp->hash_link);
 	list_del(&sp->link);
 	free_page((unsigned long)sp->spt);
-	if (!sp->role.direct)
+	if (!sp->role.arch.direct)
 		free_page((unsigned long)sp->shadowed_translation);
 	kmem_cache_free(mmu_page_header_cache, sp);
 }
@@ -1884,10 +1884,10 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 
 static bool sp_has_gptes(struct kvm_mmu_page *sp)
 {
-	if (sp->role.direct)
+	if (sp->role.arch.direct)
 		return false;
 
-	if (sp->role.passthrough)
+	if (sp->role.arch.passthrough)
 		return false;
 
 	return true;
@@ -2065,7 +2065,7 @@ static void clear_sp_write_flooding_count(u64 *spte)
  * The vCPU is required when finding indirect shadow pages; the shadow
  * page may already exist and syncing it needs the vCPU pointer in
  * order to read guest page tables.  Direct shadow pages are never
- * unsync, thus @vcpu can be NULL if @role.direct is true.
+ * unsync, thus @vcpu can be NULL if @role.arch.direct is true.
  */
 static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
 						     struct kvm_vcpu *vcpu,
@@ -2101,7 +2101,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
 		}
 
 		/* unsync and write-flooding only apply to indirect SPs. */
-		if (sp->role.direct)
+		if (sp->role.arch.direct)
 			goto out;
 
 		if (sp->unsync) {
@@ -2162,7 +2162,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
 
 	sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
 	sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
-	if (!role.direct)
+	if (!role.arch.direct)
 		sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
 
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
@@ -2187,7 +2187,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
 	return sp;
 }
 
-/* Note, @vcpu may be NULL if @role.direct is true; see kvm_mmu_find_shadow_page. */
+/* Note, @vcpu may be NULL if @role.arch.direct is true; see kvm_mmu_find_shadow_page. */
 static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
 						      struct kvm_vcpu *vcpu,
 						      struct shadow_page_caches *caches,
@@ -2231,9 +2231,9 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
 
 	role = parent_sp->role;
 	role.level--;
-	role.access = access;
-	role.direct = direct;
-	role.passthrough = 0;
+	role.arch.access = access;
+	role.arch.direct = direct;
+	role.arch.passthrough = 0;
 
 	/*
 	 * If the guest has 4-byte PTEs then that means it's using 32-bit,
@@ -2261,9 +2261,9 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
 	 * covers bit 21 (see above), thus the quadrant is calculated from the
 	 * _least_ significant bit of the PDE index.
 	 */
-	if (role.has_4_byte_gpte) {
+	if (role.arch.has_4_byte_gpte) {
 		WARN_ON_ONCE(role.level != PG_LEVEL_4K);
-		role.quadrant = spte_index(sptep) & 1;
+		role.arch.quadrant = spte_index(sptep) & 1;
 	}
 
 	return role;
@@ -2292,7 +2292,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
 
 	if (iterator->level >= PT64_ROOT_4LEVEL &&
 	    vcpu->arch.mmu->cpu_role.base.level < PT64_ROOT_4LEVEL &&
-	    !vcpu->arch.mmu->root_role.direct)
+	    !vcpu->arch.mmu->root_role.arch.direct)
 		iterator->level = PT32E_ROOT_LEVEL;
 
 	if (iterator->level == PT32E_ROOT_LEVEL) {
@@ -2391,7 +2391,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 * a new sp with the correct access.
 		 */
 		child = spte_to_child_sp(*sptep);
-		if (child->role.access == direct_access)
+		if (child->role.arch.access == direct_access)
 			return;
 
 		drop_parent_pte(child, sptep);
@@ -2420,7 +2420,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 			 * avoids retaining a large number of stale nested SPs.
 			 */
 			if (tdp_enabled && invalid_list &&
-			    child->role.guest_mode && !child->parent_ptes.val)
+			    child->role.arch.guest_mode && !child->parent_ptes.val)
 				return kvm_mmu_prepare_zap_page(kvm, child,
 								invalid_list);
 		}
@@ -2689,7 +2689,7 @@ static int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
 	gpa_t gpa;
 	int r;
 
-	if (vcpu->arch.mmu->root_role.direct)
+	if (vcpu->arch.mmu->root_role.arch.direct)
 		return 0;
 
 	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
@@ -2900,7 +2900,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 {
 	struct page *pages[PTE_PREFETCH_NUM];
 	struct kvm_memory_slot *slot;
-	unsigned int access = sp->role.access;
+	unsigned int access = sp->role.arch.access;
 	int i, ret;
 	gfn_t gfn;
 
@@ -2928,7 +2928,7 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
 	u64 *spte, *start = NULL;
 	int i;
 
-	WARN_ON(!sp->role.direct);
+	WARN_ON(!sp->role.arch.direct);
 
 	i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
 	spte = sp->spt + i;
@@ -3549,7 +3549,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
 	 * This should not be called while L2 is active, L2 can't invalidate
 	 * _only_ its own roots, e.g. INVVPID unconditionally exits.
 	 */
-	WARN_ON_ONCE(mmu->root_role.guest_mode);
+	WARN_ON_ONCE(mmu->root_role.arch.guest_mode);
 
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
 		root_hpa = mmu->prev_roots[i].hpa;
@@ -3557,7 +3557,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
 			continue;
 
 		if (!to_shadow_page(root_hpa) ||
-			to_shadow_page(root_hpa)->role.guest_mode)
+			to_shadow_page(root_hpa)->role.arch.guest_mode)
 			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
 	}
 
@@ -3585,10 +3585,10 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
 	struct kvm_mmu_page *sp;
 
 	role.level = level;
-	role.quadrant = quadrant;
+	role.arch.quadrant = quadrant;
 
-	WARN_ON_ONCE(quadrant && !role.has_4_byte_gpte);
-	WARN_ON_ONCE(role.direct && role.has_4_byte_gpte);
+	WARN_ON_ONCE(quadrant && !role.arch.has_4_byte_gpte);
+	WARN_ON_ONCE(role.arch.direct && role.arch.has_4_byte_gpte);
 
 	sp = kvm_mmu_get_shadow_page(vcpu, gfn, role);
 	++sp->root_count;
@@ -3834,7 +3834,7 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
 	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
 	 * on demand, as running a 32-bit L1 VMM on 64-bit KVM is very rare.
 	 */
-	if (mmu->root_role.direct ||
+	if (mmu->root_role.arch.direct ||
 	    mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL ||
 	    mmu->root_role.level < PT64_ROOT_4LEVEL)
 		return 0;
@@ -3932,7 +3932,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 	int i;
 	struct kvm_mmu_page *sp;
 
-	if (vcpu->arch.mmu->root_role.direct)
+	if (vcpu->arch.mmu->root_role.arch.direct)
 		return;
 
 	if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
@@ -4161,7 +4161,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 
 	arch.token = alloc_apf_token(vcpu);
 	arch.gfn = gfn;
-	arch.direct_map = vcpu->arch.mmu->root_role.direct;
+	arch.direct_map = vcpu->arch.mmu->root_role.arch.direct;
 	arch.cr3 = vcpu->arch.mmu->get_guest_pgd(vcpu);
 
 	return kvm_setup_async_pf(vcpu, cr2_or_gpa,
@@ -4172,7 +4172,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 {
 	int r;
 
-	if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) ||
+	if ((vcpu->arch.mmu->root_role.arch.direct != work->arch.direct_map) ||
 	      work->wakeup_all)
 		return;
 
@@ -4180,7 +4180,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	if (unlikely(r))
 		return;
 
-	if (!vcpu->arch.mmu->root_role.direct &&
+	if (!vcpu->arch.mmu->root_role.arch.direct &&
 	      work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu))
 		return;
 
@@ -4456,7 +4456,7 @@ static void nonpaging_init_context(struct kvm_mmu *context)
 static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
 				  union kvm_mmu_page_role role)
 {
-	return (role.direct || pgd == root->pgd) &&
+	return (role.arch.direct || pgd == root->pgd) &&
 	       VALID_PAGE(root->hpa) &&
 	       role.word == to_shadow_page(root->hpa)->role.word;
 }
@@ -4576,7 +4576,7 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 	 * If this is a direct root page, it doesn't have a write flooding
 	 * count. Otherwise, clear the write flooding count.
 	 */
-	if (!new_role.direct)
+	if (!new_role.arch.direct)
 		__clear_sp_write_flooding_count(
 				to_shadow_page(vcpu->arch.mmu->root.hpa));
 }
@@ -4803,7 +4803,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
 	shadow_zero_check = &context->shadow_zero_check;
 	__reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
 				context->root_role.level,
-				context->root_role.efer_nx,
+				context->root_role.arch.efer_nx,
 				guest_can_use_gbpages(vcpu), is_pse, is_amd);
 
 	if (!shadow_me_mask)
@@ -5055,21 +5055,21 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
 {
 	union kvm_cpu_role role = {0};
 
-	role.base.access = ACC_ALL;
 	role.base.as_id = is_smm(vcpu);
-	role.base.guest_mode = is_guest_mode(vcpu);
+	role.base.arch.access = ACC_ALL;
+	role.base.arch.guest_mode = is_guest_mode(vcpu);
 	role.ext.valid = 1;
 
 	if (!____is_cr0_pg(regs)) {
-		role.base.direct = 1;
+		role.base.arch.direct = 1;
 		return role;
 	}
 
-	role.base.efer_nx = ____is_efer_nx(regs);
-	role.base.cr0_wp = ____is_cr0_wp(regs);
-	role.base.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
-	role.base.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
-	role.base.has_4_byte_gpte = !____is_cr4_pae(regs);
+	role.base.arch.efer_nx = ____is_efer_nx(regs);
+	role.base.arch.cr0_wp = ____is_cr0_wp(regs);
+	role.base.arch.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
+	role.base.arch.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
+	role.base.arch.has_4_byte_gpte = !____is_cr4_pae(regs);
 
 	if (____is_efer_lma(regs))
 		role.base.level = ____is_cr4_la57(regs) ? PT64_ROOT_5LEVEL
@@ -5109,15 +5109,15 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
 {
 	union kvm_mmu_page_role role = {0};
 
-	role.access = ACC_ALL;
-	role.cr0_wp = true;
-	role.efer_nx = true;
 	role.as_id = cpu_role.base.as_id;
-	role.guest_mode = cpu_role.base.guest_mode;
-	role.ad_disabled = !kvm_ad_enabled();
 	role.level = kvm_mmu_get_tdp_level(vcpu);
-	role.direct = true;
-	role.has_4_byte_gpte = false;
+	role.arch.access = ACC_ALL;
+	role.arch.cr0_wp = true;
+	role.arch.efer_nx = true;
+	role.arch.guest_mode = cpu_role.base.arch.guest_mode;
+	role.arch.ad_disabled = !kvm_ad_enabled();
+	role.arch.direct = true;
+	role.arch.has_4_byte_gpte = false;
 
 	return role;
 }
@@ -5194,7 +5194,7 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu,
 	 * NX can be used by any non-nested shadow MMU to avoid having to reset
 	 * MMU contexts.
 	 */
-	root_role.efer_nx = true;
+	root_role.arch.efer_nx = true;
 
 	shadow_mmu_init_context(vcpu, context, cpu_role, root_role);
 }
@@ -5212,13 +5212,13 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 	union kvm_mmu_page_role root_role;
 
 	/* NPT requires CR0.PG=1. */
-	WARN_ON_ONCE(cpu_role.base.direct);
+	WARN_ON_ONCE(cpu_role.base.arch.direct);
 
 	root_role = cpu_role.base;
 	root_role.level = kvm_mmu_get_tdp_level(vcpu);
 	if (root_role.level == PT64_ROOT_5LEVEL &&
 	    cpu_role.base.level == PT64_ROOT_4LEVEL)
-		root_role.passthrough = 1;
+		root_role.arch.passthrough = 1;
 
 	shadow_mmu_init_context(vcpu, context, cpu_role, root_role);
 	kvm_mmu_new_pgd(vcpu, nested_cr3);
@@ -5237,11 +5237,11 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
 	 */
 	WARN_ON_ONCE(is_smm(vcpu));
 	role.base.level = level;
-	role.base.has_4_byte_gpte = false;
-	role.base.direct = false;
-	role.base.ad_disabled = !accessed_dirty;
-	role.base.guest_mode = true;
-	role.base.access = ACC_ALL;
+	role.base.arch.has_4_byte_gpte = false;
+	role.base.arch.direct = false;
+	role.base.arch.ad_disabled = !accessed_dirty;
+	role.base.arch.guest_mode = true;
+	role.base.arch.access = ACC_ALL;
 
 	role.ext.word = 0;
 	role.ext.execonly = execonly;
@@ -5385,13 +5385,13 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 {
 	int r;
 
-	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.direct);
+	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.arch.direct);
 	if (r)
 		goto out;
 	r = mmu_alloc_special_roots(vcpu);
 	if (r)
 		goto out;
-	if (vcpu->arch.mmu->root_role.direct)
+	if (vcpu->arch.mmu->root_role.arch.direct)
 		r = mmu_alloc_direct_roots(vcpu);
 	else
 		r = mmu_alloc_shadow_roots(vcpu);
@@ -5526,7 +5526,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
 		 gpa, bytes, sp->role.word);
 
 	offset = offset_in_page(gpa);
-	pte_size = sp->role.has_4_byte_gpte ? 4 : 8;
+	pte_size = sp->role.arch.has_4_byte_gpte ? 4 : 8;
 
 	/*
 	 * Sometimes, the OS only writes the last one bytes to update status
@@ -5550,7 +5550,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 	page_offset = offset_in_page(gpa);
 	level = sp->role.level;
 	*nspte = 1;
-	if (sp->role.has_4_byte_gpte) {
+	if (sp->role.arch.has_4_byte_gpte) {
 		page_offset <<= 1;	/* 32->64 */
 		/*
 		 * A 32-bit pde maps 4MB while the shadow pdes map
@@ -5564,7 +5564,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 		}
 		quadrant = page_offset >> PAGE_SHIFT;
 		page_offset &= ~PAGE_MASK;
-		if (quadrant != sp->role.quadrant)
+		if (quadrant != sp->role.arch.quadrant)
 			return NULL;
 	}
 
@@ -5628,7 +5628,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 		       void *insn, int insn_len)
 {
 	int r, emulation_type = EMULTYPE_PF;
-	bool direct = vcpu->arch.mmu->root_role.direct;
+	bool direct = vcpu->arch.mmu->root_role.arch.direct;
 
 	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
 		return RET_PF_RETRY;
@@ -5659,7 +5659,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	 * paging in both guests. If true, we simply unprotect the page
 	 * and resume the guest.
 	 */
-	if (vcpu->arch.mmu->root_role.direct &&
+	if (vcpu->arch.mmu->root_role.arch.direct &&
 	    (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
 		kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa));
 		return 1;
@@ -6321,7 +6321,7 @@ static void shadow_mmu_split_huge_page(struct kvm *kvm,
 
 		spte = make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
 		mmu_spte_set(sptep, spte);
-		__rmap_add(kvm, cache, slot, sptep, gfn, sp->role.access);
+		__rmap_add(kvm, cache, slot, sptep, gfn, sp->role.arch.access);
 	}
 
 	__link_shadow_page(kvm, cache, huge_sptep, sp, flush);
@@ -6380,7 +6380,7 @@ static bool shadow_mmu_try_split_huge_pages(struct kvm *kvm,
 		sp = sptep_to_sp(huge_sptep);
 
 		/* TDP MMU is enabled, so rmap only contains nested MMU SPs. */
-		if (WARN_ON_ONCE(!sp->role.guest_mode))
+		if (WARN_ON_ONCE(!sp->role.arch.guest_mode))
 			continue;
 
 		/* The rmaps should never contain non-leaf SPTEs. */
@@ -6502,7 +6502,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 		 * the guest, and the guest page table is using 4K page size
 		 * mapping if the indirect sp has level = 1.
 		 */
-		if (sp->role.direct &&
+		if (sp->role.arch.direct &&
 		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
 							       PG_LEVEL_NUM)) {
 			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
@@ -6942,7 +6942,7 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 				      struct kvm_mmu_page,
 				      possible_nx_huge_page_link);
 		WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
-		WARN_ON_ONCE(!sp->role.direct);
+		WARN_ON_ONCE(!sp->role.arch.direct);
 
 		/*
 		 * Unaccount and do not attempt to recover any NX Huge Pages
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 5427f65117b4..c19a80fdeb8d 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -143,7 +143,7 @@ static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 	 * being enabled is mandatory as the bits used to denote WP-only SPTEs
 	 * are reserved for PAE paging (32-bit KVM).
 	 */
-	return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
+	return kvm_x86_ops.cpu_dirty_log_size && sp->role.arch.guest_mode;
 }
 
 int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
@@ -270,7 +270,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	};
 	int r;
 
-	if (vcpu->arch.mmu->root_role.direct) {
+	if (vcpu->arch.mmu->root_role.arch.direct) {
 		fault.gfn = fault.addr >> PAGE_SHIFT;
 		fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
 	}
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index ae86820cef69..6a4a43b90780 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -35,13 +35,13 @@
 			 " %snxe %sad root %u %s%c",			\
 			 __entry->mmu_valid_gen,			\
 			 __entry->gfn, role.level,			\
-			 role.has_4_byte_gpte ? 4 : 8,			\
-			 role.quadrant,					\
-			 role.direct ? " direct" : "",			\
-			 access_str[role.access],			\
+			 role.arch.has_4_byte_gpte ? 4 : 8,			\
+			 role.arch.quadrant,					\
+			 role.arch.direct ? " direct" : "",			\
+			 access_str[role.arch.access],			\
 			 role.invalid ? " invalid" : "",		\
-			 role.efer_nx ? "" : "!",			\
-			 role.ad_disabled ? "!" : "",			\
+			 role.arch.efer_nx ? "" : "!",			\
+			 role.arch.ad_disabled ? "!" : "",			\
 			 __entry->root_count,				\
 			 __entry->unsync ? "unsync" : "sync", 0);	\
 	saved_ptr;							\
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index e5662dbd519c..e15ec1c473da 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -55,7 +55,7 @@
 	#define PT_LEVEL_BITS 9
 	#define PT_GUEST_DIRTY_SHIFT 9
 	#define PT_GUEST_ACCESSED_SHIFT 8
-	#define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.ad_disabled)
+	#define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.arch.ad_disabled)
 	#define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL
 #else
 	#error Invalid PTTYPE value
@@ -532,7 +532,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
 
 	gfn = gpte_to_gfn(gpte);
-	pte_access = sp->role.access & FNAME(gpte_access)(gpte);
+	pte_access = sp->role.arch.access & FNAME(gpte_access)(gpte);
 	FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
 
 	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn,
@@ -592,7 +592,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 	if (unlikely(vcpu->kvm->mmu_invalidate_in_progress))
 		return;
 
-	if (sp->role.direct)
+	if (sp->role.arch.direct)
 		return __direct_pte_prefetch(vcpu, sp, sptep);
 
 	i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
@@ -884,7 +884,7 @@ static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
 	WARN_ON(sp->role.level != PG_LEVEL_4K);
 
 	if (PTTYPE == 32)
-		offset = sp->role.quadrant << SPTE_LEVEL_BITS;
+		offset = sp->role.arch.quadrant << SPTE_LEVEL_BITS;
 
 	return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
 }
@@ -1003,9 +1003,11 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 	 */
 	const union kvm_mmu_page_role sync_role_ign = {
 		.level = 0xf,
-		.access = 0x7,
-		.quadrant = 0x3,
-		.passthrough = 0x1,
+		.arch = {
+			.access = 0x7,
+			.quadrant = 0x3,
+			.passthrough = 0x1,
+		},
 	};
 
 	/*
@@ -1014,7 +1016,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 	 * differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
 	 * reserved bits checks will be wrong, etc...
 	 */
-	if (WARN_ON_ONCE(sp->role.direct ||
+	if (WARN_ON_ONCE(sp->role.arch.direct ||
 			 (sp->role.word ^ root_role.word) & ~sync_role_ign.word))
 		return -1;
 
@@ -1043,7 +1045,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 		}
 
 		gfn = gpte_to_gfn(gpte);
-		pte_access = sp->role.access;
+		pte_access = sp->role.arch.access;
 		pte_access &= FNAME(gpte_access)(gpte);
 		FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
 
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index c0fd7e049b4e..fe4b626cb431 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -146,7 +146,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 
 	WARN_ON_ONCE(!pte_access && !shadow_present_mask);
 
-	if (sp->role.ad_disabled)
+	if (sp->role.arch.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED_MASK;
 	else if (kvm_mmu_page_ad_need_write_protect(sp))
 		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
@@ -301,7 +301,7 @@ u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte, union kvm_mmu_page
 		 * the page executable as the NX hugepage mitigation no longer
 		 * applies.
 		 */
-		if ((role.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
+		if ((role.arch.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
 			child_spte = make_spte_executable(child_spte);
 	}
 
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 1f03701b943a..ad84c549fe96 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -260,7 +260,7 @@ static inline bool kvm_ad_enabled(void)
 
 static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
 {
-	return sp->role.ad_disabled;
+	return sp->role.arch.ad_disabled;
 }
 
 static inline bool spte_ad_enabled(u64 spte)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b2da8c8f30a..2bfe060768fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8442,7 +8442,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	    WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF)))
 		return false;
 
-	if (!vcpu->arch.mmu->root_role.direct) {
+	if (!vcpu->arch.mmu->root_role.arch.direct) {
 		/*
 		 * Write permission should be allowed since only
 		 * write access need to be emulated.
@@ -8475,7 +8475,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	kvm_release_pfn_clean(pfn);
 
 	/* The instructions are well-emulated on direct mmu. */
-	if (vcpu->arch.mmu->root_role.direct) {
+	if (vcpu->arch.mmu->root_role.arch.direct) {
 		unsigned int indirect_shadow_pages;
 
 		write_lock(&vcpu->kvm->mmu_lock);
@@ -8543,7 +8543,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
 	vcpu->arch.last_retry_eip = ctxt->eip;
 	vcpu->arch.last_retry_addr = cr2_or_gpa;
 
-	if (!vcpu->arch.mmu->root_role.direct)
+	if (!vcpu->arch.mmu->root_role.arch.direct)
 		gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);
 
 	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
@@ -8846,7 +8846,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		ctxt->exception.address = cr2_or_gpa;
 
 		/* With shadow page tables, cr2 contains a GVA or nGPA. */
-		if (vcpu->arch.mmu->root_role.direct) {
+		if (vcpu->arch.mmu->root_role.arch.direct) {
 			ctxt->gpa_available = true;
 			ctxt->gpa_val = cr2_or_gpa;
 		}
diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
new file mode 100644
index 000000000000..3f35a924e031
--- /dev/null
+++ b/include/kvm/mmu_types.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_MMU_TYPES_H
+#define __KVM_MMU_TYPES_H
+
+#include <linux/bug.h>
+#include <linux/types.h>
+#include <linux/stddef.h>
+
+#include <asm/kvm/mmu_types.h>
+
+/*
+ * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
+ * also includes TDP pages) to determine whether or not a page can be used in
+ * the given MMU context.
+ */
+union kvm_mmu_page_role {
+	u32 word;
+	struct {
+		struct {
+			/* The address space ID mapped by the page. */
+			u16 as_id:8;
+
+			/* The level of the page in the page table hierarchy. */
+			u16 level:4;
+
+			/* Whether the page is invalid, i.e. pending destruction. */
+			u16 invalid:1;
+		};
+
+		/* Architecture-specific properties. */
+		struct kvm_mmu_page_role_arch arch;
+	};
+};
+
+static_assert(sizeof(union kvm_mmu_page_role) == sizeof_field(union kvm_mmu_page_role, word));
+
+#endif /* !__KVM_MMU_TYPES_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 02/37] KVM: MMU: Move struct kvm_mmu_page_role into common code
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Move struct kvm_mmu_page_role into common code, and move all
x86-specific fields into a separate sub-struct within the role,
kvm_mmu_page_role_arch.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 MAINTAINERS                          |   4 +-
 arch/x86/include/asm/kvm/mmu_types.h |  56 ++++++++++
 arch/x86/include/asm/kvm_host.h      |  68 +-----------
 arch/x86/kvm/mmu/mmu.c               | 156 +++++++++++++--------------
 arch/x86/kvm/mmu/mmu_internal.h      |   4 +-
 arch/x86/kvm/mmu/mmutrace.h          |  12 +--
 arch/x86/kvm/mmu/paging_tmpl.h       |  20 ++--
 arch/x86/kvm/mmu/spte.c              |   4 +-
 arch/x86/kvm/mmu/spte.h              |   2 +-
 arch/x86/kvm/x86.c                   |   8 +-
 include/kvm/mmu_types.h              |  37 +++++++
 11 files changed, 202 insertions(+), 169 deletions(-)
 create mode 100644 arch/x86/include/asm/kvm/mmu_types.h
 create mode 100644 include/kvm/mmu_types.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 89672a59c0c3..7e586d7ba78c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11198,7 +11198,8 @@ W:	http://www.linux-kvm.org
 T:	git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
 F:	Documentation/virt/kvm/
 F:	include/asm-generic/kvm*
-F:	include/kvm/iodev.h
+F:	include/kvm/
+X:	include/kvm/arm_*
 F:	include/linux/kvm*
 F:	include/trace/events/kvm.h
 F:	include/uapi/asm-generic/kvm*
@@ -11285,6 +11286,7 @@ L:	kvm@vger.kernel.org
 S:	Supported
 T:	git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
 F:	arch/x86/include/asm/kvm*
+F:	arch/x86/include/asm/kvm/
 F:	arch/x86/include/asm/svm.h
 F:	arch/x86/include/asm/vmx*.h
 F:	arch/x86/include/uapi/asm/kvm*
diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
new file mode 100644
index 000000000000..35f893ebab5a
--- /dev/null
+++ b/arch/x86/include/asm/kvm/mmu_types.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KVM_MMU_TYPES_H
+#define __ASM_KVM_MMU_TYPES_H
+
+#include <linux/types.h>
+
+/*
+ * This is a subset of the overall kvm_cpu_role to minimize the size of
+ * kvm_memory_slot.arch.gfn_track, i.e. allows allocating 2 bytes per gfn
+ * instead of 4 bytes per gfn.
+ *
+ * Upper-level shadow pages having gptes are tracked for write-protection via
+ * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
+ * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
+ * gfn_track will overflow and explosions will ensure.
+ *
+ * A unique shadow page (SP) for a gfn is created if and only if an existing SP
+ * cannot be reused.  The ability to reuse a SP is tracked by its role, which
+ * incorporates various mode bits and properties of the SP.  Roughly speaking,
+ * the number of unique SPs that can theoretically be created is 2^n, where n
+ * is the number of bits that are used to compute the role.
+ *
+ * Note, not all combinations of modes and flags below are possible:
+ *
+ *   - invalid shadow pages are not accounted, so the bits are effectively 18
+ *
+ *   - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
+ *     execonly and ad_disabled are only used for nested EPT which has
+ *     has_4_byte_gpte=0.  Therefore, 2 bits are always unused.
+ *
+ *   - the 4 bits of level are effectively limited to the values 2/3/4/5,
+ *     as 4k SPs are not tracked (allowed to go unsync).  In addition non-PAE
+ *     paging has exactly one upper level, making level completely redundant
+ *     when has_4_byte_gpte=1.
+ *
+ *   - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
+ *     cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
+ *
+ * Therefore, the maximum number of possible upper-level shadow pages for a
+ * single gfn is a bit less than 2^13.
+ */
+struct kvm_mmu_page_role_arch {
+	u16 has_4_byte_gpte:1;
+	u16 quadrant:2;
+	u16 direct:1;
+	u16 access:3;
+	u16 efer_nx:1;
+	u16 cr0_wp:1;
+	u16 smep_andnot_wp:1;
+	u16 smap_andnot_wp:1;
+	u16 ad_disabled:1;
+	u16 guest_mode:1;
+	u16 passthrough:1;
+};
+
+#endif /* !__ASM_KVM_MMU_TYPES_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0a819d40131a..ebcd7a0dabef 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -37,6 +37,8 @@
 #include <asm/kvm_vcpu_regs.h>
 #include <asm/hyperv-tlfs.h>
 
+#include <kvm/mmu_types.h>
+
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
 #define KVM_MAX_VCPUS 1024
@@ -286,72 +288,6 @@ enum x86_intercept_stage;
 
 struct kvm_kernel_irq_routing_entry;
 
-/*
- * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
- * also includes TDP pages) to determine whether or not a page can be used in
- * the given MMU context.  This is a subset of the overall kvm_cpu_role to
- * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
- * 2 bytes per gfn instead of 4 bytes per gfn.
- *
- * Upper-level shadow pages having gptes are tracked for write-protection via
- * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
- * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
- * gfn_track will overflow and explosions will ensure.
- *
- * A unique shadow page (SP) for a gfn is created if and only if an existing SP
- * cannot be reused.  The ability to reuse a SP is tracked by its role, which
- * incorporates various mode bits and properties of the SP.  Roughly speaking,
- * the number of unique SPs that can theoretically be created is 2^n, where n
- * is the number of bits that are used to compute the role.
- *
- * But, even though there are 19 bits in the mask below, not all combinations
- * of modes and flags are possible:
- *
- *   - invalid shadow pages are not accounted, so the bits are effectively 18
- *
- *   - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
- *     execonly and ad_disabled are only used for nested EPT which has
- *     has_4_byte_gpte=0.  Therefore, 2 bits are always unused.
- *
- *   - the 4 bits of level are effectively limited to the values 2/3/4/5,
- *     as 4k SPs are not tracked (allowed to go unsync).  In addition non-PAE
- *     paging has exactly one upper level, making level completely redundant
- *     when has_4_byte_gpte=1.
- *
- *   - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
- *     cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
- *
- * Therefore, the maximum number of possible upper-level shadow pages for a
- * single gfn is a bit less than 2^13.
- */
-union kvm_mmu_page_role {
-	u32 word;
-	struct {
-		unsigned level:4;
-		unsigned has_4_byte_gpte:1;
-		unsigned quadrant:2;
-		unsigned direct:1;
-		unsigned access:3;
-		unsigned invalid:1;
-		unsigned efer_nx:1;
-		unsigned cr0_wp:1;
-		unsigned smep_andnot_wp:1;
-		unsigned smap_andnot_wp:1;
-		unsigned ad_disabled:1;
-		unsigned guest_mode:1;
-		unsigned passthrough:1;
-		unsigned :5;
-
-		/*
-		 * This is left at the top of the word so that
-		 * kvm_memslots_for_spte_role can extract it with a
-		 * simple shift.  While there is room, give it a whole
-		 * byte so it is also faster to load it from memory.
-		 */
-		unsigned as_id:8;
-	};
-};
-
 /*
  * kvm_mmu_extended_role complements kvm_mmu_page_role, tracking properties
  * relevant to the current MMU configuration.   When loading CR0, CR4, or EFER,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f375b719f565..355548603960 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -210,13 +210,13 @@ static inline bool __maybe_unused is_##reg##_##name(struct kvm_mmu *mmu)	\
 {								\
 	return !!(mmu->cpu_role. base_or_ext . reg##_##name);	\
 }
-BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
+BUILD_MMU_ROLE_ACCESSOR(base.arch, cr0, wp);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pse);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smep);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smap);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pke);
 BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, la57);
-BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
+BUILD_MMU_ROLE_ACCESSOR(base.arch, efer, nx);
 BUILD_MMU_ROLE_ACCESSOR(ext,  efer, lma);
 
 static inline bool is_cr0_pg(struct kvm_mmu *mmu)
@@ -226,7 +226,7 @@ static inline bool is_cr0_pg(struct kvm_mmu *mmu)
 
 static inline bool is_cr4_pae(struct kvm_mmu *mmu)
 {
-        return !mmu->cpu_role.base.has_4_byte_gpte;
+	return !mmu->cpu_role.base.arch.has_4_byte_gpte;
 }
 
 static struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
@@ -618,7 +618,7 @@ static bool mmu_spte_age(u64 *sptep)
 
 static inline bool is_tdp_mmu_active(struct kvm_vcpu *vcpu)
 {
-	return tdp_mmu_enabled && vcpu->arch.mmu->root_role.direct;
+	return tdp_mmu_enabled && vcpu->arch.mmu->root_role.arch.direct;
 }
 
 static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
@@ -695,10 +695,10 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp);
 
 static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
 {
-	if (sp->role.passthrough)
+	if (sp->role.arch.passthrough)
 		return sp->gfn;
 
-	if (!sp->role.direct)
+	if (!sp->role.arch.direct)
 		return sp->shadowed_translation[index] >> PAGE_SHIFT;
 
 	return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS));
@@ -727,7 +727,7 @@ static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
 	 *
 	 * In both cases, sp->role.access contains the correct access bits.
 	 */
-	return sp->role.access;
+	return sp->role.arch.access;
 }
 
 static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
@@ -739,14 +739,14 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
 	}
 
 	WARN_ONCE(access != kvm_mmu_page_get_access(sp, index),
-	          "access mismatch under %s page %llx (expected %u, got %u)\n",
-	          sp->role.passthrough ? "passthrough" : "direct",
-	          sp->gfn, kvm_mmu_page_get_access(sp, index), access);
+		  "access mismatch under %s page %llx (expected %u, got %u)\n",
+		  sp->role.arch.passthrough ? "passthrough" : "direct",
+		  sp->gfn, kvm_mmu_page_get_access(sp, index), access);
 
 	WARN_ONCE(gfn != kvm_mmu_page_get_gfn(sp, index),
-	          "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
-	          sp->role.passthrough ? "passthrough" : "direct",
-	          sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
+		  "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
+		  sp->role.arch.passthrough ? "passthrough" : "direct",
+		  sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
 }
 
 static void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
@@ -1723,7 +1723,7 @@ static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
 	hlist_del(&sp->hash_link);
 	list_del(&sp->link);
 	free_page((unsigned long)sp->spt);
-	if (!sp->role.direct)
+	if (!sp->role.arch.direct)
 		free_page((unsigned long)sp->shadowed_translation);
 	kmem_cache_free(mmu_page_header_cache, sp);
 }
@@ -1884,10 +1884,10 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 
 static bool sp_has_gptes(struct kvm_mmu_page *sp)
 {
-	if (sp->role.direct)
+	if (sp->role.arch.direct)
 		return false;
 
-	if (sp->role.passthrough)
+	if (sp->role.arch.passthrough)
 		return false;
 
 	return true;
@@ -2065,7 +2065,7 @@ static void clear_sp_write_flooding_count(u64 *spte)
  * The vCPU is required when finding indirect shadow pages; the shadow
  * page may already exist and syncing it needs the vCPU pointer in
  * order to read guest page tables.  Direct shadow pages are never
- * unsync, thus @vcpu can be NULL if @role.direct is true.
+ * unsync, thus @vcpu can be NULL if @role.arch.direct is true.
  */
 static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
 						     struct kvm_vcpu *vcpu,
@@ -2101,7 +2101,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
 		}
 
 		/* unsync and write-flooding only apply to indirect SPs. */
-		if (sp->role.direct)
+		if (sp->role.arch.direct)
 			goto out;
 
 		if (sp->unsync) {
@@ -2162,7 +2162,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
 
 	sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
 	sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
-	if (!role.direct)
+	if (!role.arch.direct)
 		sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
 
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
@@ -2187,7 +2187,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
 	return sp;
 }
 
-/* Note, @vcpu may be NULL if @role.direct is true; see kvm_mmu_find_shadow_page. */
+/* Note, @vcpu may be NULL if @role.arch.direct is true; see kvm_mmu_find_shadow_page. */
 static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
 						      struct kvm_vcpu *vcpu,
 						      struct shadow_page_caches *caches,
@@ -2231,9 +2231,9 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
 
 	role = parent_sp->role;
 	role.level--;
-	role.access = access;
-	role.direct = direct;
-	role.passthrough = 0;
+	role.arch.access = access;
+	role.arch.direct = direct;
+	role.arch.passthrough = 0;
 
 	/*
 	 * If the guest has 4-byte PTEs then that means it's using 32-bit,
@@ -2261,9 +2261,9 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
 	 * covers bit 21 (see above), thus the quadrant is calculated from the
 	 * _least_ significant bit of the PDE index.
 	 */
-	if (role.has_4_byte_gpte) {
+	if (role.arch.has_4_byte_gpte) {
 		WARN_ON_ONCE(role.level != PG_LEVEL_4K);
-		role.quadrant = spte_index(sptep) & 1;
+		role.arch.quadrant = spte_index(sptep) & 1;
 	}
 
 	return role;
@@ -2292,7 +2292,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
 
 	if (iterator->level >= PT64_ROOT_4LEVEL &&
 	    vcpu->arch.mmu->cpu_role.base.level < PT64_ROOT_4LEVEL &&
-	    !vcpu->arch.mmu->root_role.direct)
+	    !vcpu->arch.mmu->root_role.arch.direct)
 		iterator->level = PT32E_ROOT_LEVEL;
 
 	if (iterator->level == PT32E_ROOT_LEVEL) {
@@ -2391,7 +2391,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 * a new sp with the correct access.
 		 */
 		child = spte_to_child_sp(*sptep);
-		if (child->role.access == direct_access)
+		if (child->role.arch.access == direct_access)
 			return;
 
 		drop_parent_pte(child, sptep);
@@ -2420,7 +2420,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 			 * avoids retaining a large number of stale nested SPs.
 			 */
 			if (tdp_enabled && invalid_list &&
-			    child->role.guest_mode && !child->parent_ptes.val)
+			    child->role.arch.guest_mode && !child->parent_ptes.val)
 				return kvm_mmu_prepare_zap_page(kvm, child,
 								invalid_list);
 		}
@@ -2689,7 +2689,7 @@ static int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
 	gpa_t gpa;
 	int r;
 
-	if (vcpu->arch.mmu->root_role.direct)
+	if (vcpu->arch.mmu->root_role.arch.direct)
 		return 0;
 
 	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
@@ -2900,7 +2900,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 {
 	struct page *pages[PTE_PREFETCH_NUM];
 	struct kvm_memory_slot *slot;
-	unsigned int access = sp->role.access;
+	unsigned int access = sp->role.arch.access;
 	int i, ret;
 	gfn_t gfn;
 
@@ -2928,7 +2928,7 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
 	u64 *spte, *start = NULL;
 	int i;
 
-	WARN_ON(!sp->role.direct);
+	WARN_ON(!sp->role.arch.direct);
 
 	i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
 	spte = sp->spt + i;
@@ -3549,7 +3549,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
 	 * This should not be called while L2 is active, L2 can't invalidate
 	 * _only_ its own roots, e.g. INVVPID unconditionally exits.
 	 */
-	WARN_ON_ONCE(mmu->root_role.guest_mode);
+	WARN_ON_ONCE(mmu->root_role.arch.guest_mode);
 
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
 		root_hpa = mmu->prev_roots[i].hpa;
@@ -3557,7 +3557,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
 			continue;
 
 		if (!to_shadow_page(root_hpa) ||
-			to_shadow_page(root_hpa)->role.guest_mode)
+			to_shadow_page(root_hpa)->role.arch.guest_mode)
 			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
 	}
 
@@ -3585,10 +3585,10 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
 	struct kvm_mmu_page *sp;
 
 	role.level = level;
-	role.quadrant = quadrant;
+	role.arch.quadrant = quadrant;
 
-	WARN_ON_ONCE(quadrant && !role.has_4_byte_gpte);
-	WARN_ON_ONCE(role.direct && role.has_4_byte_gpte);
+	WARN_ON_ONCE(quadrant && !role.arch.has_4_byte_gpte);
+	WARN_ON_ONCE(role.arch.direct && role.arch.has_4_byte_gpte);
 
 	sp = kvm_mmu_get_shadow_page(vcpu, gfn, role);
 	++sp->root_count;
@@ -3834,7 +3834,7 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
 	 * equivalent level in the guest's NPT to shadow.  Allocate the tables
 	 * on demand, as running a 32-bit L1 VMM on 64-bit KVM is very rare.
 	 */
-	if (mmu->root_role.direct ||
+	if (mmu->root_role.arch.direct ||
 	    mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL ||
 	    mmu->root_role.level < PT64_ROOT_4LEVEL)
 		return 0;
@@ -3932,7 +3932,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 	int i;
 	struct kvm_mmu_page *sp;
 
-	if (vcpu->arch.mmu->root_role.direct)
+	if (vcpu->arch.mmu->root_role.arch.direct)
 		return;
 
 	if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
@@ -4161,7 +4161,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 
 	arch.token = alloc_apf_token(vcpu);
 	arch.gfn = gfn;
-	arch.direct_map = vcpu->arch.mmu->root_role.direct;
+	arch.direct_map = vcpu->arch.mmu->root_role.arch.direct;
 	arch.cr3 = vcpu->arch.mmu->get_guest_pgd(vcpu);
 
 	return kvm_setup_async_pf(vcpu, cr2_or_gpa,
@@ -4172,7 +4172,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 {
 	int r;
 
-	if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) ||
+	if ((vcpu->arch.mmu->root_role.arch.direct != work->arch.direct_map) ||
 	      work->wakeup_all)
 		return;
 
@@ -4180,7 +4180,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	if (unlikely(r))
 		return;
 
-	if (!vcpu->arch.mmu->root_role.direct &&
+	if (!vcpu->arch.mmu->root_role.arch.direct &&
 	      work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu))
 		return;
 
@@ -4456,7 +4456,7 @@ static void nonpaging_init_context(struct kvm_mmu *context)
 static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
 				  union kvm_mmu_page_role role)
 {
-	return (role.direct || pgd == root->pgd) &&
+	return (role.arch.direct || pgd == root->pgd) &&
 	       VALID_PAGE(root->hpa) &&
 	       role.word == to_shadow_page(root->hpa)->role.word;
 }
@@ -4576,7 +4576,7 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 	 * If this is a direct root page, it doesn't have a write flooding
 	 * count. Otherwise, clear the write flooding count.
 	 */
-	if (!new_role.direct)
+	if (!new_role.arch.direct)
 		__clear_sp_write_flooding_count(
 				to_shadow_page(vcpu->arch.mmu->root.hpa));
 }
@@ -4803,7 +4803,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
 	shadow_zero_check = &context->shadow_zero_check;
 	__reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
 				context->root_role.level,
-				context->root_role.efer_nx,
+				context->root_role.arch.efer_nx,
 				guest_can_use_gbpages(vcpu), is_pse, is_amd);
 
 	if (!shadow_me_mask)
@@ -5055,21 +5055,21 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
 {
 	union kvm_cpu_role role = {0};
 
-	role.base.access = ACC_ALL;
 	role.base.as_id = is_smm(vcpu);
-	role.base.guest_mode = is_guest_mode(vcpu);
+	role.base.arch.access = ACC_ALL;
+	role.base.arch.guest_mode = is_guest_mode(vcpu);
 	role.ext.valid = 1;
 
 	if (!____is_cr0_pg(regs)) {
-		role.base.direct = 1;
+		role.base.arch.direct = 1;
 		return role;
 	}
 
-	role.base.efer_nx = ____is_efer_nx(regs);
-	role.base.cr0_wp = ____is_cr0_wp(regs);
-	role.base.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
-	role.base.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
-	role.base.has_4_byte_gpte = !____is_cr4_pae(regs);
+	role.base.arch.efer_nx = ____is_efer_nx(regs);
+	role.base.arch.cr0_wp = ____is_cr0_wp(regs);
+	role.base.arch.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
+	role.base.arch.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
+	role.base.arch.has_4_byte_gpte = !____is_cr4_pae(regs);
 
 	if (____is_efer_lma(regs))
 		role.base.level = ____is_cr4_la57(regs) ? PT64_ROOT_5LEVEL
@@ -5109,15 +5109,15 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
 {
 	union kvm_mmu_page_role role = {0};
 
-	role.access = ACC_ALL;
-	role.cr0_wp = true;
-	role.efer_nx = true;
 	role.as_id = cpu_role.base.as_id;
-	role.guest_mode = cpu_role.base.guest_mode;
-	role.ad_disabled = !kvm_ad_enabled();
 	role.level = kvm_mmu_get_tdp_level(vcpu);
-	role.direct = true;
-	role.has_4_byte_gpte = false;
+	role.arch.access = ACC_ALL;
+	role.arch.cr0_wp = true;
+	role.arch.efer_nx = true;
+	role.arch.guest_mode = cpu_role.base.arch.guest_mode;
+	role.arch.ad_disabled = !kvm_ad_enabled();
+	role.arch.direct = true;
+	role.arch.has_4_byte_gpte = false;
 
 	return role;
 }
@@ -5194,7 +5194,7 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu,
 	 * NX can be used by any non-nested shadow MMU to avoid having to reset
 	 * MMU contexts.
 	 */
-	root_role.efer_nx = true;
+	root_role.arch.efer_nx = true;
 
 	shadow_mmu_init_context(vcpu, context, cpu_role, root_role);
 }
@@ -5212,13 +5212,13 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 	union kvm_mmu_page_role root_role;
 
 	/* NPT requires CR0.PG=1. */
-	WARN_ON_ONCE(cpu_role.base.direct);
+	WARN_ON_ONCE(cpu_role.base.arch.direct);
 
 	root_role = cpu_role.base;
 	root_role.level = kvm_mmu_get_tdp_level(vcpu);
 	if (root_role.level == PT64_ROOT_5LEVEL &&
 	    cpu_role.base.level == PT64_ROOT_4LEVEL)
-		root_role.passthrough = 1;
+		root_role.arch.passthrough = 1;
 
 	shadow_mmu_init_context(vcpu, context, cpu_role, root_role);
 	kvm_mmu_new_pgd(vcpu, nested_cr3);
@@ -5237,11 +5237,11 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
 	 */
 	WARN_ON_ONCE(is_smm(vcpu));
 	role.base.level = level;
-	role.base.has_4_byte_gpte = false;
-	role.base.direct = false;
-	role.base.ad_disabled = !accessed_dirty;
-	role.base.guest_mode = true;
-	role.base.access = ACC_ALL;
+	role.base.arch.has_4_byte_gpte = false;
+	role.base.arch.direct = false;
+	role.base.arch.ad_disabled = !accessed_dirty;
+	role.base.arch.guest_mode = true;
+	role.base.arch.access = ACC_ALL;
 
 	role.ext.word = 0;
 	role.ext.execonly = execonly;
@@ -5385,13 +5385,13 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 {
 	int r;
 
-	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.direct);
+	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.arch.direct);
 	if (r)
 		goto out;
 	r = mmu_alloc_special_roots(vcpu);
 	if (r)
 		goto out;
-	if (vcpu->arch.mmu->root_role.direct)
+	if (vcpu->arch.mmu->root_role.arch.direct)
 		r = mmu_alloc_direct_roots(vcpu);
 	else
 		r = mmu_alloc_shadow_roots(vcpu);
@@ -5526,7 +5526,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
 		 gpa, bytes, sp->role.word);
 
 	offset = offset_in_page(gpa);
-	pte_size = sp->role.has_4_byte_gpte ? 4 : 8;
+	pte_size = sp->role.arch.has_4_byte_gpte ? 4 : 8;
 
 	/*
 	 * Sometimes, the OS only writes the last one bytes to update status
@@ -5550,7 +5550,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 	page_offset = offset_in_page(gpa);
 	level = sp->role.level;
 	*nspte = 1;
-	if (sp->role.has_4_byte_gpte) {
+	if (sp->role.arch.has_4_byte_gpte) {
 		page_offset <<= 1;	/* 32->64 */
 		/*
 		 * A 32-bit pde maps 4MB while the shadow pdes map
@@ -5564,7 +5564,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 		}
 		quadrant = page_offset >> PAGE_SHIFT;
 		page_offset &= ~PAGE_MASK;
-		if (quadrant != sp->role.quadrant)
+		if (quadrant != sp->role.arch.quadrant)
 			return NULL;
 	}
 
@@ -5628,7 +5628,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 		       void *insn, int insn_len)
 {
 	int r, emulation_type = EMULTYPE_PF;
-	bool direct = vcpu->arch.mmu->root_role.direct;
+	bool direct = vcpu->arch.mmu->root_role.arch.direct;
 
 	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
 		return RET_PF_RETRY;
@@ -5659,7 +5659,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	 * paging in both guests. If true, we simply unprotect the page
 	 * and resume the guest.
 	 */
-	if (vcpu->arch.mmu->root_role.direct &&
+	if (vcpu->arch.mmu->root_role.arch.direct &&
 	    (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
 		kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa));
 		return 1;
@@ -6321,7 +6321,7 @@ static void shadow_mmu_split_huge_page(struct kvm *kvm,
 
 		spte = make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
 		mmu_spte_set(sptep, spte);
-		__rmap_add(kvm, cache, slot, sptep, gfn, sp->role.access);
+		__rmap_add(kvm, cache, slot, sptep, gfn, sp->role.arch.access);
 	}
 
 	__link_shadow_page(kvm, cache, huge_sptep, sp, flush);
@@ -6380,7 +6380,7 @@ static bool shadow_mmu_try_split_huge_pages(struct kvm *kvm,
 		sp = sptep_to_sp(huge_sptep);
 
 		/* TDP MMU is enabled, so rmap only contains nested MMU SPs. */
-		if (WARN_ON_ONCE(!sp->role.guest_mode))
+		if (WARN_ON_ONCE(!sp->role.arch.guest_mode))
 			continue;
 
 		/* The rmaps should never contain non-leaf SPTEs. */
@@ -6502,7 +6502,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 		 * the guest, and the guest page table is using 4K page size
 		 * mapping if the indirect sp has level = 1.
 		 */
-		if (sp->role.direct &&
+		if (sp->role.arch.direct &&
 		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
 							       PG_LEVEL_NUM)) {
 			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
@@ -6942,7 +6942,7 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 				      struct kvm_mmu_page,
 				      possible_nx_huge_page_link);
 		WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
-		WARN_ON_ONCE(!sp->role.direct);
+		WARN_ON_ONCE(!sp->role.arch.direct);
 
 		/*
 		 * Unaccount and do not attempt to recover any NX Huge Pages
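
One subtlety worth calling out in the mmu.c changes above: BUILD_MMU_ROLE_ACCESSOR(base.arch, cr0, wp)
passes the dotted path "base.arch" as a single macro argument.  That works because the
argument contains no commas and is pasted verbatim into the member access; only reg and
name are ##-concatenated.  A minimal standalone sketch (hypothetical struct names; this
is not kernel code and not part of the patch) that compiles and demonstrates the expansion:

#include <stdio.h>

/* Toy stand-ins for the kernel types, just to exercise the macro. */
struct arch_bits { unsigned cr0_wp:1; unsigned efer_nx:1; };
struct base_role { struct arch_bits arch; };
struct cpu_role  { struct base_role base; };
struct mmu       { struct cpu_role cpu_role; };

/* Same shape as the kernel macro; base_or_ext may be a dotted path. */
#define BUILD_MMU_ROLE_ACCESSOR(base_or_ext, reg, name)			\
static inline int is_##reg##_##name(struct mmu *mmu)			\
{									\
	return !!(mmu->cpu_role. base_or_ext . reg##_##name);		\
}
BUILD_MMU_ROLE_ACCESSOR(base.arch, cr0, wp)

int main(void)
{
	struct mmu m = { .cpu_role.base.arch.cr0_wp = 1 };

	/* The accessor expanded to mmu->cpu_role.base.arch.cr0_wp. */
	printf("is_cr0_wp() = %d\n", is_cr0_wp(&m));
	return 0;
}
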
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 5427f65117b4..c19a80fdeb8d 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -143,7 +143,7 @@ static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 	 * being enabled is mandatory as the bits used to denote WP-only SPTEs
 	 * are reserved for PAE paging (32-bit KVM).
 	 */
-	return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
+	return kvm_x86_ops.cpu_dirty_log_size && sp->role.arch.guest_mode;
 }
 
 int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
@@ -270,7 +270,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	};
 	int r;
 
-	if (vcpu->arch.mmu->root_role.direct) {
+	if (vcpu->arch.mmu->root_role.arch.direct) {
 		fault.gfn = fault.addr >> PAGE_SHIFT;
 		fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
 	}
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index ae86820cef69..6a4a43b90780 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -35,13 +35,13 @@
 			 " %snxe %sad root %u %s%c",			\
 			 __entry->mmu_valid_gen,			\
 			 __entry->gfn, role.level,			\
-			 role.has_4_byte_gpte ? 4 : 8,			\
-			 role.quadrant,					\
-			 role.direct ? " direct" : "",			\
-			 access_str[role.access],			\
+			 role.arch.has_4_byte_gpte ? 4 : 8,			\
+			 role.arch.quadrant,					\
+			 role.arch.direct ? " direct" : "",			\
+			 access_str[role.arch.access],			\
 			 role.invalid ? " invalid" : "",		\
-			 role.efer_nx ? "" : "!",			\
-			 role.ad_disabled ? "!" : "",			\
+			 role.arch.efer_nx ? "" : "!",			\
+			 role.arch.ad_disabled ? "!" : "",			\
 			 __entry->root_count,				\
 			 __entry->unsync ? "unsync" : "sync", 0);	\
 	saved_ptr;							\
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index e5662dbd519c..e15ec1c473da 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -55,7 +55,7 @@
 	#define PT_LEVEL_BITS 9
 	#define PT_GUEST_DIRTY_SHIFT 9
 	#define PT_GUEST_ACCESSED_SHIFT 8
-	#define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.ad_disabled)
+	#define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.arch.ad_disabled)
 	#define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL
 #else
 	#error Invalid PTTYPE value
@@ -532,7 +532,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
 
 	gfn = gpte_to_gfn(gpte);
-	pte_access = sp->role.access & FNAME(gpte_access)(gpte);
+	pte_access = sp->role.arch.access & FNAME(gpte_access)(gpte);
 	FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
 
 	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn,
@@ -592,7 +592,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 	if (unlikely(vcpu->kvm->mmu_invalidate_in_progress))
 		return;
 
-	if (sp->role.direct)
+	if (sp->role.arch.direct)
 		return __direct_pte_prefetch(vcpu, sp, sptep);
 
 	i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
@@ -884,7 +884,7 @@ static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
 	WARN_ON(sp->role.level != PG_LEVEL_4K);
 
 	if (PTTYPE == 32)
-		offset = sp->role.quadrant << SPTE_LEVEL_BITS;
+		offset = sp->role.arch.quadrant << SPTE_LEVEL_BITS;
 
 	return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
 }
@@ -1003,9 +1003,11 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 	 */
 	const union kvm_mmu_page_role sync_role_ign = {
 		.level = 0xf,
-		.access = 0x7,
-		.quadrant = 0x3,
-		.passthrough = 0x1,
+		.arch = {
+			.access = 0x7,
+			.quadrant = 0x3,
+			.passthrough = 0x1,
+		},
 	};
 
 	/*
@@ -1014,7 +1016,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 	 * differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
 	 * reserved bits checks will be wrong, etc...
 	 */
-	if (WARN_ON_ONCE(sp->role.direct ||
+	if (WARN_ON_ONCE(sp->role.arch.direct ||
 			 (sp->role.word ^ root_role.word) & ~sync_role_ign.word))
 		return -1;
 
@@ -1043,7 +1045,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 		}
 
 		gfn = gpte_to_gfn(gpte);
-		pte_access = sp->role.access;
+		pte_access = sp->role.arch.access;
 		pte_access &= FNAME(gpte_access)(gpte);
 		FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
 
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index c0fd7e049b4e..fe4b626cb431 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -146,7 +146,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 
 	WARN_ON_ONCE(!pte_access && !shadow_present_mask);
 
-	if (sp->role.ad_disabled)
+	if (sp->role.arch.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED_MASK;
 	else if (kvm_mmu_page_ad_need_write_protect(sp))
 		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
@@ -301,7 +301,7 @@ u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte, union kvm_mmu_page
 		 * the page executable as the NX hugepage mitigation no longer
 		 * applies.
 		 */
-		if ((role.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
+		if ((role.arch.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
 			child_spte = make_spte_executable(child_spte);
 	}
 
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 1f03701b943a..ad84c549fe96 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -260,7 +260,7 @@ static inline bool kvm_ad_enabled(void)
 
 static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
 {
-	return sp->role.ad_disabled;
+	return sp->role.arch.ad_disabled;
 }
 
 static inline bool spte_ad_enabled(u64 spte)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b2da8c8f30a..2bfe060768fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8442,7 +8442,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	    WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF)))
 		return false;
 
-	if (!vcpu->arch.mmu->root_role.direct) {
+	if (!vcpu->arch.mmu->root_role.arch.direct) {
 		/*
 		 * Write permission should be allowed since only
 		 * write access need to be emulated.
@@ -8475,7 +8475,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	kvm_release_pfn_clean(pfn);
 
 	/* The instructions are well-emulated on direct mmu. */
-	if (vcpu->arch.mmu->root_role.direct) {
+	if (vcpu->arch.mmu->root_role.arch.direct) {
 		unsigned int indirect_shadow_pages;
 
 		write_lock(&vcpu->kvm->mmu_lock);
@@ -8543,7 +8543,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
 	vcpu->arch.last_retry_eip = ctxt->eip;
 	vcpu->arch.last_retry_addr = cr2_or_gpa;
 
-	if (!vcpu->arch.mmu->root_role.direct)
+	if (!vcpu->arch.mmu->root_role.arch.direct)
 		gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);
 
 	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
@@ -8846,7 +8846,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		ctxt->exception.address = cr2_or_gpa;
 
 		/* With shadow page tables, cr2 contains a GVA or nGPA. */
-		if (vcpu->arch.mmu->root_role.direct) {
+		if (vcpu->arch.mmu->root_role.arch.direct) {
 			ctxt->gpa_available = true;
 			ctxt->gpa_val = cr2_or_gpa;
 		}
diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
new file mode 100644
index 000000000000..3f35a924e031
--- /dev/null
+++ b/include/kvm/mmu_types.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_MMU_TYPES_H
+#define __KVM_MMU_TYPES_H
+
+#include <linux/bug.h>
+#include <linux/types.h>
+#include <linux/stddef.h>
+
+#include <asm/kvm/mmu_types.h>
+
+/*
+ * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
+ * also includes TDP pages) to determine whether or not a page can be used in
+ * the given MMU context.
+ */
+union kvm_mmu_page_role {
+	u32 word;
+	struct {
+		struct {
+			/* The address space ID mapped by the page. */
+			u16 as_id:8;
+
+			/* The level of the page in the page table hierarchy. */
+			u16 level:4;
+
+			/* Whether the page is invalid, i.e. pending destruction. */
+			u16 invalid:1;
+		};
+
+		/* Architecture-specific properties. */
+		struct kvm_mmu_page_role_arch arch;
+	};
+};
+
+static_assert(sizeof(union kvm_mmu_page_role) == sizeof_field(union kvm_mmu_page_role, word));
+
+#endif /* !__KVM_MMU_TYPES_H */
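
For completeness, the new layout is easy to exercise outside the kernel.  Below is a
minimal userspace sketch (field names follow the patch; the test harness itself is
illustrative and not part of the series) that checks the same word-size invariant as
the static_assert above and shows that roles differing only in an arch-specific bit
still compare unequal via .word, which is what e.g. is_root_usable() relies on:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct kvm_mmu_page_role_arch {
	uint16_t has_4_byte_gpte:1;
	uint16_t quadrant:2;
	uint16_t direct:1;
	uint16_t access:3;
	uint16_t efer_nx:1;
	uint16_t cr0_wp:1;
	uint16_t smep_andnot_wp:1;
	uint16_t smap_andnot_wp:1;
	uint16_t ad_disabled:1;
	uint16_t guest_mode:1;
	uint16_t passthrough:1;
};

union kvm_mmu_page_role {
	uint32_t word;
	struct {
		struct {
			uint16_t as_id:8;	/* common: address space ID */
			uint16_t level:4;	/* common: page table level */
			uint16_t invalid:1;	/* common: pending destruction */
		};
		struct kvm_mmu_page_role_arch arch;	/* arch-specific bits */
	};
};

/* Mirrors the kernel's static_assert: the role must stay word-sized. */
static_assert(sizeof(union kvm_mmu_page_role) == sizeof(uint32_t),
	      "role must fit in a 32-bit word");

int main(void)
{
	union kvm_mmu_page_role a = { .word = 0 };
	union kvm_mmu_page_role b = { .word = 0 };

	a.level = 4;
	a.arch.direct = 1;

	b.level = 4;
	b.arch.direct = 0;

	/* Different arch bits => different words => the roles do not match. */
	printf("a.word=%#x b.word=%#x match=%d\n",
	       (unsigned)a.word, (unsigned)b.word, a.word == b.word);
	return 0;
}
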
-- 
2.39.0.rc1.256.g54fd8350bd-goog

 			 __entry->unsync ? "unsync" : "sync", 0);	\
 	saved_ptr;							\
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index e5662dbd519c..e15ec1c473da 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -55,7 +55,7 @@
 	#define PT_LEVEL_BITS 9
 	#define PT_GUEST_DIRTY_SHIFT 9
 	#define PT_GUEST_ACCESSED_SHIFT 8
-	#define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.ad_disabled)
+	#define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.arch.ad_disabled)
 	#define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL
 #else
 	#error Invalid PTTYPE value
@@ -532,7 +532,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
 
 	gfn = gpte_to_gfn(gpte);
-	pte_access = sp->role.access & FNAME(gpte_access)(gpte);
+	pte_access = sp->role.arch.access & FNAME(gpte_access)(gpte);
 	FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
 
 	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn,
@@ -592,7 +592,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 	if (unlikely(vcpu->kvm->mmu_invalidate_in_progress))
 		return;
 
-	if (sp->role.direct)
+	if (sp->role.arch.direct)
 		return __direct_pte_prefetch(vcpu, sp, sptep);
 
 	i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
@@ -884,7 +884,7 @@ static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
 	WARN_ON(sp->role.level != PG_LEVEL_4K);
 
 	if (PTTYPE == 32)
-		offset = sp->role.quadrant << SPTE_LEVEL_BITS;
+		offset = sp->role.arch.quadrant << SPTE_LEVEL_BITS;
 
 	return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
 }
@@ -1003,9 +1003,11 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 	 */
 	const union kvm_mmu_page_role sync_role_ign = {
 		.level = 0xf,
-		.access = 0x7,
-		.quadrant = 0x3,
-		.passthrough = 0x1,
+		.arch = {
+			.access = 0x7,
+			.quadrant = 0x3,
+			.passthrough = 0x1,
+		},
 	};
 
 	/*
@@ -1014,7 +1016,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 	 * differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
 	 * reserved bits checks will be wrong, etc...
 	 */
-	if (WARN_ON_ONCE(sp->role.direct ||
+	if (WARN_ON_ONCE(sp->role.arch.direct ||
 			 (sp->role.word ^ root_role.word) & ~sync_role_ign.word))
 		return -1;
 
@@ -1043,7 +1045,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 		}
 
 		gfn = gpte_to_gfn(gpte);
-		pte_access = sp->role.access;
+		pte_access = sp->role.arch.access;
 		pte_access &= FNAME(gpte_access)(gpte);
 		FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
 
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index c0fd7e049b4e..fe4b626cb431 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -146,7 +146,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 
 	WARN_ON_ONCE(!pte_access && !shadow_present_mask);
 
-	if (sp->role.ad_disabled)
+	if (sp->role.arch.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED_MASK;
 	else if (kvm_mmu_page_ad_need_write_protect(sp))
 		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
@@ -301,7 +301,7 @@ u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte, union kvm_mmu_page
 		 * the page executable as the NX hugepage mitigation no longer
 		 * applies.
 		 */
-		if ((role.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
+		if ((role.arch.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
 			child_spte = make_spte_executable(child_spte);
 	}
 
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 1f03701b943a..ad84c549fe96 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -260,7 +260,7 @@ static inline bool kvm_ad_enabled(void)
 
 static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
 {
-	return sp->role.ad_disabled;
+	return sp->role.arch.ad_disabled;
 }
 
 static inline bool spte_ad_enabled(u64 spte)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b2da8c8f30a..2bfe060768fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8442,7 +8442,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	    WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF)))
 		return false;
 
-	if (!vcpu->arch.mmu->root_role.direct) {
+	if (!vcpu->arch.mmu->root_role.arch.direct) {
 		/*
 		 * Write permission should be allowed since only
 		 * write access need to be emulated.
@@ -8475,7 +8475,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	kvm_release_pfn_clean(pfn);
 
 	/* The instructions are well-emulated on direct mmu. */
-	if (vcpu->arch.mmu->root_role.direct) {
+	if (vcpu->arch.mmu->root_role.arch.direct) {
 		unsigned int indirect_shadow_pages;
 
 		write_lock(&vcpu->kvm->mmu_lock);
@@ -8543,7 +8543,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
 	vcpu->arch.last_retry_eip = ctxt->eip;
 	vcpu->arch.last_retry_addr = cr2_or_gpa;
 
-	if (!vcpu->arch.mmu->root_role.direct)
+	if (!vcpu->arch.mmu->root_role.arch.direct)
 		gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);
 
 	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
@@ -8846,7 +8846,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		ctxt->exception.address = cr2_or_gpa;
 
 		/* With shadow page tables, cr2 contains a GVA or nGPA. */
-		if (vcpu->arch.mmu->root_role.direct) {
+		if (vcpu->arch.mmu->root_role.arch.direct) {
 			ctxt->gpa_available = true;
 			ctxt->gpa_val = cr2_or_gpa;
 		}
diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
new file mode 100644
index 000000000000..3f35a924e031
--- /dev/null
+++ b/include/kvm/mmu_types.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_MMU_TYPES_H
+#define __KVM_MMU_TYPES_H
+
+#include <linux/bug.h>
+#include <linux/types.h>
+#include <linux/stddef.h>
+
+#include <asm/kvm/mmu_types.h>
+
+/*
+ * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
+ * also includes TDP pages) to determine whether or not a page can be used in
+ * the given MMU context.
+ */
+union kvm_mmu_page_role {
+	u32 word;
+	struct {
+		struct {
+			/* The address space ID mapped by the page. */
+			u16 as_id:8;
+
+			/* The level of the page in the page table hierarchy. */
+			u16 level:4;
+
+			/* Whether the page is invalid, i.e. pending destruction. */
+			u16 invalid:1;
+		};
+
+		/* Architecture-specific properties. */
+		struct kvm_mmu_page_role_arch arch;
+	};
+};
+
+static_assert(sizeof(union kvm_mmu_page_role) == sizeof_field(union kvm_mmu_page_role, word));
+
+#endif /* !__KVM_MMU_TYPES_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog
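
[ A quick standalone sketch of the layout above, for readers skimming the
  series: this is not kernel code, and the arch bit-fields below are only
  modeled on the x86 kvm_mmu_page_role_arch added by this patch. Like the
  kernel's u16 bit-fields, it relies on GCC/Clang accepting bit-fields on
  16-bit types. It shows the property the static_assert() depends on: the
  common fields plus a 16-bit arch sub-struct still overlay a single
  32-bit word, so roles can keep being compared through role.word. ]

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the x86 arch sub-struct: a u16 worth of bit-fields. */
struct page_role_arch {
	uint16_t has_4_byte_gpte:1;
	uint16_t quadrant:2;
	uint16_t direct:1;
	uint16_t access:3;
	uint16_t efer_nx:1;
	uint16_t cr0_wp:1;
	uint16_t smep_andnot_wp:1;
	uint16_t smap_andnot_wp:1;
	uint16_t ad_disabled:1;
	uint16_t guest_mode:1;
	uint16_t passthrough:1;
};

/* Common fields plus the arch sub-struct, overlaid on one 32-bit word. */
union page_role {
	uint32_t word;
	struct {
		struct {
			uint16_t as_id:8;
			uint16_t level:4;
			uint16_t invalid:1;
		};
		struct page_role_arch arch;
	};
};

/* Mirrors the static_assert() in include/kvm/mmu_types.h. */
static_assert(sizeof(union page_role) == sizeof(uint32_t),
	      "role no longer fits in a single 32-bit word");

int main(void)
{
	union page_role a = { .word = 0 }, b = { .word = 0 };

	a.level = 4;
	a.arch.direct = 1;
	b.level = 4;
	b.arch.direct = 1;

	/* Identical roles compare equal through the packed word. */
	printf("roles %s\n", a.word == b.word ? "match" : "differ");
	return 0;
}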


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 03/37] KVM: MMU: Move tdp_ptep_t into common code
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move the definition of tdp_ptep_t into kvm/mmu_types.h so it can be used
from common code in future commits.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
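
[ For context, a rough userspace model of why the typedef wants to live in
  a shared header -- this is not kernel code: the names below are made up,
  and a plain volatile read stands in for the kernel's RCU-protected SPTE
  access (the real typedef is additionally annotated __rcu). The point is
  just that a common-code walker and an arch-specific helper both traffic
  in the same PTE-pointer type. ]

#include <stdint.h>
#include <stdio.h>

/* Shared header material: the PTE pointer type. */
typedef uint64_t *tdp_ptep_t;

/* "Common code": read an SPTE through the shared pointer type. */
static uint64_t tdp_read_spte(tdp_ptep_t sptep)
{
	return *(volatile uint64_t *)sptep;
}

/* "Arch code": interpret the SPTE value (trivially, for the sketch). */
static int spte_present(uint64_t spte)
{
	return spte != 0;
}

int main(void)
{
	uint64_t page_table[512] = { 0 };
	tdp_ptep_t sptep = &page_table[5];

	*sptep = (0x1234ULL << 12) | 1;	/* pretend-present PTE */

	printf("present: %d\n", spte_present(tdp_read_spte(sptep)));
	return 0;
}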
 arch/x86/kvm/mmu/mmu_internal.h | 2 --
 include/kvm/mmu_types.h         | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index c19a80fdeb8d..e32379c5b1ad 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -44,8 +44,6 @@ extern bool dbg;
 #define INVALID_PAE_ROOT	0
 #define IS_VALID_PAE_ROOT(x)	(!!(x))
 
-typedef u64 __rcu *tdp_ptep_t;
-
 struct kvm_mmu_page {
 	/*
 	 * Note, "link" through "spt" fit in a single 64 byte cache line on
diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
index 3f35a924e031..14099956fdac 100644
--- a/include/kvm/mmu_types.h
+++ b/include/kvm/mmu_types.h
@@ -34,4 +34,6 @@ union kvm_mmu_page_role {
 
 static_assert(sizeof(union kvm_mmu_page_role) == sizeof_field(union kvm_mmu_page_role, word));
 
+typedef u64 __rcu *tdp_ptep_t;
+
 #endif /* !__KVM_MMU_TYPES_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 04/37] KVM: x86/mmu: Invert sp->tdp_mmu_page to sp->shadow_mmu_page
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Invert the meaning of sp->tdp_mmu_page and rename it accordingly. This
allows the TDP MMU code to not care about this field, which will be used
in a subsequent commit to move the TDP MMU to common code.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
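
[ A minimal sketch of the idea, assuming (as in KVM) that the page
  structure comes back zeroed from its allocator -- not kernel code, the
  names are illustrative: with the flag inverted, only the shadow-MMU
  allocation path ever writes it, and the TDP MMU path never has to
  mention it. The predicate has the same shape as the new
  is_tdp_mmu_page() below. ]

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for struct kvm_mmu_page. */
struct mmu_page {
	bool shadow_mmu_page;	/* zero-initialized, i.e. "TDP MMU" by default */
};

/* Shadow-MMU path: the only place that needs to know about the flag. */
static struct mmu_page *shadow_mmu_alloc_page(void)
{
	struct mmu_page *sp = calloc(1, sizeof(*sp));

	if (sp)
		sp->shadow_mmu_page = true;
	return sp;
}

/* TDP-MMU path: never touches the flag, so it can move to common code. */
static struct mmu_page *tdp_mmu_alloc_page(void)
{
	return calloc(1, sizeof(struct mmu_page));
}

static bool is_tdp_mmu_page(struct mmu_page *sp)
{
	return !sp->shadow_mmu_page;
}

int main(void)
{
	struct mmu_page *tdp = tdp_mmu_alloc_page();
	struct mmu_page *shadow = shadow_mmu_alloc_page();

	if (!tdp || !shadow)
		return 1;

	printf("tdp: %d, shadow: %d\n",
	       is_tdp_mmu_page(tdp), is_tdp_mmu_page(shadow));
	free(tdp);
	free(shadow);
	return 0;
}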
 arch/x86/kvm/mmu/mmu.c          | 1 +
 arch/x86/kvm/mmu/mmu_internal.h | 2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      | 3 ---
 arch/x86/kvm/mmu/tdp_mmu.h      | 5 ++++-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 355548603960..f7668a32721d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2180,6 +2180,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
 
 	sp->gfn = gfn;
 	sp->role = role;
+	sp->shadow_mmu_page = true;
 	hlist_add_head(&sp->hash_link, sp_list);
 	if (sp_has_gptes(sp))
 		account_shadowed(kvm, sp);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index e32379c5b1ad..c1a379fba24d 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -52,7 +52,7 @@ struct kvm_mmu_page {
 	struct list_head link;
 	struct hlist_node hash_link;
 
-	bool tdp_mmu_page;
+	bool shadow_mmu_page;
 	bool unsync;
 	u8 mmu_valid_gen;
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7ccac1aa8df6..fc0b87ceb1ea 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -133,8 +133,6 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 	if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
 		return;
 
-	WARN_ON(!is_tdp_mmu_page(root));
-
 	/*
 	 * The root now has refcount=0.  It is valid, but readers already
 	 * cannot acquire a reference to it because kvm_tdp_mmu_get_root()
@@ -279,7 +277,6 @@ static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
 	sp->role = role;
 	sp->gfn = gfn;
 	sp->ptep = sptep;
-	sp->tdp_mmu_page = true;
 
 	trace_kvm_mmu_get_page(sp, true);
 }
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 0a63b1afabd3..18d3719f14ea 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -71,7 +71,10 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
 					u64 *spte);
 
 #ifdef CONFIG_X86_64
-static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; }
+static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp)
+{
+	return !sp->shadow_mmu_page;
+}
 #else
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
 #endif
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 05/37] KVM: x86/mmu: Unify TDP MMU and Shadow MMU root refcounts
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Use the same field for refcounting roots in the TDP MMU and Shadow MMU.
The atomicity provided by refcount_t is overkill for the Shadow MMU,
which only runs while the MMU write-lock is held, but unifying this
field will make it easier for a future commit to move struct
kvm_mmu_page to common code.

Note, refcount_dec_and_test() returns true if the resulting refcount is
0. Hence the check in mmu_free_root_page() is inverted to check if the
shadow root's refcount is 0.
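
For illustration, a minimal userspace sketch of the inverted check (not
part of this patch; a C11 atomic stands in for refcount_t and the
variable names are hypothetical):

  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdio.h>

  /* Stand-in for refcount_dec_and_test(): true once the count hits 0. */
  static bool refcount_dec_and_test(atomic_int *r)
  {
  	return atomic_fetch_sub(r, 1) == 1;
  }

  int main(void)
  {
  	atomic_int root_refcount = 1;	/* hypothetical sp->root_refcount */
  	bool invalid = true;		/* hypothetical sp->role.invalid */

  	/* Old: if (!--sp->root_count && sp->role.invalid)      */
  	/* New: same outcome, without the explicit negation.     */
  	if (refcount_dec_and_test(&root_refcount) && invalid)
  		printf("prepare to zap the root\n");

  	return 0;
  }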

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 14 +++++++-------
 arch/x86/kvm/mmu/mmu_internal.h |  6 ++----
 arch/x86/kvm/mmu/mmutrace.h     |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      |  8 ++++----
 arch/x86/kvm/mmu/tdp_mmu.h      |  2 +-
 5 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f7668a32721d..11cef930d5ed 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2498,14 +2498,14 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 
 	if (sp->unsync)
 		kvm_unlink_unsync_page(kvm, sp);
-	if (!sp->root_count) {
+	if (!refcount_read(&sp->root_refcount)) {
 		/* Count self */
 		(*nr_zapped)++;
 
 		/*
 		 * Already invalid pages (previously active roots) are not on
 		 * the active page list.  See list_del() in the "else" case of
-		 * !sp->root_count.
+		 * !sp->root_refcount.
 		 */
 		if (sp->role.invalid)
 			list_add(&sp->link, invalid_list);
@@ -2515,7 +2515,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 	} else {
 		/*
 		 * Remove the active root from the active page list, the root
-		 * will be explicitly freed when the root_count hits zero.
+		 * will be explicitly freed when the root_refcount hits zero.
 		 */
 		list_del(&sp->link);
 
@@ -2570,7 +2570,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 	kvm_flush_remote_tlbs(kvm);
 
 	list_for_each_entry_safe(sp, nsp, invalid_list, link) {
-		WARN_ON(!sp->role.invalid || sp->root_count);
+		WARN_ON(!sp->role.invalid || refcount_read(&sp->root_refcount));
 		kvm_mmu_free_shadow_page(sp);
 	}
 }
@@ -2593,7 +2593,7 @@ static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
 		 * Don't zap active root pages, the page itself can't be freed
 		 * and zapping it will just force vCPUs to realloc and reload.
 		 */
-		if (sp->root_count)
+		if (refcount_read(&sp->root_refcount))
 			continue;
 
 		unstable = __kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list,
@@ -3481,7 +3481,7 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 
 	if (is_tdp_mmu_page(sp))
 		kvm_tdp_mmu_put_root(kvm, sp, false);
-	else if (!--sp->root_count && sp->role.invalid)
+	else if (refcount_dec_and_test(&sp->root_refcount) && sp->role.invalid)
 		kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
 
 	*root_hpa = INVALID_PAGE;
@@ -3592,7 +3592,7 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
 	WARN_ON_ONCE(role.arch.direct && role.arch.has_4_byte_gpte);
 
 	sp = kvm_mmu_get_shadow_page(vcpu, gfn, role);
-	++sp->root_count;
+	refcount_inc(&sp->root_refcount);
 
 	return __pa(sp->spt);
 }
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index c1a379fba24d..fd4990c8b0e9 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -87,10 +87,8 @@ struct kvm_mmu_page {
 	u64 *shadowed_translation;
 
 	/* Currently serving as active root */
-	union {
-		int root_count;
-		refcount_t tdp_mmu_root_count;
-	};
+	refcount_t root_refcount;
+
 	unsigned int unsync_children;
 	union {
 		struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 6a4a43b90780..ffd10ce3eae3 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -19,7 +19,7 @@
 	__entry->mmu_valid_gen = sp->mmu_valid_gen;	\
 	__entry->gfn = sp->gfn;				\
 	__entry->role = sp->role.word;			\
-	__entry->root_count = sp->root_count;		\
+	__entry->root_count = refcount_read(&sp->root_refcount);	\
 	__entry->unsync = sp->unsync;
 
 #define KVM_MMU_PAGE_PRINTK() ({				        \
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index fc0b87ceb1ea..34d674080170 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -130,7 +130,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 {
 	kvm_lockdep_assert_mmu_lock_held(kvm, shared);
 
-	if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
+	if (!refcount_dec_and_test(&root->root_refcount))
 		return;
 
 	/*
@@ -158,7 +158,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 	 * zap the root because a root cannot go from invalid to valid.
 	 */
 	if (!kvm_tdp_root_mark_invalid(root)) {
-		refcount_set(&root->tdp_mmu_root_count, 1);
+		refcount_set(&root->root_refcount, 1);
 
 		/*
 		 * Zapping the root in a worker is not just "nice to have";
@@ -316,7 +316,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 	root = tdp_mmu_alloc_sp(vcpu);
 	tdp_mmu_init_sp(root, NULL, 0, role);
 
-	refcount_set(&root->tdp_mmu_root_count, 1);
+	refcount_set(&root->root_refcount, 1);
 
 	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots);
@@ -883,7 +883,7 @@ static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 	 * and lead to use-after-free as zapping a SPTE triggers "writeback" of
 	 * dirty accessed bits to the SPTE's associated struct page.
 	 */
-	WARN_ON_ONCE(!refcount_read(&root->tdp_mmu_root_count));
+	WARN_ON_ONCE(!refcount_read(&root->root_refcount));
 
 	kvm_lockdep_assert_mmu_lock_held(kvm, shared);
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 18d3719f14ea..19d3153051a3 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -14,7 +14,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
 
 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
 {
-	return refcount_inc_not_zero(&root->tdp_mmu_root_count);
+	return refcount_inc_not_zero(&root->root_refcount);
 }
 
 void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 06/37] KVM: MMU: Move struct kvm_mmu_page to common code
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move struct kvm_mmu_page to common code and all x86-specific fields into
kvm_mmu_page_arch.

This commit increases the size of struct kvm_mmu_page by 64 bytes on
x86_64 (184 bytes -> 248 bytes). The size of this struct can be reduced
in future commits by moving TDP MMU root fields into a separate struct
and by dynamically allocating fields only used by the Shadow MMU.

No functional change intended.
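
For illustration, a minimal sketch of the common/arch split used from
here on (not part of this patch; the fields shown are a reduced,
hypothetical subset):

  #include <stdbool.h>
  #include <stdio.h>

  /* Stand-in for the x86-specific portion (kvm_mmu_page_arch). */
  struct kvm_mmu_page_arch {
  	bool shadow_mmu_page;
  	bool unsync;
  };

  /* Stand-in for the common portion: arch state is reached as sp->arch.*. */
  struct kvm_mmu_page {
  	unsigned long gfn;
  	struct kvm_mmu_page_arch arch;
  };

  int main(void)
  {
  	struct kvm_mmu_page sp = {
  		.gfn = 0x1000,
  		.arch = { .unsync = true },
  	};

  	/* Common code uses sp.gfn; x86-only code goes through sp.arch. */
  	printf("gfn=%#lx unsync=%d size=%zu\n",
  	       sp.gfn, (int)sp.arch.unsync, sizeof(sp));
  	return 0;
  }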

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/mmu_types.h |  62 ++++++++++++++
 arch/x86/include/asm/kvm_host.h      |   4 -
 arch/x86/kvm/mmu/mmu.c               | 122 ++++++++++++++-------------
 arch/x86/kvm/mmu/mmu_internal.h      |  83 ------------------
 arch/x86/kvm/mmu/mmutrace.h          |   4 +-
 arch/x86/kvm/mmu/paging_tmpl.h       |  10 +--
 arch/x86/kvm/mmu/tdp_mmu.c           |   8 +-
 arch/x86/kvm/mmu/tdp_mmu.h           |   2 +-
 include/kvm/mmu_types.h              |  32 ++++++-
 9 files changed, 167 insertions(+), 160 deletions(-)

diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
index 35f893ebab5a..affcb520b482 100644
--- a/arch/x86/include/asm/kvm/mmu_types.h
+++ b/arch/x86/include/asm/kvm/mmu_types.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_KVM_MMU_TYPES_H
 #define __ASM_KVM_MMU_TYPES_H
 
+#include <linux/bitmap.h>
+#include <linux/list.h>
 #include <linux/types.h>
 
 /*
@@ -53,4 +55,64 @@ struct kvm_mmu_page_role_arch {
 	u16 passthrough:1;
 };
 
+struct kvm_rmap_head {
+	unsigned long val;
+};
+
+struct kvm_mmu_page_arch {
+	struct hlist_node hash_link;
+
+	bool shadow_mmu_page;
+	bool unsync;
+	u8 mmu_valid_gen;
+
+	 /*
+	  * The shadow page can't be replaced by an equivalent huge page
+	  * because it is being used to map an executable page in the guest
+	  * and the NX huge page mitigation is enabled.
+	  */
+	bool nx_huge_page_disallowed;
+
+	/*
+	 * Stores the result of the guest translation being shadowed by each
+	 * SPTE.  KVM shadows two types of guest translations: nGPA -> GPA
+	 * (shadow EPT/NPT) and GVA -> GPA (traditional shadow paging). In both
+	 * cases the result of the translation is a GPA and a set of access
+	 * constraints.
+	 *
+	 * The GFN is stored in the upper bits (PAGE_SHIFT) and the shadowed
+	 * access permissions are stored in the lower bits. Note, for
+	 * convenience and uniformity across guests, the access permissions are
+	 * stored in KVM format (e.g.  ACC_EXEC_MASK) not the raw guest format.
+	 */
+	u64 *shadowed_translation;
+
+	unsigned int unsync_children;
+
+	/* Rmap pointers to all parent sptes. */
+	struct kvm_rmap_head parent_ptes;
+
+	DECLARE_BITMAP(unsync_child_bitmap, 512);
+
+	/*
+	 * Tracks shadow pages that, if zapped, would allow KVM to create an NX
+	 * huge page.  A shadow page will have nx_huge_page_disallowed set but
+	 * not be on the list if a huge page is disallowed for other reasons,
+	 * e.g. because KVM is shadowing a PTE at the same gfn, the memslot
+	 * isn't properly aligned, etc...
+	 */
+	struct list_head possible_nx_huge_page_link;
+
+#ifdef CONFIG_X86_32
+	/*
+	 * Used out of the mmu-lock to avoid reading spte values while an
+	 * update is in progress; see the comments in __get_spte_lockless().
+	 */
+	int clear_spte_count;
+#endif
+
+	/* Number of writes since the last time traversal visited this page.  */
+	atomic_t write_flooding_count;
+};
+
 #endif /* !__ASM_KVM_MMU_TYPES_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ebcd7a0dabef..f5743a652e10 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -329,10 +329,6 @@ union kvm_cpu_role {
 	};
 };
 
-struct kvm_rmap_head {
-	unsigned long val;
-};
-
 struct kvm_pio_request {
 	unsigned long linear_rip;
 	unsigned long count;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 11cef930d5ed..e47f35878ab5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -350,7 +350,7 @@ static void count_spte_clear(u64 *sptep, u64 spte)
 
 	/* Ensure the spte is completely set before we increase the count */
 	smp_wmb();
-	sp->clear_spte_count++;
+	sp->arch.clear_spte_count++;
 }
 
 static void __set_spte(u64 *sptep, u64 spte)
@@ -432,7 +432,7 @@ static u64 __get_spte_lockless(u64 *sptep)
 	int count;
 
 retry:
-	count = sp->clear_spte_count;
+	count = sp->arch.clear_spte_count;
 	smp_rmb();
 
 	spte.spte_low = orig->spte_low;
@@ -442,7 +442,7 @@ static u64 __get_spte_lockless(u64 *sptep)
 	smp_rmb();
 
 	if (unlikely(spte.spte_low != orig->spte_low ||
-	      count != sp->clear_spte_count))
+	      count != sp->arch.clear_spte_count))
 		goto retry;
 
 	return spte.spte;
@@ -699,7 +699,7 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
 		return sp->gfn;
 
 	if (!sp->role.arch.direct)
-		return sp->shadowed_translation[index] >> PAGE_SHIFT;
+		return sp->arch.shadowed_translation[index] >> PAGE_SHIFT;
 
 	return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS));
 }
@@ -713,7 +713,7 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
 static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
 {
 	if (sp_has_gptes(sp))
-		return sp->shadowed_translation[index] & ACC_ALL;
+		return sp->arch.shadowed_translation[index] & ACC_ALL;
 
 	/*
 	 * For direct MMUs (e.g. TDP or non-paging guests) or passthrough SPs,
@@ -734,7 +734,7 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
 					 gfn_t gfn, unsigned int access)
 {
 	if (sp_has_gptes(sp)) {
-		sp->shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
+		sp->arch.shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
 		return;
 	}
 
@@ -825,18 +825,18 @@ void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	 * on the list if KVM is reusing an existing shadow page, i.e. if KVM
 	 * links a shadow page at multiple points.
 	 */
-	if (!list_empty(&sp->possible_nx_huge_page_link))
+	if (!list_empty(&sp->arch.possible_nx_huge_page_link))
 		return;
 
 	++kvm->stat.nx_lpage_splits;
-	list_add_tail(&sp->possible_nx_huge_page_link,
+	list_add_tail(&sp->arch.possible_nx_huge_page_link,
 		      &kvm->arch.possible_nx_huge_pages);
 }
 
 static void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 				 bool nx_huge_page_possible)
 {
-	sp->nx_huge_page_disallowed = true;
+	sp->arch.nx_huge_page_disallowed = true;
 
 	if (nx_huge_page_possible)
 		track_possible_nx_huge_page(kvm, sp);
@@ -861,16 +861,16 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	if (list_empty(&sp->possible_nx_huge_page_link))
+	if (list_empty(&sp->arch.possible_nx_huge_page_link))
 		return;
 
 	--kvm->stat.nx_lpage_splits;
-	list_del_init(&sp->possible_nx_huge_page_link);
+	list_del_init(&sp->arch.possible_nx_huge_page_link);
 }
 
 static void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	sp->nx_huge_page_disallowed = false;
+	sp->arch.nx_huge_page_disallowed = false;
 
 	untrack_possible_nx_huge_page(kvm, sp);
 }
@@ -1720,11 +1720,11 @@ static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
 {
 	MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
-	hlist_del(&sp->hash_link);
+	hlist_del(&sp->arch.hash_link);
 	list_del(&sp->link);
 	free_page((unsigned long)sp->spt);
 	if (!sp->role.arch.direct)
-		free_page((unsigned long)sp->shadowed_translation);
+		free_page((unsigned long)sp->arch.shadowed_translation);
 	kmem_cache_free(mmu_page_header_cache, sp);
 }
 
@@ -1739,13 +1739,13 @@ static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache,
 	if (!parent_pte)
 		return;
 
-	pte_list_add(cache, parent_pte, &sp->parent_ptes);
+	pte_list_add(cache, parent_pte, &sp->arch.parent_ptes);
 }
 
 static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
 				       u64 *parent_pte)
 {
-	pte_list_remove(parent_pte, &sp->parent_ptes);
+	pte_list_remove(parent_pte, &sp->arch.parent_ptes);
 }
 
 static void drop_parent_pte(struct kvm_mmu_page *sp,
@@ -1761,7 +1761,7 @@ static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 	u64 *sptep;
 	struct rmap_iterator iter;
 
-	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+	for_each_rmap_spte(&sp->arch.parent_ptes, &iter, sptep) {
 		mark_unsync(sptep);
 	}
 }
@@ -1771,9 +1771,9 @@ static void mark_unsync(u64 *spte)
 	struct kvm_mmu_page *sp;
 
 	sp = sptep_to_sp(spte);
-	if (__test_and_set_bit(spte_index(spte), sp->unsync_child_bitmap))
+	if (__test_and_set_bit(spte_index(spte), sp->arch.unsync_child_bitmap))
 		return;
-	if (sp->unsync_children++)
+	if (sp->arch.unsync_children++)
 		return;
 	kvm_mmu_mark_parents_unsync(sp);
 }
@@ -1799,7 +1799,7 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
 {
 	int i;
 
-	if (sp->unsync)
+	if (sp->arch.unsync)
 		for (i=0; i < pvec->nr; i++)
 			if (pvec->page[i].sp == sp)
 				return 0;
@@ -1812,9 +1812,9 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
 
 static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
 {
-	--sp->unsync_children;
-	WARN_ON((int)sp->unsync_children < 0);
-	__clear_bit(idx, sp->unsync_child_bitmap);
+	--sp->arch.unsync_children;
+	WARN_ON((int)sp->arch.unsync_children < 0);
+	__clear_bit(idx, sp->arch.unsync_child_bitmap);
 }
 
 static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
@@ -1822,7 +1822,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 {
 	int i, ret, nr_unsync_leaf = 0;
 
-	for_each_set_bit(i, sp->unsync_child_bitmap, 512) {
+	for_each_set_bit(i, sp->arch.unsync_child_bitmap, 512) {
 		struct kvm_mmu_page *child;
 		u64 ent = sp->spt[i];
 
@@ -1833,7 +1833,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 
 		child = spte_to_child_sp(ent);
 
-		if (child->unsync_children) {
+		if (child->arch.unsync_children) {
 			if (mmu_pages_add(pvec, child, i))
 				return -ENOSPC;
 
@@ -1845,7 +1845,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 				nr_unsync_leaf += ret;
 			} else
 				return ret;
-		} else if (child->unsync) {
+		} else if (child->arch.unsync) {
 			nr_unsync_leaf++;
 			if (mmu_pages_add(pvec, child, i))
 				return -ENOSPC;
@@ -1862,7 +1862,7 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
 			   struct kvm_mmu_pages *pvec)
 {
 	pvec->nr = 0;
-	if (!sp->unsync_children)
+	if (!sp->arch.unsync_children)
 		return 0;
 
 	mmu_pages_add(pvec, sp, INVALID_INDEX);
@@ -1871,9 +1871,9 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
 
 static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	WARN_ON(!sp->unsync);
+	WARN_ON(!sp->arch.unsync);
 	trace_kvm_mmu_sync_page(sp);
-	sp->unsync = 0;
+	sp->arch.unsync = 0;
 	--kvm->stat.mmu_unsync;
 }
 
@@ -1894,7 +1894,7 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
 }
 
 #define for_each_valid_sp(_kvm, _sp, _list)				\
-	hlist_for_each_entry(_sp, _list, hash_link)			\
+	hlist_for_each_entry(_sp, _list, arch.hash_link)			\
 		if (is_obsolete_sp((_kvm), (_sp))) {			\
 		} else
 
@@ -1934,7 +1934,7 @@ static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* TDP MMU pages do not use the MMU generation. */
 	return !is_tdp_mmu_page(sp) &&
-	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
+	       unlikely(sp->arch.mmu_valid_gen != kvm->arch.mmu_valid_gen);
 }
 
 struct mmu_page_path {
@@ -2006,7 +2006,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 		WARN_ON(idx == INVALID_INDEX);
 		clear_unsync_child_bit(sp, idx);
 		level++;
-	} while (!sp->unsync_children);
+	} while (!sp->arch.unsync_children);
 }
 
 static int mmu_sync_children(struct kvm_vcpu *vcpu,
@@ -2053,7 +2053,7 @@ static int mmu_sync_children(struct kvm_vcpu *vcpu,
 
 static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
 {
-	atomic_set(&sp->write_flooding_count,  0);
+	atomic_set(&sp->arch.write_flooding_count,  0);
 }
 
 static void clear_sp_write_flooding_count(u64 *spte)
@@ -2094,7 +2094,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
 			 * Unsync pages must not be left as is, because the new
 			 * upper-level page will be write-protected.
 			 */
-			if (role.level > PG_LEVEL_4K && sp->unsync)
+			if (role.level > PG_LEVEL_4K && sp->arch.unsync)
 				kvm_mmu_prepare_zap_page(kvm, sp,
 							 &invalid_list);
 			continue;
@@ -2104,7 +2104,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
 		if (sp->role.arch.direct)
 			goto out;
 
-		if (sp->unsync) {
+		if (sp->arch.unsync) {
 			if (KVM_BUG_ON(!vcpu, kvm))
 				break;
 
@@ -2163,25 +2163,26 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
 	sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
 	sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
 	if (!role.arch.direct)
-		sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
+		sp->arch.shadowed_translation =
+			kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
 
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
-	INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
+	INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
 
 	/*
 	 * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
 	 * depends on valid pages being added to the head of the list.  See
 	 * comments in kvm_zap_obsolete_pages().
 	 */
-	sp->mmu_valid_gen = kvm->arch.mmu_valid_gen;
+	sp->arch.mmu_valid_gen = kvm->arch.mmu_valid_gen;
 	list_add(&sp->link, &kvm->arch.active_mmu_pages);
 	kvm_account_mmu_page(kvm, sp);
 
 	sp->gfn = gfn;
 	sp->role = role;
-	sp->shadow_mmu_page = true;
-	hlist_add_head(&sp->hash_link, sp_list);
+	sp->arch.shadow_mmu_page = true;
+	hlist_add_head(&sp->arch.hash_link, sp_list);
 	if (sp_has_gptes(sp))
 		account_shadowed(kvm, sp);
 
@@ -2368,7 +2369,7 @@ static void __link_shadow_page(struct kvm *kvm,
 
 	mmu_page_add_parent_pte(cache, sp, sptep);
 
-	if (sp->unsync_children || sp->unsync)
+	if (sp->arch.unsync_children || sp->arch.unsync)
 		mark_unsync(sptep);
 }
 
@@ -2421,7 +2422,8 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 			 * avoids retaining a large number of stale nested SPs.
 			 */
 			if (tdp_enabled && invalid_list &&
-			    child->role.arch.guest_mode && !child->parent_ptes.val)
+			    child->role.arch.guest_mode &&
+			    !child->arch.parent_ptes.val)
 				return kvm_mmu_prepare_zap_page(kvm, child,
 								invalid_list);
 		}
@@ -2449,7 +2451,7 @@ static void kvm_mmu_unlink_parents(struct kvm_mmu_page *sp)
 	u64 *sptep;
 	struct rmap_iterator iter;
 
-	while ((sptep = rmap_get_first(&sp->parent_ptes, &iter)))
+	while ((sptep = rmap_get_first(&sp->arch.parent_ptes, &iter)))
 		drop_parent_pte(sp, sptep);
 }
 
@@ -2496,7 +2498,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 	if (!sp->role.invalid && sp_has_gptes(sp))
 		unaccount_shadowed(kvm, sp);
 
-	if (sp->unsync)
+	if (sp->arch.unsync)
 		kvm_unlink_unsync_page(kvm, sp);
 	if (!refcount_read(&sp->root_refcount)) {
 		/* Count self */
@@ -2527,7 +2529,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 		zapped_root = !is_obsolete_sp(kvm, sp);
 	}
 
-	if (sp->nx_huge_page_disallowed)
+	if (sp->arch.nx_huge_page_disallowed)
 		unaccount_nx_huge_page(kvm, sp);
 
 	sp->role.invalid = 1;
@@ -2704,7 +2706,7 @@ static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	trace_kvm_mmu_unsync_page(sp);
 	++kvm->stat.mmu_unsync;
-	sp->unsync = 1;
+	sp->arch.unsync = 1;
 
 	kvm_mmu_mark_parents_unsync(sp);
 }
@@ -2739,7 +2741,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 		if (!can_unsync)
 			return -EPERM;
 
-		if (sp->unsync)
+		if (sp->arch.unsync)
 			continue;
 
 		if (prefetch)
@@ -2764,7 +2766,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 			 * for write, i.e. unsync cannot transition from 0->1
 			 * while this CPU holds mmu_lock for read (or write).
 			 */
-			if (READ_ONCE(sp->unsync))
+			if (READ_ONCE(sp->arch.unsync))
 				continue;
 		}
 
@@ -2796,8 +2798,8 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 *                      2.2 Guest issues TLB flush.
 	 *                          That causes a VM Exit.
 	 *
-	 *                      2.3 Walking of unsync pages sees sp->unsync is
-	 *                          false and skips the page.
+	 *                      2.3 Walking of unsync pages sees sp->arch.unsync
+	 *                          is false and skips the page.
 	 *
 	 *                      2.4 Guest accesses GVA X.
 	 *                          Since the mapping in the SP was not updated,
@@ -2805,7 +2807,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 *                          gets used.
 	 * 1.1 Host marks SP
 	 *     as unsync
-	 *     (sp->unsync = true)
+	 *     (sp->arch.unsync = true)
 	 *
 	 * The write barrier below ensures that 1.1 happens before 1.2 and thus
 	 * the situation in 2.4 does not arise.  It pairs with the read barrier
@@ -3126,7 +3128,7 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 	    cur_level == fault->goal_level &&
 	    is_shadow_present_pte(spte) &&
 	    !is_large_pte(spte) &&
-	    spte_to_child_sp(spte)->nx_huge_page_disallowed) {
+	    spte_to_child_sp(spte)->arch.nx_huge_page_disallowed) {
 		/*
 		 * A small SPTE exists for this pfn, but FNAME(fetch),
 		 * direct_map(), or kvm_tdp_mmu_map() would like to create a
@@ -3902,7 +3904,7 @@ static bool is_unsync_root(hpa_t root)
 
 	/*
 	 * The read barrier orders the CPU's read of SPTE.W during the page table
-	 * walk before the reads of sp->unsync/sp->unsync_children here.
+	 * walk before the reads of sp->arch.{unsync,unsync_children} here.
 	 *
 	 * Even if another CPU was marking the SP as unsync-ed simultaneously,
 	 * any guest page table changes are not guaranteed to be visible anyway
@@ -3922,7 +3924,7 @@ static bool is_unsync_root(hpa_t root)
 	if (WARN_ON_ONCE(!sp))
 		return false;
 
-	if (sp->unsync || sp->unsync_children)
+	if (sp->arch.unsync || sp->arch.unsync_children)
 		return true;
 
 	return false;
@@ -5510,8 +5512,8 @@ static bool detect_write_flooding(struct kvm_mmu_page *sp)
 	if (sp->role.level == PG_LEVEL_4K)
 		return false;
 
-	atomic_inc(&sp->write_flooding_count);
-	return atomic_read(&sp->write_flooding_count) >= 3;
+	atomic_inc(&sp->arch.write_flooding_count);
+	return atomic_read(&sp->arch.write_flooding_count) >= 3;
 }
 
 /*
@@ -6389,7 +6391,7 @@ static bool shadow_mmu_try_split_huge_pages(struct kvm *kvm,
 			continue;
 
 		/* SPs with level >PG_LEVEL_4K should never be unsync. */
-		if (WARN_ON_ONCE(sp->unsync))
+		if (WARN_ON_ONCE(sp->arch.unsync))
 			continue;
 
 		/* Don't bother splitting huge pages on invalid SPs. */
@@ -6941,8 +6943,8 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 		 */
 		sp = list_first_entry(&kvm->arch.possible_nx_huge_pages,
 				      struct kvm_mmu_page,
-				      possible_nx_huge_page_link);
-		WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
+				      arch.possible_nx_huge_page_link);
+		WARN_ON_ONCE(!sp->arch.nx_huge_page_disallowed);
 		WARN_ON_ONCE(!sp->role.arch.direct);
 
 		/*
@@ -6977,7 +6979,7 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 			flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
 		else
 			kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
-		WARN_ON_ONCE(sp->nx_huge_page_disallowed);
+		WARN_ON_ONCE(sp->arch.nx_huge_page_disallowed);
 
 		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
 			kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index fd4990c8b0e9..af2ae4887e35 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -44,89 +44,6 @@ extern bool dbg;
 #define INVALID_PAE_ROOT	0
 #define IS_VALID_PAE_ROOT(x)	(!!(x))
 
-struct kvm_mmu_page {
-	/*
-	 * Note, "link" through "spt" fit in a single 64 byte cache line on
-	 * 64-bit kernels, keep it that way unless there's a reason not to.
-	 */
-	struct list_head link;
-	struct hlist_node hash_link;
-
-	bool shadow_mmu_page;
-	bool unsync;
-	u8 mmu_valid_gen;
-
-	 /*
-	  * The shadow page can't be replaced by an equivalent huge page
-	  * because it is being used to map an executable page in the guest
-	  * and the NX huge page mitigation is enabled.
-	  */
-	bool nx_huge_page_disallowed;
-
-	/*
-	 * The following two entries are used to key the shadow page in the
-	 * hash table.
-	 */
-	union kvm_mmu_page_role role;
-	gfn_t gfn;
-
-	u64 *spt;
-
-	/*
-	 * Stores the result of the guest translation being shadowed by each
-	 * SPTE.  KVM shadows two types of guest translations: nGPA -> GPA
-	 * (shadow EPT/NPT) and GVA -> GPA (traditional shadow paging). In both
-	 * cases the result of the translation is a GPA and a set of access
-	 * constraints.
-	 *
-	 * The GFN is stored in the upper bits (PAGE_SHIFT) and the shadowed
-	 * access permissions are stored in the lower bits. Note, for
-	 * convenience and uniformity across guests, the access permissions are
-	 * stored in KVM format (e.g.  ACC_EXEC_MASK) not the raw guest format.
-	 */
-	u64 *shadowed_translation;
-
-	/* Currently serving as active root */
-	refcount_t root_refcount;
-
-	unsigned int unsync_children;
-	union {
-		struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
-		tdp_ptep_t ptep;
-	};
-	union {
-		DECLARE_BITMAP(unsync_child_bitmap, 512);
-		struct {
-			struct work_struct tdp_mmu_async_work;
-			void *tdp_mmu_async_data;
-		};
-	};
-
-	/*
-	 * Tracks shadow pages that, if zapped, would allow KVM to create an NX
-	 * huge page.  A shadow page will have nx_huge_page_disallowed set but
-	 * not be on the list if a huge page is disallowed for other reasons,
-	 * e.g. because KVM is shadowing a PTE at the same gfn, the memslot
-	 * isn't properly aligned, etc...
-	 */
-	struct list_head possible_nx_huge_page_link;
-#ifdef CONFIG_X86_32
-	/*
-	 * Used out of the mmu-lock to avoid reading spte values while an
-	 * update is in progress; see the comments in __get_spte_lockless().
-	 */
-	int clear_spte_count;
-#endif
-
-	/* Number of writes since the last time traversal visited this page.  */
-	atomic_t write_flooding_count;
-
-#ifdef CONFIG_X86_64
-	/* Used for freeing the page asynchronously if it is a TDP MMU page. */
-	struct rcu_head rcu_head;
-#endif
-};
-
 extern struct kmem_cache *mmu_page_header_cache;
 
 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index ffd10ce3eae3..335f26dabdf3 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -16,11 +16,11 @@
 	__field(bool, unsync)
 
 #define KVM_MMU_PAGE_ASSIGN(sp)				\
-	__entry->mmu_valid_gen = sp->mmu_valid_gen;	\
+	__entry->mmu_valid_gen = sp->arch.mmu_valid_gen;	\
 	__entry->gfn = sp->gfn;				\
 	__entry->role = sp->role.word;			\
 	__entry->root_count = refcount_read(&sp->root_refcount);	\
-	__entry->unsync = sp->unsync;
+	__entry->unsync = sp->arch.unsync;
 
 #define KVM_MMU_PAGE_PRINTK() ({				        \
 	const char *saved_ptr = trace_seq_buffer_ptr(p);		\
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index e15ec1c473da..18bb92b70a01 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -671,7 +671,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			 * KVM_REQ_MMU_SYNC is not necessary but it
 			 * expedites the process.
 			 */
-			if (sp->unsync_children &&
+			if (sp->arch.unsync_children &&
 			    mmu_sync_children(vcpu, sp, false))
 				return RET_PF_RETRY;
 		}
@@ -921,7 +921,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 			pt_element_t gpte;
 			gpa_t pte_gpa;
 
-			if (!sp->unsync)
+			if (!sp->arch.unsync)
 				break;
 
 			pte_gpa = FNAME(get_level1_sp_gpa)(sp);
@@ -942,7 +942,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 			FNAME(prefetch_gpte)(vcpu, sp, sptep, gpte, false);
 		}
 
-		if (!sp->unsync_children)
+		if (!sp->arch.unsync_children)
 			break;
 	}
 	write_unlock(&vcpu->kvm->mmu_lock);
@@ -974,8 +974,8 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 }
 
 /*
- * Using the information in sp->shadowed_translation (kvm_mmu_page_get_gfn()) is
- * safe because:
+ * Using the information in sp->arch.shadowed_translation
+ * (kvm_mmu_page_get_gfn()) is safe because:
  * - The spte has a reference to the struct page, so the pfn for a given gfn
  *   can't change unless all sptes pointing to it are nuked first.
  *
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 34d674080170..66231c7ed31e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -270,7 +270,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
 			    gfn_t gfn, union kvm_mmu_page_role role)
 {
-	INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
+	INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
 
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
@@ -385,7 +385,7 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 {
 	tdp_unaccount_mmu_page(kvm, sp);
 
-	if (!sp->nx_huge_page_disallowed)
+	if (!sp->arch.nx_huge_page_disallowed)
 		return;
 
 	if (shared)
@@ -393,7 +393,7 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 	else
 		lockdep_assert_held_write(&kvm->mmu_lock);
 
-	sp->nx_huge_page_disallowed = false;
+	sp->arch.nx_huge_page_disallowed = false;
 	untrack_possible_nx_huge_page(kvm, sp);
 
 	if (shared)
@@ -1181,7 +1181,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		sp = tdp_mmu_alloc_sp(vcpu);
 		tdp_mmu_init_child_sp(sp, &iter);
 
-		sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
+		sp->arch.nx_huge_page_disallowed = fault->huge_page_disallowed;
 
 		if (is_shadow_present_pte(iter.old_spte))
 			r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 19d3153051a3..e6a929089715 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -73,7 +73,7 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
 #ifdef CONFIG_X86_64
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp)
 {
-	return !sp->shadow_mmu_page;
+	return !sp->arch.shadow_mmu_page;
 }
 #else
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
index 14099956fdac..a9da33d4baa8 100644
--- a/include/kvm/mmu_types.h
+++ b/include/kvm/mmu_types.h
@@ -3,8 +3,11 @@
 #define __KVM_MMU_TYPES_H
 
 #include <linux/bug.h>
-#include <linux/types.h>
+#include <linux/kvm_types.h>
+#include <linux/refcount.h>
 #include <linux/stddef.h>
+#include <linux/types.h>
+#include <linux/workqueue.h>
 
 #include <asm/kvm/mmu_types.h>
 
@@ -36,4 +39,31 @@ static_assert(sizeof(union kvm_mmu_page_role) == sizeof_field(union kvm_mmu_page
 
 typedef u64 __rcu *tdp_ptep_t;
 
+struct kvm_mmu_page {
+	struct list_head link;
+
+	union kvm_mmu_page_role role;
+
+	/* The start of the GFN region mapped by this shadow page. */
+	gfn_t gfn;
+
+	/* The page table page. */
+	u64 *spt;
+
+	/* The PTE that points to this shadow page. */
+	tdp_ptep_t ptep;
+
+	/* Used for freeing TDP MMU pages asynchronously. */
+	struct rcu_head rcu_head;
+
+	/* The number of references to this shadow page as a root. */
+	refcount_t root_refcount;
+
+	/* Used for tearing down an entire page table tree. */
+	struct work_struct tdp_mmu_async_work;
+	void *tdp_mmu_async_data;
+
+	struct kvm_mmu_page_arch arch;
+};
+
 #endif /* !__KVM_MMU_TYPES_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread
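To make the end state of this patch easier to see at a glance, here is a minimal, standalone C sketch (not part of the series) of the resulting layout: an architecture-neutral struct kvm_mmu_page that embeds an x86-only struct kvm_mmu_page_arch, plus two illustrative helpers mirroring the sp->unsync -> sp->arch.unsync rewrites in the diff above. The types are simplified stand-ins with most fields omitted, and the helper names (sp_is_tdp_mmu_page, sp_needs_sync) are made up for illustration only.

/* Simplified stand-ins for the kernel types touched by the patch above. */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned long long u64;
typedef u64 gfn_t;

/*
 * x86-only shadow-page state, as split out into kvm_mmu_page_arch by this
 * patch (abbreviated: only a few of the fields are shown).
 */
struct kvm_mmu_page_arch {
	bool shadow_mmu_page;
	bool unsync;
	unsigned int unsync_children;
	bool nx_huge_page_disallowed;
};

/*
 * Architecture-neutral shadow-page header, roughly as it lands in
 * include/kvm/mmu_types.h (again abbreviated).
 */
struct kvm_mmu_page {
	gfn_t gfn;	/* start of the GFN region mapped by this page */
	u64 *spt;	/* the page table page itself */
	struct kvm_mmu_page_arch arch;	/* everything x86-specific */
};

/*
 * Illustrative accessors: code that only needs common fields reads them
 * directly, while x86-specific state is reached through sp->arch, which is
 * the pattern the sp->foo -> sp->arch.foo rewrites in the diff follow.
 */
static bool sp_is_tdp_mmu_page(const struct kvm_mmu_page *sp)
{
	return !sp->arch.shadow_mmu_page;
}

static bool sp_needs_sync(const struct kvm_mmu_page *sp)
{
	return sp->arch.unsync || sp->arch.unsync_children;
}

int main(void)
{
	struct kvm_mmu_page sp = {
		.gfn = 0x1000,
		.arch = { .shadow_mmu_page = false, .unsync = false },
	};

	printf("TDP MMU page: %d, needs sync: %d\n",
	       sp_is_tdp_mmu_page(&sp), sp_needs_sync(&sp));
	return 0;
}

In the actual patch the arch-only fields are declared in arch/x86/include/asm/kvm/mmu_types.h and the common struct in include/kvm/mmu_types.h; the sketch collapses both into one file purely for readability.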


* [RFC PATCH 06/37] KVM: MMU: Move struct kvm_mmu_page to common code
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move struct kvm_mmu_page to common code and all x86-specific fields into
kvm_mmu_page_arch.

This commit increases the size of struct kvm_mmu_page by 64 bytes on
x86_64 (184 bytes -> 248 bytes). The size of this struct can be reduced
in future commits by moving TDP MMU root fields into a separate struct
and by dynamically allocating fields only used by the Shadow MMU.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/mmu_types.h |  62 ++++++++++++++
 arch/x86/include/asm/kvm_host.h      |   4 -
 arch/x86/kvm/mmu/mmu.c               | 122 ++++++++++++++-------------
 arch/x86/kvm/mmu/mmu_internal.h      |  83 ------------------
 arch/x86/kvm/mmu/mmutrace.h          |   4 +-
 arch/x86/kvm/mmu/paging_tmpl.h       |  10 +--
 arch/x86/kvm/mmu/tdp_mmu.c           |   8 +-
 arch/x86/kvm/mmu/tdp_mmu.h           |   2 +-
 include/kvm/mmu_types.h              |  32 ++++++-
 9 files changed, 167 insertions(+), 160 deletions(-)

diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
index 35f893ebab5a..affcb520b482 100644
--- a/arch/x86/include/asm/kvm/mmu_types.h
+++ b/arch/x86/include/asm/kvm/mmu_types.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_KVM_MMU_TYPES_H
 #define __ASM_KVM_MMU_TYPES_H
 
+#include <linux/bitmap.h>
+#include <linux/list.h>
 #include <linux/types.h>
 
 /*
@@ -53,4 +55,64 @@ struct kvm_mmu_page_role_arch {
 	u16 passthrough:1;
 };
 
+struct kvm_rmap_head {
+	unsigned long val;
+};
+
+struct kvm_mmu_page_arch {
+	struct hlist_node hash_link;
+
+	bool shadow_mmu_page;
+	bool unsync;
+	u8 mmu_valid_gen;
+
+	 /*
+	  * The shadow page can't be replaced by an equivalent huge page
+	  * because it is being used to map an executable page in the guest
+	  * and the NX huge page mitigation is enabled.
+	  */
+	bool nx_huge_page_disallowed;
+
+	/*
+	 * Stores the result of the guest translation being shadowed by each
+	 * SPTE.  KVM shadows two types of guest translations: nGPA -> GPA
+	 * (shadow EPT/NPT) and GVA -> GPA (traditional shadow paging). In both
+	 * cases the result of the translation is a GPA and a set of access
+	 * constraints.
+	 *
+	 * The GFN is stored in the upper bits (PAGE_SHIFT) and the shadowed
+	 * access permissions are stored in the lower bits. Note, for
+	 * convenience and uniformity across guests, the access permissions are
+	 * stored in KVM format (e.g.  ACC_EXEC_MASK) not the raw guest format.
+	 */
+	u64 *shadowed_translation;
+
+	unsigned int unsync_children;
+
+	/* Rmap pointers to all parent sptes. */
+	struct kvm_rmap_head parent_ptes;
+
+	DECLARE_BITMAP(unsync_child_bitmap, 512);
+
+	/*
+	 * Tracks shadow pages that, if zapped, would allow KVM to create an NX
+	 * huge page.  A shadow page will have nx_huge_page_disallowed set but
+	 * not be on the list if a huge page is disallowed for other reasons,
+	 * e.g. because KVM is shadowing a PTE at the same gfn, the memslot
+	 * isn't properly aligned, etc...
+	 */
+	struct list_head possible_nx_huge_page_link;
+
+#ifdef CONFIG_X86_32
+	/*
+	 * Used out of the mmu-lock to avoid reading spte values while an
+	 * update is in progress; see the comments in __get_spte_lockless().
+	 */
+	int clear_spte_count;
+#endif
+
+	/* Number of writes since the last time traversal visited this page.  */
+	atomic_t write_flooding_count;
+};
+
 #endif /* !__ASM_KVM_MMU_TYPES_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ebcd7a0dabef..f5743a652e10 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -329,10 +329,6 @@ union kvm_cpu_role {
 	};
 };
 
-struct kvm_rmap_head {
-	unsigned long val;
-};
-
 struct kvm_pio_request {
 	unsigned long linear_rip;
 	unsigned long count;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 11cef930d5ed..e47f35878ab5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -350,7 +350,7 @@ static void count_spte_clear(u64 *sptep, u64 spte)
 
 	/* Ensure the spte is completely set before we increase the count */
 	smp_wmb();
-	sp->clear_spte_count++;
+	sp->arch.clear_spte_count++;
 }
 
 static void __set_spte(u64 *sptep, u64 spte)
@@ -432,7 +432,7 @@ static u64 __get_spte_lockless(u64 *sptep)
 	int count;
 
 retry:
-	count = sp->clear_spte_count;
+	count = sp->arch.clear_spte_count;
 	smp_rmb();
 
 	spte.spte_low = orig->spte_low;
@@ -442,7 +442,7 @@ static u64 __get_spte_lockless(u64 *sptep)
 	smp_rmb();
 
 	if (unlikely(spte.spte_low != orig->spte_low ||
-	      count != sp->clear_spte_count))
+	      count != sp->arch.clear_spte_count))
 		goto retry;
 
 	return spte.spte;
@@ -699,7 +699,7 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
 		return sp->gfn;
 
 	if (!sp->role.arch.direct)
-		return sp->shadowed_translation[index] >> PAGE_SHIFT;
+		return sp->arch.shadowed_translation[index] >> PAGE_SHIFT;
 
 	return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS));
 }
@@ -713,7 +713,7 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
 static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
 {
 	if (sp_has_gptes(sp))
-		return sp->shadowed_translation[index] & ACC_ALL;
+		return sp->arch.shadowed_translation[index] & ACC_ALL;
 
 	/*
 	 * For direct MMUs (e.g. TDP or non-paging guests) or passthrough SPs,
@@ -734,7 +734,7 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
 					 gfn_t gfn, unsigned int access)
 {
 	if (sp_has_gptes(sp)) {
-		sp->shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
+		sp->arch.shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
 		return;
 	}
 
@@ -825,18 +825,18 @@ void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	 * on the list if KVM is reusing an existing shadow page, i.e. if KVM
 	 * links a shadow page at multiple points.
 	 */
-	if (!list_empty(&sp->possible_nx_huge_page_link))
+	if (!list_empty(&sp->arch.possible_nx_huge_page_link))
 		return;
 
 	++kvm->stat.nx_lpage_splits;
-	list_add_tail(&sp->possible_nx_huge_page_link,
+	list_add_tail(&sp->arch.possible_nx_huge_page_link,
 		      &kvm->arch.possible_nx_huge_pages);
 }
 
 static void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 				 bool nx_huge_page_possible)
 {
-	sp->nx_huge_page_disallowed = true;
+	sp->arch.nx_huge_page_disallowed = true;
 
 	if (nx_huge_page_possible)
 		track_possible_nx_huge_page(kvm, sp);
@@ -861,16 +861,16 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	if (list_empty(&sp->possible_nx_huge_page_link))
+	if (list_empty(&sp->arch.possible_nx_huge_page_link))
 		return;
 
 	--kvm->stat.nx_lpage_splits;
-	list_del_init(&sp->possible_nx_huge_page_link);
+	list_del_init(&sp->arch.possible_nx_huge_page_link);
 }
 
 static void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	sp->nx_huge_page_disallowed = false;
+	sp->arch.nx_huge_page_disallowed = false;
 
 	untrack_possible_nx_huge_page(kvm, sp);
 }
@@ -1720,11 +1720,11 @@ static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
 {
 	MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
-	hlist_del(&sp->hash_link);
+	hlist_del(&sp->arch.hash_link);
 	list_del(&sp->link);
 	free_page((unsigned long)sp->spt);
 	if (!sp->role.arch.direct)
-		free_page((unsigned long)sp->shadowed_translation);
+		free_page((unsigned long)sp->arch.shadowed_translation);
 	kmem_cache_free(mmu_page_header_cache, sp);
 }
 
@@ -1739,13 +1739,13 @@ static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache,
 	if (!parent_pte)
 		return;
 
-	pte_list_add(cache, parent_pte, &sp->parent_ptes);
+	pte_list_add(cache, parent_pte, &sp->arch.parent_ptes);
 }
 
 static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
 				       u64 *parent_pte)
 {
-	pte_list_remove(parent_pte, &sp->parent_ptes);
+	pte_list_remove(parent_pte, &sp->arch.parent_ptes);
 }
 
 static void drop_parent_pte(struct kvm_mmu_page *sp,
@@ -1761,7 +1761,7 @@ static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 	u64 *sptep;
 	struct rmap_iterator iter;
 
-	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+	for_each_rmap_spte(&sp->arch.parent_ptes, &iter, sptep) {
 		mark_unsync(sptep);
 	}
 }
@@ -1771,9 +1771,9 @@ static void mark_unsync(u64 *spte)
 	struct kvm_mmu_page *sp;
 
 	sp = sptep_to_sp(spte);
-	if (__test_and_set_bit(spte_index(spte), sp->unsync_child_bitmap))
+	if (__test_and_set_bit(spte_index(spte), sp->arch.unsync_child_bitmap))
 		return;
-	if (sp->unsync_children++)
+	if (sp->arch.unsync_children++)
 		return;
 	kvm_mmu_mark_parents_unsync(sp);
 }
@@ -1799,7 +1799,7 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
 {
 	int i;
 
-	if (sp->unsync)
+	if (sp->arch.unsync)
 		for (i=0; i < pvec->nr; i++)
 			if (pvec->page[i].sp == sp)
 				return 0;
@@ -1812,9 +1812,9 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
 
 static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
 {
-	--sp->unsync_children;
-	WARN_ON((int)sp->unsync_children < 0);
-	__clear_bit(idx, sp->unsync_child_bitmap);
+	--sp->arch.unsync_children;
+	WARN_ON((int)sp->arch.unsync_children < 0);
+	__clear_bit(idx, sp->arch.unsync_child_bitmap);
 }
 
 static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
@@ -1822,7 +1822,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 {
 	int i, ret, nr_unsync_leaf = 0;
 
-	for_each_set_bit(i, sp->unsync_child_bitmap, 512) {
+	for_each_set_bit(i, sp->arch.unsync_child_bitmap, 512) {
 		struct kvm_mmu_page *child;
 		u64 ent = sp->spt[i];
 
@@ -1833,7 +1833,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 
 		child = spte_to_child_sp(ent);
 
-		if (child->unsync_children) {
+		if (child->arch.unsync_children) {
 			if (mmu_pages_add(pvec, child, i))
 				return -ENOSPC;
 
@@ -1845,7 +1845,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 				nr_unsync_leaf += ret;
 			} else
 				return ret;
-		} else if (child->unsync) {
+		} else if (child->arch.unsync) {
 			nr_unsync_leaf++;
 			if (mmu_pages_add(pvec, child, i))
 				return -ENOSPC;
@@ -1862,7 +1862,7 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
 			   struct kvm_mmu_pages *pvec)
 {
 	pvec->nr = 0;
-	if (!sp->unsync_children)
+	if (!sp->arch.unsync_children)
 		return 0;
 
 	mmu_pages_add(pvec, sp, INVALID_INDEX);
@@ -1871,9 +1871,9 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
 
 static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	WARN_ON(!sp->unsync);
+	WARN_ON(!sp->arch.unsync);
 	trace_kvm_mmu_sync_page(sp);
-	sp->unsync = 0;
+	sp->arch.unsync = 0;
 	--kvm->stat.mmu_unsync;
 }
 
@@ -1894,7 +1894,7 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
 }
 
 #define for_each_valid_sp(_kvm, _sp, _list)				\
-	hlist_for_each_entry(_sp, _list, hash_link)			\
+	hlist_for_each_entry(_sp, _list, arch.hash_link)			\
 		if (is_obsolete_sp((_kvm), (_sp))) {			\
 		} else
 
@@ -1934,7 +1934,7 @@ static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* TDP MMU pages do not use the MMU generation. */
 	return !is_tdp_mmu_page(sp) &&
-	       unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
+	       unlikely(sp->arch.mmu_valid_gen != kvm->arch.mmu_valid_gen);
 }
 
 struct mmu_page_path {
@@ -2006,7 +2006,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 		WARN_ON(idx == INVALID_INDEX);
 		clear_unsync_child_bit(sp, idx);
 		level++;
-	} while (!sp->unsync_children);
+	} while (!sp->arch.unsync_children);
 }
 
 static int mmu_sync_children(struct kvm_vcpu *vcpu,
@@ -2053,7 +2053,7 @@ static int mmu_sync_children(struct kvm_vcpu *vcpu,
 
 static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
 {
-	atomic_set(&sp->write_flooding_count,  0);
+	atomic_set(&sp->arch.write_flooding_count,  0);
 }
 
 static void clear_sp_write_flooding_count(u64 *spte)
@@ -2094,7 +2094,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
 			 * Unsync pages must not be left as is, because the new
 			 * upper-level page will be write-protected.
 			 */
-			if (role.level > PG_LEVEL_4K && sp->unsync)
+			if (role.level > PG_LEVEL_4K && sp->arch.unsync)
 				kvm_mmu_prepare_zap_page(kvm, sp,
 							 &invalid_list);
 			continue;
@@ -2104,7 +2104,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
 		if (sp->role.arch.direct)
 			goto out;
 
-		if (sp->unsync) {
+		if (sp->arch.unsync) {
 			if (KVM_BUG_ON(!vcpu, kvm))
 				break;
 
@@ -2163,25 +2163,26 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
 	sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
 	sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
 	if (!role.arch.direct)
-		sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
+		sp->arch.shadowed_translation =
+			kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
 
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
-	INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
+	INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
 
 	/*
 	 * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
 	 * depends on valid pages being added to the head of the list.  See
 	 * comments in kvm_zap_obsolete_pages().
 	 */
-	sp->mmu_valid_gen = kvm->arch.mmu_valid_gen;
+	sp->arch.mmu_valid_gen = kvm->arch.mmu_valid_gen;
 	list_add(&sp->link, &kvm->arch.active_mmu_pages);
 	kvm_account_mmu_page(kvm, sp);
 
 	sp->gfn = gfn;
 	sp->role = role;
-	sp->shadow_mmu_page = true;
-	hlist_add_head(&sp->hash_link, sp_list);
+	sp->arch.shadow_mmu_page = true;
+	hlist_add_head(&sp->arch.hash_link, sp_list);
 	if (sp_has_gptes(sp))
 		account_shadowed(kvm, sp);
 
@@ -2368,7 +2369,7 @@ static void __link_shadow_page(struct kvm *kvm,
 
 	mmu_page_add_parent_pte(cache, sp, sptep);
 
-	if (sp->unsync_children || sp->unsync)
+	if (sp->arch.unsync_children || sp->arch.unsync)
 		mark_unsync(sptep);
 }
 
@@ -2421,7 +2422,8 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 			 * avoids retaining a large number of stale nested SPs.
 			 */
 			if (tdp_enabled && invalid_list &&
-			    child->role.arch.guest_mode && !child->parent_ptes.val)
+			    child->role.arch.guest_mode &&
+			    !child->arch.parent_ptes.val)
 				return kvm_mmu_prepare_zap_page(kvm, child,
 								invalid_list);
 		}
@@ -2449,7 +2451,7 @@ static void kvm_mmu_unlink_parents(struct kvm_mmu_page *sp)
 	u64 *sptep;
 	struct rmap_iterator iter;
 
-	while ((sptep = rmap_get_first(&sp->parent_ptes, &iter)))
+	while ((sptep = rmap_get_first(&sp->arch.parent_ptes, &iter)))
 		drop_parent_pte(sp, sptep);
 }
 
@@ -2496,7 +2498,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 	if (!sp->role.invalid && sp_has_gptes(sp))
 		unaccount_shadowed(kvm, sp);
 
-	if (sp->unsync)
+	if (sp->arch.unsync)
 		kvm_unlink_unsync_page(kvm, sp);
 	if (!refcount_read(&sp->root_refcount)) {
 		/* Count self */
@@ -2527,7 +2529,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 		zapped_root = !is_obsolete_sp(kvm, sp);
 	}
 
-	if (sp->nx_huge_page_disallowed)
+	if (sp->arch.nx_huge_page_disallowed)
 		unaccount_nx_huge_page(kvm, sp);
 
 	sp->role.invalid = 1;
@@ -2704,7 +2706,7 @@ static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	trace_kvm_mmu_unsync_page(sp);
 	++kvm->stat.mmu_unsync;
-	sp->unsync = 1;
+	sp->arch.unsync = 1;
 
 	kvm_mmu_mark_parents_unsync(sp);
 }
@@ -2739,7 +2741,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 		if (!can_unsync)
 			return -EPERM;
 
-		if (sp->unsync)
+		if (sp->arch.unsync)
 			continue;
 
 		if (prefetch)
@@ -2764,7 +2766,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 			 * for write, i.e. unsync cannot transition from 0->1
 			 * while this CPU holds mmu_lock for read (or write).
 			 */
-			if (READ_ONCE(sp->unsync))
+			if (READ_ONCE(sp->arch.unsync))
 				continue;
 		}
 
@@ -2796,8 +2798,8 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 *                      2.2 Guest issues TLB flush.
 	 *                          That causes a VM Exit.
 	 *
-	 *                      2.3 Walking of unsync pages sees sp->unsync is
-	 *                          false and skips the page.
+	 *                      2.3 Walking of unsync pages sees sp->arch.unsync
+	 *                          is false and skips the page.
 	 *
 	 *                      2.4 Guest accesses GVA X.
 	 *                          Since the mapping in the SP was not updated,
@@ -2805,7 +2807,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 *                          gets used.
 	 * 1.1 Host marks SP
 	 *     as unsync
-	 *     (sp->unsync = true)
+	 *     (sp->arch.unsync = true)
 	 *
 	 * The write barrier below ensures that 1.1 happens before 1.2 and thus
 	 * the situation in 2.4 does not arise.  It pairs with the read barrier
@@ -3126,7 +3128,7 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 	    cur_level == fault->goal_level &&
 	    is_shadow_present_pte(spte) &&
 	    !is_large_pte(spte) &&
-	    spte_to_child_sp(spte)->nx_huge_page_disallowed) {
+	    spte_to_child_sp(spte)->arch.nx_huge_page_disallowed) {
 		/*
 		 * A small SPTE exists for this pfn, but FNAME(fetch),
 		 * direct_map(), or kvm_tdp_mmu_map() would like to create a
@@ -3902,7 +3904,7 @@ static bool is_unsync_root(hpa_t root)
 
 	/*
 	 * The read barrier orders the CPU's read of SPTE.W during the page table
-	 * walk before the reads of sp->unsync/sp->unsync_children here.
+	 * walk before the reads of sp->arch.{unsync,unsync_children} here.
 	 *
 	 * Even if another CPU was marking the SP as unsync-ed simultaneously,
 	 * any guest page table changes are not guaranteed to be visible anyway
@@ -3922,7 +3924,7 @@ static bool is_unsync_root(hpa_t root)
 	if (WARN_ON_ONCE(!sp))
 		return false;
 
-	if (sp->unsync || sp->unsync_children)
+	if (sp->arch.unsync || sp->arch.unsync_children)
 		return true;
 
 	return false;
@@ -5510,8 +5512,8 @@ static bool detect_write_flooding(struct kvm_mmu_page *sp)
 	if (sp->role.level == PG_LEVEL_4K)
 		return false;
 
-	atomic_inc(&sp->write_flooding_count);
-	return atomic_read(&sp->write_flooding_count) >= 3;
+	atomic_inc(&sp->arch.write_flooding_count);
+	return atomic_read(&sp->arch.write_flooding_count) >= 3;
 }
 
 /*
@@ -6389,7 +6391,7 @@ static bool shadow_mmu_try_split_huge_pages(struct kvm *kvm,
 			continue;
 
 		/* SPs with level >PG_LEVEL_4K should never by unsync. */
-		if (WARN_ON_ONCE(sp->unsync))
+		if (WARN_ON_ONCE(sp->arch.unsync))
 			continue;
 
 		/* Don't bother splitting huge pages on invalid SPs. */
@@ -6941,8 +6943,8 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 		 */
 		sp = list_first_entry(&kvm->arch.possible_nx_huge_pages,
 				      struct kvm_mmu_page,
-				      possible_nx_huge_page_link);
-		WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
+				      arch.possible_nx_huge_page_link);
+		WARN_ON_ONCE(!sp->arch.nx_huge_page_disallowed);
 		WARN_ON_ONCE(!sp->role.arch.direct);
 
 		/*
@@ -6977,7 +6979,7 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 			flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
 		else
 			kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
-		WARN_ON_ONCE(sp->nx_huge_page_disallowed);
+		WARN_ON_ONCE(sp->arch.nx_huge_page_disallowed);
 
 		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
 			kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index fd4990c8b0e9..af2ae4887e35 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -44,89 +44,6 @@ extern bool dbg;
 #define INVALID_PAE_ROOT	0
 #define IS_VALID_PAE_ROOT(x)	(!!(x))
 
-struct kvm_mmu_page {
-	/*
-	 * Note, "link" through "spt" fit in a single 64 byte cache line on
-	 * 64-bit kernels, keep it that way unless there's a reason not to.
-	 */
-	struct list_head link;
-	struct hlist_node hash_link;
-
-	bool shadow_mmu_page;
-	bool unsync;
-	u8 mmu_valid_gen;
-
-	 /*
-	  * The shadow page can't be replaced by an equivalent huge page
-	  * because it is being used to map an executable page in the guest
-	  * and the NX huge page mitigation is enabled.
-	  */
-	bool nx_huge_page_disallowed;
-
-	/*
-	 * The following two entries are used to key the shadow page in the
-	 * hash table.
-	 */
-	union kvm_mmu_page_role role;
-	gfn_t gfn;
-
-	u64 *spt;
-
-	/*
-	 * Stores the result of the guest translation being shadowed by each
-	 * SPTE.  KVM shadows two types of guest translations: nGPA -> GPA
-	 * (shadow EPT/NPT) and GVA -> GPA (traditional shadow paging). In both
-	 * cases the result of the translation is a GPA and a set of access
-	 * constraints.
-	 *
-	 * The GFN is stored in the upper bits (PAGE_SHIFT) and the shadowed
-	 * access permissions are stored in the lower bits. Note, for
-	 * convenience and uniformity across guests, the access permissions are
-	 * stored in KVM format (e.g.  ACC_EXEC_MASK) not the raw guest format.
-	 */
-	u64 *shadowed_translation;
-
-	/* Currently serving as active root */
-	refcount_t root_refcount;
-
-	unsigned int unsync_children;
-	union {
-		struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
-		tdp_ptep_t ptep;
-	};
-	union {
-		DECLARE_BITMAP(unsync_child_bitmap, 512);
-		struct {
-			struct work_struct tdp_mmu_async_work;
-			void *tdp_mmu_async_data;
-		};
-	};
-
-	/*
-	 * Tracks shadow pages that, if zapped, would allow KVM to create an NX
-	 * huge page.  A shadow page will have nx_huge_page_disallowed set but
-	 * not be on the list if a huge page is disallowed for other reasons,
-	 * e.g. because KVM is shadowing a PTE at the same gfn, the memslot
-	 * isn't properly aligned, etc...
-	 */
-	struct list_head possible_nx_huge_page_link;
-#ifdef CONFIG_X86_32
-	/*
-	 * Used out of the mmu-lock to avoid reading spte values while an
-	 * update is in progress; see the comments in __get_spte_lockless().
-	 */
-	int clear_spte_count;
-#endif
-
-	/* Number of writes since the last time traversal visited this page.  */
-	atomic_t write_flooding_count;
-
-#ifdef CONFIG_X86_64
-	/* Used for freeing the page asynchronously if it is a TDP MMU page. */
-	struct rcu_head rcu_head;
-#endif
-};
-
 extern struct kmem_cache *mmu_page_header_cache;
 
 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index ffd10ce3eae3..335f26dabdf3 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -16,11 +16,11 @@
 	__field(bool, unsync)
 
 #define KVM_MMU_PAGE_ASSIGN(sp)				\
-	__entry->mmu_valid_gen = sp->mmu_valid_gen;	\
+	__entry->mmu_valid_gen = sp->arch.mmu_valid_gen;	\
 	__entry->gfn = sp->gfn;				\
 	__entry->role = sp->role.word;			\
 	__entry->root_count = refcount_read(&sp->root_refcount);	\
-	__entry->unsync = sp->unsync;
+	__entry->unsync = sp->arch.unsync;
 
 #define KVM_MMU_PAGE_PRINTK() ({				        \
 	const char *saved_ptr = trace_seq_buffer_ptr(p);		\
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index e15ec1c473da..18bb92b70a01 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -671,7 +671,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			 * KVM_REQ_MMU_SYNC is not necessary but it
 			 * expedites the process.
 			 */
-			if (sp->unsync_children &&
+			if (sp->arch.unsync_children &&
 			    mmu_sync_children(vcpu, sp, false))
 				return RET_PF_RETRY;
 		}
@@ -921,7 +921,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 			pt_element_t gpte;
 			gpa_t pte_gpa;
 
-			if (!sp->unsync)
+			if (!sp->arch.unsync)
 				break;
 
 			pte_gpa = FNAME(get_level1_sp_gpa)(sp);
@@ -942,7 +942,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 			FNAME(prefetch_gpte)(vcpu, sp, sptep, gpte, false);
 		}
 
-		if (!sp->unsync_children)
+		if (!sp->arch.unsync_children)
 			break;
 	}
 	write_unlock(&vcpu->kvm->mmu_lock);
@@ -974,8 +974,8 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 }
 
 /*
- * Using the information in sp->shadowed_translation (kvm_mmu_page_get_gfn()) is
- * safe because:
+ * Using the information in sp->arch.shadowed_translation
+ * (kvm_mmu_page_get_gfn()) is safe because:
  * - The spte has a reference to the struct page, so the pfn for a given gfn
  *   can't change unless all sptes pointing to it are nuked first.
  *
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 34d674080170..66231c7ed31e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -270,7 +270,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
 			    gfn_t gfn, union kvm_mmu_page_role role)
 {
-	INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
+	INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
 
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
@@ -385,7 +385,7 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 {
 	tdp_unaccount_mmu_page(kvm, sp);
 
-	if (!sp->nx_huge_page_disallowed)
+	if (!sp->arch.nx_huge_page_disallowed)
 		return;
 
 	if (shared)
@@ -393,7 +393,7 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 	else
 		lockdep_assert_held_write(&kvm->mmu_lock);
 
-	sp->nx_huge_page_disallowed = false;
+	sp->arch.nx_huge_page_disallowed = false;
 	untrack_possible_nx_huge_page(kvm, sp);
 
 	if (shared)
@@ -1181,7 +1181,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		sp = tdp_mmu_alloc_sp(vcpu);
 		tdp_mmu_init_child_sp(sp, &iter);
 
-		sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
+		sp->arch.nx_huge_page_disallowed = fault->huge_page_disallowed;
 
 		if (is_shadow_present_pte(iter.old_spte))
 			r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 19d3153051a3..e6a929089715 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -73,7 +73,7 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
 #ifdef CONFIG_X86_64
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp)
 {
-	return !sp->shadow_mmu_page;
+	return !sp->arch.shadow_mmu_page;
 }
 #else
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
index 14099956fdac..a9da33d4baa8 100644
--- a/include/kvm/mmu_types.h
+++ b/include/kvm/mmu_types.h
@@ -3,8 +3,11 @@
 #define __KVM_MMU_TYPES_H
 
 #include <linux/bug.h>
-#include <linux/types.h>
+#include <linux/kvm_types.h>
+#include <linux/refcount.h>
 #include <linux/stddef.h>
+#include <linux/types.h>
+#include <linux/workqueue.h>
 
 #include <asm/kvm/mmu_types.h>
 
@@ -36,4 +39,31 @@ static_assert(sizeof(union kvm_mmu_page_role) == sizeof_field(union kvm_mmu_page
 
 typedef u64 __rcu *tdp_ptep_t;
 
+struct kvm_mmu_page {
+	struct list_head link;
+
+	union kvm_mmu_page_role role;
+
+	/* The start of the GFN region mapped by this shadow page. */
+	gfn_t gfn;
+
+	/* The page table page. */
+	u64 *spt;
+
+	/* The PTE that points to this shadow page. */
+	tdp_ptep_t ptep;
+
+	/* Used for freeing TDP MMU pages asynchronously. */
+	struct rcu_head rcu_head;
+
+	/* The number of references to this shadow page as a root. */
+	refcount_t root_refcount;
+
+	/* Used for tearing down an entire page table tree. */
+	struct work_struct tdp_mmu_async_work;
+	void *tdp_mmu_async_data;
+
+	struct kvm_mmu_page_arch arch;
+};
+
 #endif /* !__KVM_MMU_TYPES_H */
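
To illustrate the net effect of the split (a hypothetical helper, not part
of the patch, assuming the struct definitions added above): common code
keeps using the shared fields of struct kvm_mmu_page directly, while
x86-only state is now reached through the embedded arch member, e.g.:

  /* Hypothetical accessor; 'unsync' lives in sp->arch, not in sp itself. */
  static inline bool sp_is_unsync(struct kvm_mmu_page *sp)
  {
          return sp->arch.unsync;
  }
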
-- 
2.39.0.rc1.256.g54fd8350bd-goog


* [RFC PATCH 07/37] mm: Introduce architecture-neutral PG_LEVEL macros
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Introduce architecture-neutral versions of the x86 macros PG_LEVEL_4K,
PG_LEVEL_2M, etc. The x86 macros are used extensively by KVM/x86's page
table management code. Introducing architecture-neutral versions of these
macros paves the way for porting KVM/x86's page table management code to
architecture-neutral code.

Signed-off-by: David Matlack <dmatlack@google.com>
---
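(Illustrative sketch, not part of the patch: with the generic level names
below, architecture-neutral code can reason about page-table levels without
the x86-specific macros. The helper and its 9-bit-index assumption are
hypothetical.)

  /* Base pages mapped by one entry at 'level', assuming 9-bit page-table
   * indexes (x86-64, or arm64 with a 4K granule). */
  static inline unsigned long pages_per_level_entry(int level)
  {
          return 1UL << (9 * (level - PG_LEVEL_PTE));
  }
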
 arch/x86/include/asm/pgtable_types.h | 12 ++++--------
 include/linux/mm_types.h             |  9 +++++++++
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index aa174fed3a71..bdf41325f089 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -518,14 +518,10 @@ extern void native_pagetable_init(void);
 struct seq_file;
 extern void arch_report_meminfo(struct seq_file *m);
 
-enum pg_level {
-	PG_LEVEL_NONE,
-	PG_LEVEL_4K,
-	PG_LEVEL_2M,
-	PG_LEVEL_1G,
-	PG_LEVEL_512G,
-	PG_LEVEL_NUM
-};
+#define PG_LEVEL_4K	PG_LEVEL_PTE
+#define PG_LEVEL_2M	PG_LEVEL_PMD
+#define PG_LEVEL_1G	PG_LEVEL_PUD
+#define PG_LEVEL_512G	PG_LEVEL_P4D
 
 #ifdef CONFIG_PROC_FS
 extern void update_page_count(int level, unsigned long pages);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 500e536796ca..0445d0673afe 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1003,4 +1003,13 @@ enum fault_flag {
 
 typedef unsigned int __bitwise zap_flags_t;
 
+enum pg_level {
+	PG_LEVEL_NONE,
+	PG_LEVEL_PTE,
+	PG_LEVEL_PMD,
+	PG_LEVEL_PUD,
+	PG_LEVEL_P4D,
+	PG_LEVEL_NUM
+};
+
 #endif /* _LINUX_MM_TYPES_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


* [RFC PATCH 08/37] KVM: selftests: Stop assuming stats are contiguous in kvm_binary_stats_test
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

From: Jing Zhang <jingzhangos@google.com>

Remove the assumption from kvm_binary_stats_test that all stats are
laid out contiguously in memory. The KVM stats ABI specifically allows
holes in the stats data, since each stat exposes its own offset.

While here, drop the check that each stat's offset is less than
size_data, as that is now always true by construction.

Fixes: 0b45d58738cd ("KVM: selftests: Add selftest for KVM statistics data binary interface")
Signed-off-by: Jing Zhang <jingzhangos@google.com>
[Re-worded the commit message.]
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/kvm_binary_stats_test.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/kvm/kvm_binary_stats_test.c b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
index 894417c96f70..46a66ece29fd 100644
--- a/tools/testing/selftests/kvm/kvm_binary_stats_test.c
+++ b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
@@ -134,7 +134,8 @@ static void stats_test(int stats_fd)
 				    "Bucket size of stats (%s) is not zero",
 				    pdesc->name);
 		}
-		size_data += pdesc->size * sizeof(*stats_data);
+		size_data = max(size_data,
+			pdesc->offset + pdesc->size * sizeof(*stats_data));
 	}
 
 	/*
@@ -149,14 +150,6 @@ static void stats_test(int stats_fd)
 	TEST_ASSERT(size_data >= header.num_desc * sizeof(*stats_data),
 		    "Data size is not correct");
 
-	/* Check stats offset */
-	for (i = 0; i < header.num_desc; ++i) {
-		pdesc = get_stats_descriptor(stats_desc, i, &header);
-		TEST_ASSERT(pdesc->offset < size_data,
-			    "Invalid offset (%u) for stats: %s",
-			    pdesc->offset, pdesc->name);
-	}
-
 	/* Allocate memory for stats data */
 	stats_data = malloc(size_data);
 	TEST_ASSERT(stats_data, "Allocate memory for stats data");
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 09/37] KVM: Move page size stats into common code
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move the page size stats into common code. This will be used in a future
commit to move the TDP MMU, which populates these stats, into common
code. Other architectures can also start populating these stats if they
wish, and can export stats for whichever page sizes they support.

Continue to only expose these stats on x86, since that's currently the
only architecture that populates them.
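
For illustration only, a hypothetical sketch of how any architecture could
account mappings by level once the counters live in kvm->stat.generic; the
function name below is made up for this example:

#include <linux/kvm_host.h>

/*
 * Hypothetical example: bump the common per-level page counter when a
 * PMD-sized mapping (2M on x86, PG_LEVEL_PMD) is installed, and drop it
 * again when the mapping is torn down.
 */
static void example_account_pmd_mapping(struct kvm *kvm, bool install)
{
	kvm_update_page_stats(kvm, PG_LEVEL_PMD, install ? 1 : -1);
}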

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm_host.h | 8 --------
 arch/x86/kvm/mmu.h              | 5 -----
 arch/x86/kvm/x86.c              | 6 +++---
 include/linux/kvm_host.h        | 5 +++++
 include/linux/kvm_types.h       | 9 +++++++++
 5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f5743a652e10..9cf8f956bac3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1363,14 +1363,6 @@ struct kvm_vm_stat {
 	u64 mmu_recycled;
 	u64 mmu_cache_miss;
 	u64 mmu_unsync;
-	union {
-		struct {
-			atomic64_t pages_4k;
-			atomic64_t pages_2m;
-			atomic64_t pages_1g;
-		};
-		atomic64_t pages[KVM_NR_PAGE_SIZES];
-	};
 	u64 nx_lpage_splits;
 	u64 max_mmu_page_hash_collisions;
 	u64 max_mmu_rmap_size;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 168c46fd8dd1..ec662108d2eb 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -261,11 +261,6 @@ kvm_mmu_slot_lpages(struct kvm_memory_slot *slot, int level)
 	return __kvm_mmu_slot_lpages(slot, slot->npages, level);
 }
 
-static inline void kvm_update_page_stats(struct kvm *kvm, int level, int count)
-{
-	atomic64_add(count, &kvm->stat.pages[level - 1]);
-}
-
 gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u64 access,
 			   struct x86_exception *exception);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2bfe060768fc..517c8ed33542 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -231,6 +231,9 @@ EXPORT_SYMBOL_GPL(host_xss);
 
 const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS(),
+	STATS_DESC_ICOUNTER(VM_GENERIC, pages_4k),
+	STATS_DESC_ICOUNTER(VM_GENERIC, pages_2m),
+	STATS_DESC_ICOUNTER(VM_GENERIC, pages_1g),
 	STATS_DESC_COUNTER(VM, mmu_shadow_zapped),
 	STATS_DESC_COUNTER(VM, mmu_pte_write),
 	STATS_DESC_COUNTER(VM, mmu_pde_zapped),
@@ -238,9 +241,6 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
 	STATS_DESC_COUNTER(VM, mmu_recycled),
 	STATS_DESC_COUNTER(VM, mmu_cache_miss),
 	STATS_DESC_ICOUNTER(VM, mmu_unsync),
-	STATS_DESC_ICOUNTER(VM, pages_4k),
-	STATS_DESC_ICOUNTER(VM, pages_2m),
-	STATS_DESC_ICOUNTER(VM, pages_1g),
 	STATS_DESC_ICOUNTER(VM, nx_lpage_splits),
 	STATS_DESC_PCOUNTER(VM, max_mmu_rmap_size),
 	STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f16c4689322b..22ecb7ce4d31 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2280,4 +2280,9 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
 /* Max number of entries allowed for each kvm dirty ring */
 #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
 
+static inline void kvm_update_page_stats(struct kvm *kvm, int level, int count)
+{
+	atomic64_add(count, &kvm->stat.generic.pages[level - 1]);
+}
+
 #endif
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 76de36e56cdf..59cf958d69df 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -20,6 +20,7 @@ enum kvm_mr_change;
 
 #include <linux/bits.h>
 #include <linux/mutex.h>
+#include <linux/pgtable.h>
 #include <linux/types.h>
 #include <linux/spinlock_types.h>
 
@@ -105,6 +106,14 @@ struct kvm_mmu_memory_cache {
 struct kvm_vm_stat_generic {
 	u64 remote_tlb_flush;
 	u64 remote_tlb_flush_requests;
+	union {
+		struct {
+			atomic64_t pages_4k;
+			atomic64_t pages_2m;
+			atomic64_t pages_1g;
+		};
+		atomic64_t pages[PG_LEVEL_NUM - 1];
+	};
 };
 
 struct kvm_vcpu_stat_generic {
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 10/37] KVM: MMU: Move struct kvm_page_fault to common code
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move struct kvm_page_fault to common code. This will be used in a future
commit to move the TDP MMU to common code.

No functional change intended.
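
For illustration only, a hypothetical sketch of how common MMU code reads
the split structure after this change, with generic fields accessed
directly and x86-specific state reached through fault->arch; the function
below is not part of this patch:

#include <kvm/mmu_types.h>

/*
 * Hypothetical example: the sizing fields stay in the common struct,
 * while the NX huge page state is x86-only and now lives under
 * fault->arch. PG_LEVEL_PTE is the architecture-neutral name introduced
 * earlier in this series.
 */
static bool example_fault_allows_huge_page(struct kvm_page_fault *fault)
{
	/* Generic: a 4K-only fault can never use a huge page. */
	if (fault->max_level == PG_LEVEL_PTE)
		return false;

	/* x86-specific: honor the NX huge page workaround. */
	return !fault->arch.huge_page_disallowed;
}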

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/mmu_types.h | 20 +++++++
 arch/x86/kvm/mmu/mmu.c               | 19 +++----
 arch/x86/kvm/mmu/mmu_internal.h      | 79 ++++++----------------------
 arch/x86/kvm/mmu/mmutrace.h          |  2 +-
 arch/x86/kvm/mmu/paging_tmpl.h       | 14 ++---
 arch/x86/kvm/mmu/tdp_mmu.c           |  6 +--
 include/kvm/mmu_types.h              | 44 ++++++++++++++++
 7 files changed, 100 insertions(+), 84 deletions(-)

diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
index affcb520b482..59d1be85f4b7 100644
--- a/arch/x86/include/asm/kvm/mmu_types.h
+++ b/arch/x86/include/asm/kvm/mmu_types.h
@@ -5,6 +5,7 @@
 #include <linux/bitmap.h>
 #include <linux/list.h>
 #include <linux/types.h>
+#include <linux/kvm_types.h>
 
 /*
  * This is a subset of the overall kvm_cpu_role to minimize the size of
@@ -115,4 +116,23 @@ struct kvm_mmu_page_arch {
 	atomic_t write_flooding_count;
 };
 
+struct kvm_page_fault_arch {
+	const u32 error_code;
+
+	/* x86-specific error code bits */
+	const bool present;
+	const bool rsvd;
+	const bool user;
+
+	/* Derived from mmu and global state.  */
+	const bool is_tdp;
+	const bool nx_huge_page_workaround_enabled;
+
+	/*
+	 * Whether a >4KB mapping can be created or is forbidden due to NX
+	 * hugepages.
+	 */
+	bool huge_page_disallowed;
+};
+
 #endif /* !__ASM_KVM_MMU_TYPES_H */
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e47f35878ab5..0593d4a60139 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3092,7 +3092,8 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	struct kvm_memory_slot *slot = fault->slot;
 	kvm_pfn_t mask;
 
-	fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
+	fault->arch.huge_page_disallowed =
+		fault->exec && fault->arch.nx_huge_page_workaround_enabled;
 
 	if (unlikely(fault->max_level == PG_LEVEL_4K))
 		return;
@@ -3109,7 +3110,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 */
 	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
 						     fault->gfn, fault->max_level);
-	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
+	if (fault->req_level == PG_LEVEL_4K || fault->arch.huge_page_disallowed)
 		return;
 
 	/*
@@ -3158,7 +3159,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 * We cannot overwrite existing page tables with an NX
 		 * large page, as the leaf could be executable.
 		 */
-		if (fault->nx_huge_page_workaround_enabled)
+		if (fault->arch.nx_huge_page_workaround_enabled)
 			disallowed_hugepage_adjust(fault, *it.sptep, it.level);
 
 		base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
@@ -3170,7 +3171,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			continue;
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->huge_page_disallowed)
+		if (fault->arch.huge_page_disallowed)
 			account_nx_huge_page(vcpu->kvm, sp,
 					     fault->req_level >= it.level);
 	}
@@ -3221,7 +3222,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 				   struct kvm_page_fault *fault,
 				   unsigned int access)
 {
-	gva_t gva = fault->is_tdp ? 0 : fault->addr;
+	gva_t gva = fault->arch.is_tdp ? 0 : fault->addr;
 
 	vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
 			     access & shadow_mmio_access_mask);
@@ -3255,7 +3256,7 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 	 * generation number.  Refreshing the MMIO generation needs to go down
 	 * the slow path.  Note, EPT Misconfigs do NOT set the PRESENT flag!
 	 */
-	if (fault->rsvd)
+	if (fault->arch.rsvd)
 		return false;
 
 	/*
@@ -3273,7 +3274,7 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 	 *    SPTE is MMU-writable (determined later), the fault can be fixed
 	 *    by setting the Writable bit, which can be done out of mmu_lock.
 	 */
-	if (!fault->present)
+	if (!fault->arch.present)
 		return !kvm_ad_enabled();
 
 	/*
@@ -4119,10 +4120,10 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 					 struct kvm_page_fault *fault)
 {
-	if (unlikely(fault->rsvd))
+	if (unlikely(fault->arch.rsvd))
 		return false;
 
-	if (!fault->present || !fault->write)
+	if (!fault->arch.present || !fault->write)
 		return false;
 
 	/*
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index af2ae4887e35..4abb80a3bd01 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -77,60 +77,6 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
 	return READ_ONCE(nx_huge_pages) && !kvm->arch.disable_nx_huge_pages;
 }
 
-struct kvm_page_fault {
-	/* arguments to kvm_mmu_do_page_fault.  */
-	const gpa_t addr;
-	const u32 error_code;
-	const bool prefetch;
-
-	/* Derived from error_code.  */
-	const bool exec;
-	const bool write;
-	const bool present;
-	const bool rsvd;
-	const bool user;
-
-	/* Derived from mmu and global state.  */
-	const bool is_tdp;
-	const bool nx_huge_page_workaround_enabled;
-
-	/*
-	 * Whether a >4KB mapping can be created or is forbidden due to NX
-	 * hugepages.
-	 */
-	bool huge_page_disallowed;
-
-	/*
-	 * Maximum page size that can be created for this fault; input to
-	 * FNAME(fetch), direct_map() and kvm_tdp_mmu_map().
-	 */
-	u8 max_level;
-
-	/*
-	 * Page size that can be created based on the max_level and the
-	 * page size used by the host mapping.
-	 */
-	u8 req_level;
-
-	/*
-	 * Page size that will be created based on the req_level and
-	 * huge_page_disallowed.
-	 */
-	u8 goal_level;
-
-	/* Shifted addr, or result of guest page table walk if addr is a gva.  */
-	gfn_t gfn;
-
-	/* The memslot containing gfn. May be NULL. */
-	struct kvm_memory_slot *slot;
-
-	/* Outputs of kvm_faultin_pfn.  */
-	unsigned long mmu_seq;
-	kvm_pfn_t pfn;
-	hva_t hva;
-	bool map_writable;
-};
-
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 
 /*
@@ -164,22 +110,27 @@ enum {
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 					u32 err, bool prefetch)
 {
+	bool is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault);
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
-		.error_code = err,
-		.exec = err & PFERR_FETCH_MASK,
-		.write = err & PFERR_WRITE_MASK,
-		.present = err & PFERR_PRESENT_MASK,
-		.rsvd = err & PFERR_RSVD_MASK,
-		.user = err & PFERR_USER_MASK,
 		.prefetch = prefetch,
-		.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
-		.nx_huge_page_workaround_enabled =
-			is_nx_huge_page_enabled(vcpu->kvm),
+
+		.write = err & PFERR_WRITE_MASK,
+		.exec = err & PFERR_FETCH_MASK,
 
 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
+
+		.arch = {
+			.error_code = err,
+			.present = err & PFERR_PRESENT_MASK,
+			.rsvd = err & PFERR_RSVD_MASK,
+			.user = err & PFERR_USER_MASK,
+			.is_tdp = is_tdp,
+			.nx_huge_page_workaround_enabled =
+				is_nx_huge_page_enabled(vcpu->kvm),
+		},
 	};
 	int r;
 
@@ -196,7 +147,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	if (!prefetch)
 		vcpu->stat.pf_taken++;
 
-	if (IS_ENABLED(CONFIG_RETPOLINE) && fault.is_tdp)
+	if (IS_ENABLED(CONFIG_RETPOLINE) && fault.arch.is_tdp)
 		r = kvm_tdp_page_fault(vcpu, &fault);
 	else
 		r = vcpu->arch.mmu->page_fault(vcpu, &fault);
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 335f26dabdf3..b01767acf073 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -270,7 +270,7 @@ TRACE_EVENT(
 	TP_fast_assign(
 		__entry->vcpu_id = vcpu->vcpu_id;
 		__entry->cr2_or_gpa = fault->addr;
-		__entry->error_code = fault->error_code;
+		__entry->error_code = fault->arch.error_code;
 		__entry->sptep = sptep;
 		__entry->old_spte = old_spte;
 		__entry->new_spte = *sptep;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 18bb92b70a01..daf9c7731edc 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -698,7 +698,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		 * We cannot overwrite existing page tables with an NX
 		 * large page, as the leaf could be executable.
 		 */
-		if (fault->nx_huge_page_workaround_enabled)
+		if (fault->arch.nx_huge_page_workaround_enabled)
 			disallowed_hugepage_adjust(fault, *it.sptep, it.level);
 
 		base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
@@ -713,7 +713,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			continue;
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->huge_page_disallowed)
+		if (fault->arch.huge_page_disallowed)
 			account_nx_huge_page(vcpu->kvm, sp,
 					     fault->req_level >= it.level);
 	}
@@ -793,8 +793,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	int r;
 	bool is_self_change_mapping;
 
-	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
-	WARN_ON_ONCE(fault->is_tdp);
+	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->arch.error_code);
+	WARN_ON_ONCE(fault->arch.is_tdp);
 
 	/*
 	 * Look up the guest pte for the faulting address.
@@ -802,7 +802,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * The bit needs to be cleared before walking guest page tables.
 	 */
 	r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
-			     fault->error_code & ~PFERR_RSVD_MASK);
+			     fault->arch.error_code & ~PFERR_RSVD_MASK);
 
 	/*
 	 * The page is not mapped by the guest.  Let the guest handle it.
@@ -830,7 +830,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	vcpu->arch.write_fault_to_shadow_pgtable = false;
 
 	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);
+	      &walker, fault->arch.user, &vcpu->arch.write_fault_to_shadow_pgtable);
 
 	if (is_self_change_mapping)
 		fault->max_level = PG_LEVEL_4K;
@@ -846,7 +846,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * we will cache the incorrect access into mmio spte.
 	 */
 	if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) &&
-	    !is_cr0_wp(vcpu->arch.mmu) && !fault->user && fault->slot) {
+	    !is_cr0_wp(vcpu->arch.mmu) && !fault->arch.user && fault->slot) {
 		walker.pte_access |= ACC_WRITE_MASK;
 		walker.pte_access &= ~ACC_USER_MASK;
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 66231c7ed31e..4940413d3767 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1156,7 +1156,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
 		int r;
 
-		if (fault->nx_huge_page_workaround_enabled)
+		if (fault->arch.nx_huge_page_workaround_enabled)
 			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
 
 		if (iter.level == fault->goal_level)
@@ -1181,7 +1181,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		sp = tdp_mmu_alloc_sp(vcpu);
 		tdp_mmu_init_child_sp(sp, &iter);
 
-		sp->arch.nx_huge_page_disallowed = fault->huge_page_disallowed;
+		sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
 
 		if (is_shadow_present_pte(iter.old_spte))
 			r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
@@ -1197,7 +1197,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			goto retry;
 		}
 
-		if (fault->huge_page_disallowed &&
+		if (fault->arch.huge_page_disallowed &&
 		    fault->req_level >= iter.level) {
 			spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 			track_possible_nx_huge_page(kvm, sp);
diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
index a9da33d4baa8..9f0ca920bf68 100644
--- a/include/kvm/mmu_types.h
+++ b/include/kvm/mmu_types.h
@@ -66,4 +66,48 @@ struct kvm_mmu_page {
 	struct kvm_mmu_page_arch arch;
 };
 
+struct kvm_page_fault {
+	/* The raw faulting address. */
+	const gpa_t addr;
+
+	/* Whether the fault was synthesized to prefetch a mapping. */
+	const bool prefetch;
+
+	/* Information about the cause of the fault. */
+	const bool write;
+	const bool exec;
+
+	/* Shifted addr, or result of guest page table walk if shadow paging. */
+	gfn_t gfn;
+
+	/* The memslot that contains @gfn. May be NULL. */
+	struct kvm_memory_slot *slot;
+
+	/* Maximum page size that can be created for this fault. */
+	u8 max_level;
+
+	/*
+	 * Page size that can be created based on the max_level and the page
+	 * size used by the host mapping.
+	 */
+	u8 req_level;
+
+	/* Final page size that will be created. */
+	u8 goal_level;
+
+	/*
+	 * The value of kvm->mmu_invalidate_seq before fetching the host
+	 * mapping. Used to verify that the host mapping has not changed
+	 * after grabbing the MMU lock.
+	 */
+	unsigned long mmu_seq;
+
+	/* Information about the host mapping. */
+	kvm_pfn_t pfn;
+	hva_t hva;
+	bool map_writable;
+
+	struct kvm_page_fault_arch arch;
+};
+
 #endif /* !__KVM_MMU_TYPES_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 10/37] KVM: MMU: Move struct kvm_page_fault to common code
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Move struct kvm_page_fault to common code. This will be used in a future
commit to move the TDP MMU to common code.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/mmu_types.h | 20 +++++++
 arch/x86/kvm/mmu/mmu.c               | 19 +++----
 arch/x86/kvm/mmu/mmu_internal.h      | 79 ++++++----------------------
 arch/x86/kvm/mmu/mmutrace.h          |  2 +-
 arch/x86/kvm/mmu/paging_tmpl.h       | 14 ++---
 arch/x86/kvm/mmu/tdp_mmu.c           |  6 +--
 include/kvm/mmu_types.h              | 44 ++++++++++++++++
 7 files changed, 100 insertions(+), 84 deletions(-)

diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
index affcb520b482..59d1be85f4b7 100644
--- a/arch/x86/include/asm/kvm/mmu_types.h
+++ b/arch/x86/include/asm/kvm/mmu_types.h
@@ -5,6 +5,7 @@
 #include <linux/bitmap.h>
 #include <linux/list.h>
 #include <linux/types.h>
+#include <linux/kvm_types.h>
 
 /*
  * This is a subset of the overall kvm_cpu_role to minimize the size of
@@ -115,4 +116,23 @@ struct kvm_mmu_page_arch {
 	atomic_t write_flooding_count;
 };
 
+struct kvm_page_fault_arch {
+	const u32 error_code;
+
+	/* x86-specific error code bits */
+	const bool present;
+	const bool rsvd;
+	const bool user;
+
+	/* Derived from mmu and global state.  */
+	const bool is_tdp;
+	const bool nx_huge_page_workaround_enabled;
+
+	/*
+	 * Whether a >4KB mapping can be created or is forbidden due to NX
+	 * hugepages.
+	 */
+	bool huge_page_disallowed;
+};
+
 #endif /* !__ASM_KVM_MMU_TYPES_H */
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e47f35878ab5..0593d4a60139 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3092,7 +3092,8 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	struct kvm_memory_slot *slot = fault->slot;
 	kvm_pfn_t mask;
 
-	fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
+	fault->arch.huge_page_disallowed =
+		fault->exec && fault->arch.nx_huge_page_workaround_enabled;
 
 	if (unlikely(fault->max_level == PG_LEVEL_4K))
 		return;
@@ -3109,7 +3110,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 */
 	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
 						     fault->gfn, fault->max_level);
-	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
+	if (fault->req_level == PG_LEVEL_4K || fault->arch.huge_page_disallowed)
 		return;
 
 	/*
@@ -3158,7 +3159,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 * We cannot overwrite existing page tables with an NX
 		 * large page, as the leaf could be executable.
 		 */
-		if (fault->nx_huge_page_workaround_enabled)
+		if (fault->arch.nx_huge_page_workaround_enabled)
 			disallowed_hugepage_adjust(fault, *it.sptep, it.level);
 
 		base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
@@ -3170,7 +3171,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			continue;
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->huge_page_disallowed)
+		if (fault->arch.huge_page_disallowed)
 			account_nx_huge_page(vcpu->kvm, sp,
 					     fault->req_level >= it.level);
 	}
@@ -3221,7 +3222,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 				   struct kvm_page_fault *fault,
 				   unsigned int access)
 {
-	gva_t gva = fault->is_tdp ? 0 : fault->addr;
+	gva_t gva = fault->arch.is_tdp ? 0 : fault->addr;
 
 	vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
 			     access & shadow_mmio_access_mask);
@@ -3255,7 +3256,7 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 	 * generation number.  Refreshing the MMIO generation needs to go down
 	 * the slow path.  Note, EPT Misconfigs do NOT set the PRESENT flag!
 	 */
-	if (fault->rsvd)
+	if (fault->arch.rsvd)
 		return false;
 
 	/*
@@ -3273,7 +3274,7 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 	 *    SPTE is MMU-writable (determined later), the fault can be fixed
 	 *    by setting the Writable bit, which can be done out of mmu_lock.
 	 */
-	if (!fault->present)
+	if (!fault->arch.present)
 		return !kvm_ad_enabled();
 
 	/*
@@ -4119,10 +4120,10 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 					 struct kvm_page_fault *fault)
 {
-	if (unlikely(fault->rsvd))
+	if (unlikely(fault->arch.rsvd))
 		return false;
 
-	if (!fault->present || !fault->write)
+	if (!fault->arch.present || !fault->write)
 		return false;
 
 	/*
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index af2ae4887e35..4abb80a3bd01 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -77,60 +77,6 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
 	return READ_ONCE(nx_huge_pages) && !kvm->arch.disable_nx_huge_pages;
 }
 
-struct kvm_page_fault {
-	/* arguments to kvm_mmu_do_page_fault.  */
-	const gpa_t addr;
-	const u32 error_code;
-	const bool prefetch;
-
-	/* Derived from error_code.  */
-	const bool exec;
-	const bool write;
-	const bool present;
-	const bool rsvd;
-	const bool user;
-
-	/* Derived from mmu and global state.  */
-	const bool is_tdp;
-	const bool nx_huge_page_workaround_enabled;
-
-	/*
-	 * Whether a >4KB mapping can be created or is forbidden due to NX
-	 * hugepages.
-	 */
-	bool huge_page_disallowed;
-
-	/*
-	 * Maximum page size that can be created for this fault; input to
-	 * FNAME(fetch), direct_map() and kvm_tdp_mmu_map().
-	 */
-	u8 max_level;
-
-	/*
-	 * Page size that can be created based on the max_level and the
-	 * page size used by the host mapping.
-	 */
-	u8 req_level;
-
-	/*
-	 * Page size that will be created based on the req_level and
-	 * huge_page_disallowed.
-	 */
-	u8 goal_level;
-
-	/* Shifted addr, or result of guest page table walk if addr is a gva.  */
-	gfn_t gfn;
-
-	/* The memslot containing gfn. May be NULL. */
-	struct kvm_memory_slot *slot;
-
-	/* Outputs of kvm_faultin_pfn.  */
-	unsigned long mmu_seq;
-	kvm_pfn_t pfn;
-	hva_t hva;
-	bool map_writable;
-};
-
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 
 /*
@@ -164,22 +110,27 @@ enum {
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 					u32 err, bool prefetch)
 {
+	bool is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault);
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
-		.error_code = err,
-		.exec = err & PFERR_FETCH_MASK,
-		.write = err & PFERR_WRITE_MASK,
-		.present = err & PFERR_PRESENT_MASK,
-		.rsvd = err & PFERR_RSVD_MASK,
-		.user = err & PFERR_USER_MASK,
 		.prefetch = prefetch,
-		.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
-		.nx_huge_page_workaround_enabled =
-			is_nx_huge_page_enabled(vcpu->kvm),
+
+		.write = err & PFERR_WRITE_MASK,
+		.exec = err & PFERR_FETCH_MASK,
 
 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
+
+		.arch = {
+			.error_code = err,
+			.present = err & PFERR_PRESENT_MASK,
+			.rsvd = err & PFERR_RSVD_MASK,
+			.user = err & PFERR_USER_MASK,
+			.is_tdp = is_tdp,
+			.nx_huge_page_workaround_enabled =
+				is_nx_huge_page_enabled(vcpu->kvm),
+		},
 	};
 	int r;
 
@@ -196,7 +147,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	if (!prefetch)
 		vcpu->stat.pf_taken++;
 
-	if (IS_ENABLED(CONFIG_RETPOLINE) && fault.is_tdp)
+	if (IS_ENABLED(CONFIG_RETPOLINE) && fault.arch.is_tdp)
 		r = kvm_tdp_page_fault(vcpu, &fault);
 	else
 		r = vcpu->arch.mmu->page_fault(vcpu, &fault);
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 335f26dabdf3..b01767acf073 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -270,7 +270,7 @@ TRACE_EVENT(
 	TP_fast_assign(
 		__entry->vcpu_id = vcpu->vcpu_id;
 		__entry->cr2_or_gpa = fault->addr;
-		__entry->error_code = fault->error_code;
+		__entry->error_code = fault->arch.error_code;
 		__entry->sptep = sptep;
 		__entry->old_spte = old_spte;
 		__entry->new_spte = *sptep;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 18bb92b70a01..daf9c7731edc 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -698,7 +698,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		 * We cannot overwrite existing page tables with an NX
 		 * large page, as the leaf could be executable.
 		 */
-		if (fault->nx_huge_page_workaround_enabled)
+		if (fault->arch.nx_huge_page_workaround_enabled)
 			disallowed_hugepage_adjust(fault, *it.sptep, it.level);
 
 		base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
@@ -713,7 +713,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			continue;
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->huge_page_disallowed)
+		if (fault->arch.huge_page_disallowed)
 			account_nx_huge_page(vcpu->kvm, sp,
 					     fault->req_level >= it.level);
 	}
@@ -793,8 +793,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	int r;
 	bool is_self_change_mapping;
 
-	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
-	WARN_ON_ONCE(fault->is_tdp);
+	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->arch.error_code);
+	WARN_ON_ONCE(fault->arch.is_tdp);
 
 	/*
 	 * Look up the guest pte for the faulting address.
@@ -802,7 +802,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * The bit needs to be cleared before walking guest page tables.
 	 */
 	r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
-			     fault->error_code & ~PFERR_RSVD_MASK);
+			     fault->arch.error_code & ~PFERR_RSVD_MASK);
 
 	/*
 	 * The page is not mapped by the guest.  Let the guest handle it.
@@ -830,7 +830,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	vcpu->arch.write_fault_to_shadow_pgtable = false;
 
 	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);
+	      &walker, fault->arch.user, &vcpu->arch.write_fault_to_shadow_pgtable);
 
 	if (is_self_change_mapping)
 		fault->max_level = PG_LEVEL_4K;
@@ -846,7 +846,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * we will cache the incorrect access into mmio spte.
 	 */
 	if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) &&
-	    !is_cr0_wp(vcpu->arch.mmu) && !fault->user && fault->slot) {
+	    !is_cr0_wp(vcpu->arch.mmu) && !fault->arch.user && fault->slot) {
 		walker.pte_access |= ACC_WRITE_MASK;
 		walker.pte_access &= ~ACC_USER_MASK;
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 66231c7ed31e..4940413d3767 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1156,7 +1156,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
 		int r;
 
-		if (fault->nx_huge_page_workaround_enabled)
+		if (fault->arch.nx_huge_page_workaround_enabled)
 			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
 
 		if (iter.level == fault->goal_level)
@@ -1181,7 +1181,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		sp = tdp_mmu_alloc_sp(vcpu);
 		tdp_mmu_init_child_sp(sp, &iter);
 
-		sp->arch.nx_huge_page_disallowed = fault->huge_page_disallowed;
+		sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
 
 		if (is_shadow_present_pte(iter.old_spte))
 			r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
@@ -1197,7 +1197,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			goto retry;
 		}
 
-		if (fault->huge_page_disallowed &&
+		if (fault->arch.huge_page_disallowed &&
 		    fault->req_level >= iter.level) {
 			spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 			track_possible_nx_huge_page(kvm, sp);
diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
index a9da33d4baa8..9f0ca920bf68 100644
--- a/include/kvm/mmu_types.h
+++ b/include/kvm/mmu_types.h
@@ -66,4 +66,48 @@ struct kvm_mmu_page {
 	struct kvm_mmu_page_arch arch;
 };
 
+struct kvm_page_fault {
+	/* The raw faulting address. */
+	const gpa_t addr;
+
+	/* Whether the fault was synthesized to prefetch a mapping. */
+	const bool prefetch;
+
+	/* Information about the cause of the fault. */
+	const bool write;
+	const bool exec;
+
+	/* Shifted addr, or result of guest page table walk if shadow paging. */
+	gfn_t gfn;
+
+	/* The memslot that contains @gfn. May be NULL. */
+	struct kvm_memory_slot *slot;
+
+	/* Maximum page size that can be created for this fault. */
+	u8 max_level;
+
+	/*
+	 * Page size that can be created based on the max_level and the page
+	 * size used by the host mapping.
+	 */
+	u8 req_level;
+
+	/* Final page size that will be created. */
+	u8 goal_level;
+
+	/*
+	 * The value of kvm->mmu_invalidate_seq before fetching the host
+	 * mapping. Used to verify that the host mapping has not changed
+	 * after grabbing the MMU lock.
+	 */
+	unsigned long mmu_seq;
+
+	/* Information about the host mapping. */
+	kvm_pfn_t pfn;
+	hva_t hva;
+	bool map_writable;
+
+	struct kvm_page_fault_arch arch;
+};
+
 #endif /* !__KVM_MMU_TYPES_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 10/37] KVM: MMU: Move struct kvm_page_fault to common code
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move struct kvm_page_fault to common code. This will be used in a future
commit to move the TDP MMU to common code.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
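(Not part of the patch, illustration only.) A minimal sketch of how the
split reads to MMU code once this lands, using the struct layouts from the
diff below. The helper itself is hypothetical: common code only touches the
top-level members of struct kvm_page_fault, while bits derived from the x86
error code are reached through fault->arch.

  /*
   * Hypothetical helper, for illustration only: architecture-neutral
   * state lives directly in struct kvm_page_fault, x86-only state is
   * reached through fault->arch.
   */
  static bool fault_may_fast_fix(const struct kvm_page_fault *fault)
  {
          /* Common fields, usable from virt/kvm/mmu/. */
          if (!fault->slot || fault->prefetch)
                  return false;

          /* x86-specific bits, usable only from arch/x86/kvm/mmu/. */
          return fault->arch.present && !fault->arch.rsvd;
  }

The diff itself is a mechanical substitution of fault->X with fault->arch.X
for the fields that moved.
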
 arch/x86/include/asm/kvm/mmu_types.h | 20 +++++++
 arch/x86/kvm/mmu/mmu.c               | 19 +++----
 arch/x86/kvm/mmu/mmu_internal.h      | 79 ++++++----------------------
 arch/x86/kvm/mmu/mmutrace.h          |  2 +-
 arch/x86/kvm/mmu/paging_tmpl.h       | 14 ++---
 arch/x86/kvm/mmu/tdp_mmu.c           |  6 +--
 include/kvm/mmu_types.h              | 44 ++++++++++++++++
 7 files changed, 100 insertions(+), 84 deletions(-)

diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
index affcb520b482..59d1be85f4b7 100644
--- a/arch/x86/include/asm/kvm/mmu_types.h
+++ b/arch/x86/include/asm/kvm/mmu_types.h
@@ -5,6 +5,7 @@
 #include <linux/bitmap.h>
 #include <linux/list.h>
 #include <linux/types.h>
+#include <linux/kvm_types.h>
 
 /*
  * This is a subset of the overall kvm_cpu_role to minimize the size of
@@ -115,4 +116,23 @@ struct kvm_mmu_page_arch {
 	atomic_t write_flooding_count;
 };
 
+struct kvm_page_fault_arch {
+	const u32 error_code;
+
+	/* x86-specific error code bits */
+	const bool present;
+	const bool rsvd;
+	const bool user;
+
+	/* Derived from mmu and global state.  */
+	const bool is_tdp;
+	const bool nx_huge_page_workaround_enabled;
+
+	/*
+	 * Whether a >4KB mapping can be created or is forbidden due to NX
+	 * hugepages.
+	 */
+	bool huge_page_disallowed;
+};
+
 #endif /* !__ASM_KVM_MMU_TYPES_H */
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e47f35878ab5..0593d4a60139 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3092,7 +3092,8 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	struct kvm_memory_slot *slot = fault->slot;
 	kvm_pfn_t mask;
 
-	fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
+	fault->arch.huge_page_disallowed =
+		fault->exec && fault->arch.nx_huge_page_workaround_enabled;
 
 	if (unlikely(fault->max_level == PG_LEVEL_4K))
 		return;
@@ -3109,7 +3110,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 */
 	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
 						     fault->gfn, fault->max_level);
-	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
+	if (fault->req_level == PG_LEVEL_4K || fault->arch.huge_page_disallowed)
 		return;
 
 	/*
@@ -3158,7 +3159,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 * We cannot overwrite existing page tables with an NX
 		 * large page, as the leaf could be executable.
 		 */
-		if (fault->nx_huge_page_workaround_enabled)
+		if (fault->arch.nx_huge_page_workaround_enabled)
 			disallowed_hugepage_adjust(fault, *it.sptep, it.level);
 
 		base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
@@ -3170,7 +3171,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			continue;
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->huge_page_disallowed)
+		if (fault->arch.huge_page_disallowed)
 			account_nx_huge_page(vcpu->kvm, sp,
 					     fault->req_level >= it.level);
 	}
@@ -3221,7 +3222,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 				   struct kvm_page_fault *fault,
 				   unsigned int access)
 {
-	gva_t gva = fault->is_tdp ? 0 : fault->addr;
+	gva_t gva = fault->arch.is_tdp ? 0 : fault->addr;
 
 	vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
 			     access & shadow_mmio_access_mask);
@@ -3255,7 +3256,7 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 	 * generation number.  Refreshing the MMIO generation needs to go down
 	 * the slow path.  Note, EPT Misconfigs do NOT set the PRESENT flag!
 	 */
-	if (fault->rsvd)
+	if (fault->arch.rsvd)
 		return false;
 
 	/*
@@ -3273,7 +3274,7 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 	 *    SPTE is MMU-writable (determined later), the fault can be fixed
 	 *    by setting the Writable bit, which can be done out of mmu_lock.
 	 */
-	if (!fault->present)
+	if (!fault->arch.present)
 		return !kvm_ad_enabled();
 
 	/*
@@ -4119,10 +4120,10 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 					 struct kvm_page_fault *fault)
 {
-	if (unlikely(fault->rsvd))
+	if (unlikely(fault->arch.rsvd))
 		return false;
 
-	if (!fault->present || !fault->write)
+	if (!fault->arch.present || !fault->write)
 		return false;
 
 	/*
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index af2ae4887e35..4abb80a3bd01 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -77,60 +77,6 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
 	return READ_ONCE(nx_huge_pages) && !kvm->arch.disable_nx_huge_pages;
 }
 
-struct kvm_page_fault {
-	/* arguments to kvm_mmu_do_page_fault.  */
-	const gpa_t addr;
-	const u32 error_code;
-	const bool prefetch;
-
-	/* Derived from error_code.  */
-	const bool exec;
-	const bool write;
-	const bool present;
-	const bool rsvd;
-	const bool user;
-
-	/* Derived from mmu and global state.  */
-	const bool is_tdp;
-	const bool nx_huge_page_workaround_enabled;
-
-	/*
-	 * Whether a >4KB mapping can be created or is forbidden due to NX
-	 * hugepages.
-	 */
-	bool huge_page_disallowed;
-
-	/*
-	 * Maximum page size that can be created for this fault; input to
-	 * FNAME(fetch), direct_map() and kvm_tdp_mmu_map().
-	 */
-	u8 max_level;
-
-	/*
-	 * Page size that can be created based on the max_level and the
-	 * page size used by the host mapping.
-	 */
-	u8 req_level;
-
-	/*
-	 * Page size that will be created based on the req_level and
-	 * huge_page_disallowed.
-	 */
-	u8 goal_level;
-
-	/* Shifted addr, or result of guest page table walk if addr is a gva.  */
-	gfn_t gfn;
-
-	/* The memslot containing gfn. May be NULL. */
-	struct kvm_memory_slot *slot;
-
-	/* Outputs of kvm_faultin_pfn.  */
-	unsigned long mmu_seq;
-	kvm_pfn_t pfn;
-	hva_t hva;
-	bool map_writable;
-};
-
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 
 /*
@@ -164,22 +110,27 @@ enum {
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 					u32 err, bool prefetch)
 {
+	bool is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault);
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
-		.error_code = err,
-		.exec = err & PFERR_FETCH_MASK,
-		.write = err & PFERR_WRITE_MASK,
-		.present = err & PFERR_PRESENT_MASK,
-		.rsvd = err & PFERR_RSVD_MASK,
-		.user = err & PFERR_USER_MASK,
 		.prefetch = prefetch,
-		.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
-		.nx_huge_page_workaround_enabled =
-			is_nx_huge_page_enabled(vcpu->kvm),
+
+		.write = err & PFERR_WRITE_MASK,
+		.exec = err & PFERR_FETCH_MASK,
 
 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
+
+		.arch = {
+			.error_code = err,
+			.present = err & PFERR_PRESENT_MASK,
+			.rsvd = err & PFERR_RSVD_MASK,
+			.user = err & PFERR_USER_MASK,
+			.is_tdp = is_tdp,
+			.nx_huge_page_workaround_enabled =
+				is_nx_huge_page_enabled(vcpu->kvm),
+		},
 	};
 	int r;
 
@@ -196,7 +147,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	if (!prefetch)
 		vcpu->stat.pf_taken++;
 
-	if (IS_ENABLED(CONFIG_RETPOLINE) && fault.is_tdp)
+	if (IS_ENABLED(CONFIG_RETPOLINE) && fault.arch.is_tdp)
 		r = kvm_tdp_page_fault(vcpu, &fault);
 	else
 		r = vcpu->arch.mmu->page_fault(vcpu, &fault);
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 335f26dabdf3..b01767acf073 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -270,7 +270,7 @@ TRACE_EVENT(
 	TP_fast_assign(
 		__entry->vcpu_id = vcpu->vcpu_id;
 		__entry->cr2_or_gpa = fault->addr;
-		__entry->error_code = fault->error_code;
+		__entry->error_code = fault->arch.error_code;
 		__entry->sptep = sptep;
 		__entry->old_spte = old_spte;
 		__entry->new_spte = *sptep;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 18bb92b70a01..daf9c7731edc 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -698,7 +698,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		 * We cannot overwrite existing page tables with an NX
 		 * large page, as the leaf could be executable.
 		 */
-		if (fault->nx_huge_page_workaround_enabled)
+		if (fault->arch.nx_huge_page_workaround_enabled)
 			disallowed_hugepage_adjust(fault, *it.sptep, it.level);
 
 		base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
@@ -713,7 +713,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			continue;
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->huge_page_disallowed)
+		if (fault->arch.huge_page_disallowed)
 			account_nx_huge_page(vcpu->kvm, sp,
 					     fault->req_level >= it.level);
 	}
@@ -793,8 +793,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	int r;
 	bool is_self_change_mapping;
 
-	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
-	WARN_ON_ONCE(fault->is_tdp);
+	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->arch.error_code);
+	WARN_ON_ONCE(fault->arch.is_tdp);
 
 	/*
 	 * Look up the guest pte for the faulting address.
@@ -802,7 +802,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * The bit needs to be cleared before walking guest page tables.
 	 */
 	r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
-			     fault->error_code & ~PFERR_RSVD_MASK);
+			     fault->arch.error_code & ~PFERR_RSVD_MASK);
 
 	/*
 	 * The page is not mapped by the guest.  Let the guest handle it.
@@ -830,7 +830,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	vcpu->arch.write_fault_to_shadow_pgtable = false;
 
 	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);
+	      &walker, fault->arch.user, &vcpu->arch.write_fault_to_shadow_pgtable);
 
 	if (is_self_change_mapping)
 		fault->max_level = PG_LEVEL_4K;
@@ -846,7 +846,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * we will cache the incorrect access into mmio spte.
 	 */
 	if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) &&
-	    !is_cr0_wp(vcpu->arch.mmu) && !fault->user && fault->slot) {
+	    !is_cr0_wp(vcpu->arch.mmu) && !fault->arch.user && fault->slot) {
 		walker.pte_access |= ACC_WRITE_MASK;
 		walker.pte_access &= ~ACC_USER_MASK;
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 66231c7ed31e..4940413d3767 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1156,7 +1156,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
 		int r;
 
-		if (fault->nx_huge_page_workaround_enabled)
+		if (fault->arch.nx_huge_page_workaround_enabled)
 			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
 
 		if (iter.level == fault->goal_level)
@@ -1181,7 +1181,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		sp = tdp_mmu_alloc_sp(vcpu);
 		tdp_mmu_init_child_sp(sp, &iter);
 
-		sp->arch.nx_huge_page_disallowed = fault->huge_page_disallowed;
+		sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
 
 		if (is_shadow_present_pte(iter.old_spte))
 			r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
@@ -1197,7 +1197,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			goto retry;
 		}
 
-		if (fault->huge_page_disallowed &&
+		if (fault->arch.huge_page_disallowed &&
 		    fault->req_level >= iter.level) {
 			spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 			track_possible_nx_huge_page(kvm, sp);
diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
index a9da33d4baa8..9f0ca920bf68 100644
--- a/include/kvm/mmu_types.h
+++ b/include/kvm/mmu_types.h
@@ -66,4 +66,48 @@ struct kvm_mmu_page {
 	struct kvm_mmu_page_arch arch;
 };
 
+struct kvm_page_fault {
+	/* The raw faulting address. */
+	const gpa_t addr;
+
+	/* Whether the fault was synthesized to prefetch a mapping. */
+	const bool prefetch;
+
+	/* Information about the cause of the fault. */
+	const bool write;
+	const bool exec;
+
+	/* Shifted addr, or result of guest page table walk if shadow paging. */
+	gfn_t gfn;
+
+	/* The memslot that contains @gfn. May be NULL. */
+	struct kvm_memory_slot *slot;
+
+	/* Maximum page size that can be created for this fault. */
+	u8 max_level;
+
+	/*
+	 * Page size that can be created based on the max_level and the page
+	 * size used by the host mapping.
+	 */
+	u8 req_level;
+
+	/* Final page size that will be created. */
+	u8 goal_level;
+
+	/*
+	 * The value of kvm->mmu_invalidate_seq before fetching the host
+	 * mapping. Used to verify that the host mapping has not changed
+	 * after grabbing the MMU lock.
+	 */
+	unsigned long mmu_seq;
+
+	/* Information about the host mapping. */
+	kvm_pfn_t pfn;
+	hva_t hva;
+	bool map_writable;
+
+	struct kvm_page_fault_arch arch;
+};
+
 #endif /* !__KVM_MMU_TYPES_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 11/37] KVM: MMU: Move RET_PF_* into common code
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move the RET_PF_* enum into common code in preparation for moving the
TDP MMU into common code.

Signed-off-by: David Matlack <dmatlack@google.com>
---
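(Not part of the patch, illustration only.) A sketch of how these return
values are typically consumed, to show why the enum carries no arch
dependencies and can live in common code. RET_PF_CONTINUE is 0, per the
comment in the diff, so "keep going" checks compile down to a plain zero
test. The handler and the fast_path()/slow_path() helpers below are
hypothetical stand-ins, not existing KVM functions.

  /*
   * Hypothetical dispatcher: negative values are -errno, RET_PF_CONTINUE
   * (0) means keep handling, anything else is a final disposition that
   * the caller acts on.
   */
  static int handle_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
  {
          int r = fast_path(vcpu, fault);

          if (r != RET_PF_CONTINUE)
                  return r;       /* e.g. RET_PF_FIXED, RET_PF_RETRY */

          return slow_path(vcpu, fault);
  }
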
 arch/x86/kvm/mmu/mmu_internal.h | 28 ----------------------------
 include/kvm/mmu_types.h         | 26 ++++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 4abb80a3bd01..d3c1d08002af 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -79,34 +79,6 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
 
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 
-/*
- * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
- * and of course kvm_mmu_do_page_fault().
- *
- * RET_PF_CONTINUE: So far, so good, keep handling the page fault.
- * RET_PF_RETRY: let CPU fault again on the address.
- * RET_PF_EMULATE: mmio page fault, emulate the instruction directly.
- * RET_PF_INVALID: the spte is invalid, let the real page fault path update it.
- * RET_PF_FIXED: The faulting entry has been fixed.
- * RET_PF_SPURIOUS: The faulting entry was already fixed, e.g. by another vCPU.
- *
- * Any names added to this enum should be exported to userspace for use in
- * tracepoints via TRACE_DEFINE_ENUM() in mmutrace.h
- *
- * Note, all values must be greater than or equal to zero so as not to encroach
- * on -errno return values.  Somewhat arbitrarily use '0' for CONTINUE, which
- * will allow for efficient machine code when checking for CONTINUE, e.g.
- * "TEST %rax, %rax, JNZ", as all "stop!" values are non-zero.
- */
-enum {
-	RET_PF_CONTINUE = 0,
-	RET_PF_RETRY,
-	RET_PF_EMULATE,
-	RET_PF_INVALID,
-	RET_PF_FIXED,
-	RET_PF_SPURIOUS,
-};
-
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 					u32 err, bool prefetch)
 {
diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
index 9f0ca920bf68..07c9962f9aea 100644
--- a/include/kvm/mmu_types.h
+++ b/include/kvm/mmu_types.h
@@ -110,4 +110,30 @@ struct kvm_page_fault {
 	struct kvm_page_fault_arch arch;
 };
 
+/*
+ * Return values for page fault handling routines.
+ *
+ * RET_PF_CONTINUE: So far, so good, keep handling the page fault.
+ * RET_PF_RETRY: let CPU fault again on the address.
+ * RET_PF_EMULATE: mmio page fault, emulate the instruction directly.
+ * RET_PF_INVALID: the spte is invalid, let the real page fault path update it.
+ * RET_PF_FIXED: The faulting entry has been fixed.
+ * RET_PF_SPURIOUS: The faulting entry was already fixed, e.g. by another vCPU.
+ *
+ * Any names added to this enum should be exported to userspace for use in
+ * tracepoints via TRACE_DEFINE_ENUM() in arch/x86/kvm/mmu/mmutrace.h.
+ *
+ * Note, all values must be greater than or equal to zero so as not to encroach
+ * on -errno return values.  Somewhat arbitrarily use '0' for CONTINUE, which
+ * will allow for efficient machine code when checking for CONTINUE.
+ */
+enum {
+	RET_PF_CONTINUE = 0,
+	RET_PF_RETRY,
+	RET_PF_EMULATE,
+	RET_PF_INVALID,
+	RET_PF_FIXED,
+	RET_PF_SPURIOUS,
+};
+
 #endif /* !__KVM_MMU_TYPES_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 12/37] KVM: x86/mmu: Use PG_LEVEL_{PTE, PMD, PUD} in the TDP MMU
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Use PG_LEVEL_{PTE,PMD,PUD} in the TDP MMU instead of the x86-specific
PG_LEVEL_{4K,2M,1G} aliases. This prepares for moving the TDP MMU to
common code, where not all architectures will have 4K PAGE_SIZE.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/tdp_iter.h |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c  | 16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index f0af385c56e0..892c078aab58 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -106,7 +106,7 @@ struct tdp_iter {
 	     tdp_iter_next(&iter))
 
 #define for_each_tdp_pte(iter, root, start, end) \
-	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end)
+	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_PTE, start, end)
 
 tdp_ptep_t spte_to_child_pt(u64 pte, int level);
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 4940413d3767..bce0566f2d94 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -347,7 +347,7 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 	bool pfn_changed;
 	struct kvm_memory_slot *slot;
 
-	if (level > PG_LEVEL_4K)
+	if (level > PG_LEVEL_PTE)
 		return;
 
 	pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
@@ -526,7 +526,7 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
 
 	WARN_ON(level > PT64_ROOT_MAX_LEVEL);
-	WARN_ON(level < PG_LEVEL_4K);
+	WARN_ON(level < PG_LEVEL_PTE);
 	WARN_ON(gfn & (KVM_PAGES_PER_HPAGE(level) - 1));
 
 	/*
@@ -897,9 +897,9 @@ static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 	 * inducing a stall to allow in-place replacement with a 1gb hugepage.
 	 *
 	 * Because zapping a SP recurses on its children, stepping down to
-	 * PG_LEVEL_4K in the iterator itself is unnecessary.
+	 * PG_LEVEL_PTE in the iterator itself is unnecessary.
 	 */
-	__tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_1G);
+	__tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_PUD);
 	__tdp_mmu_zap_root(kvm, root, shared, root->role.level);
 
 	rcu_read_unlock();
@@ -944,7 +944,7 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 
 	rcu_read_lock();
 
-	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) {
+	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_PTE, start, end) {
 		if (can_yield &&
 		    tdp_mmu_iter_cond_resched(kvm, &iter, flush, false)) {
 			flush = false;
@@ -1303,7 +1303,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 	/* Huge pages aren't expected to be modified without first being zapped. */
 	WARN_ON(pte_huge(range->pte) || range->start + 1 != range->end);
 
-	if (iter->level != PG_LEVEL_4K ||
+	if (iter->level != PG_LEVEL_PTE ||
 	    !is_shadow_present_pte(iter->old_spte))
 		return false;
 
@@ -1672,7 +1672,7 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
 		if (!mask)
 			break;
 
-		if (iter.level > PG_LEVEL_4K ||
+		if (iter.level > PG_LEVEL_PTE ||
 		    !(mask & (1UL << (iter.gfn - gfn))))
 			continue;
 
@@ -1726,7 +1726,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 
 	rcu_read_lock();
 
-	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_2M, start, end) {
+	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_PMD, start, end) {
 retry:
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
-- 
2.39.0.rc1.256.g54fd8350bd-goog
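
A note for readers coming from other architectures: the rename above is
purely a change of spelling at each call site. A minimal sketch of the
assumed correspondence on x86 is below; the real definitions come from the
separate "mm: Introduce architecture-neutral PG_LEVEL macros" patch and may
alias in the opposite direction.

  /*
   * Illustrative sketch only: the architecture-neutral names are assumed
   * to be plain synonyms of the existing x86 page-size-based levels.
   */
  #define PG_LEVEL_PTE	PG_LEVEL_4K	/* 4 KiB leaf mappings */
  #define PG_LEVEL_PMD	PG_LEVEL_2M	/* 2 MiB huge mappings */
  #define PG_LEVEL_PUD	PG_LEVEL_1G	/* 1 GiB huge mappings */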


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 13/37] KVM: MMU: Move sptep_to_sp() to common code
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move sptep_to_sp() to common code in preparation for moving the TDP MMU
to common code.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/spte.h | 14 ++------------
 include/kvm/mmu.h       | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+), 12 deletions(-)
 create mode 100644 include/kvm/mmu.h

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index ad84c549fe96..4c5d518e3ac6 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -3,6 +3,8 @@
 #ifndef KVM_X86_MMU_SPTE_H
 #define KVM_X86_MMU_SPTE_H
 
+#include <kvm/mmu.h>
+
 #include "mmu_internal.h"
 
 /*
@@ -219,23 +221,11 @@ static inline int spte_index(u64 *sptep)
  */
 extern u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask;
 
-static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
-{
-	struct page *page = pfn_to_page((shadow_page) >> PAGE_SHIFT);
-
-	return (struct kvm_mmu_page *)page_private(page);
-}
-
 static inline struct kvm_mmu_page *spte_to_child_sp(u64 spte)
 {
 	return to_shadow_page(spte & SPTE_BASE_ADDR_MASK);
 }
 
-static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
-{
-	return to_shadow_page(__pa(sptep));
-}
-
 static inline bool is_mmio_spte(u64 spte)
 {
 	return (spte & shadow_mmio_mask) == shadow_mmio_value &&
diff --git a/include/kvm/mmu.h b/include/kvm/mmu.h
new file mode 100644
index 000000000000..425db8e4f8e9
--- /dev/null
+++ b/include/kvm/mmu.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_MMU_H
+#define __KVM_MMU_H
+
+#include <kvm/mmu_types.h>
+
+static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
+{
+	struct page *page = pfn_to_page((shadow_page) >> PAGE_SHIFT);
+
+	return (struct kvm_mmu_page *)page_private(page);
+}
+
+static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
+{
+	return to_shadow_page(__pa(sptep));
+}
+
+#endif /* !__KVM_MMU_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog
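
To illustrate what the move buys the common code, here is a minimal usage
sketch. The helper below is hypothetical and not part of the patch; it
assumes, as KVM already arranges, that every page-table page has its
struct page ->private pointing back at the owning struct kvm_mmu_page.

  /* Hypothetical: find the level of the table containing a given PTE. */
  static int example_pte_level(u64 *sptep)
  {
  	struct kvm_mmu_page *sp = sptep_to_sp(sptep);

  	return sp->role.level;
  }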


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 14/37] KVM: MMU: Introduce common macros for TDP page tables
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Introduce macros in common KVM code for dealing with TDP page tables.
TDP page tables are assumed to be PAGE_SIZE in size, with 64-bit PTEs.
ARM will have some nuance, e.g. for root page table concatenation, but
that will be handled separately when the time comes. Furthermore,
arch-specific overrides for any of these macros can be added in the
future on a case-by-case basis.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/tdp_iter.c | 14 +++++++-------
 arch/x86/kvm/mmu/tdp_iter.h |  3 ++-
 arch/x86/kvm/mmu/tdp_mmu.c  | 24 +++++++++++++-----------
 include/kvm/tdp_pgtable.h   | 21 +++++++++++++++++++++
 4 files changed, 43 insertions(+), 19 deletions(-)
 create mode 100644 include/kvm/tdp_pgtable.h

diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index 4a7d58bf81c4..d6328dac9cd3 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -10,14 +10,15 @@
  */
 static void tdp_iter_refresh_sptep(struct tdp_iter *iter)
 {
-	iter->sptep = iter->pt_path[iter->level - 1] +
-		SPTE_INDEX(iter->gfn << PAGE_SHIFT, iter->level);
+	int pte_index = TDP_PTE_INDEX(iter->gfn, iter->level);
+
+	iter->sptep = iter->pt_path[iter->level - 1] + pte_index;
 	iter->old_spte = kvm_tdp_mmu_read_spte(iter->sptep);
 }
 
 static gfn_t round_gfn_for_level(gfn_t gfn, int level)
 {
-	return gfn & -KVM_PAGES_PER_HPAGE(level);
+	return gfn & -TDP_PAGES_PER_LEVEL(level);
 }
 
 /*
@@ -46,7 +47,7 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
 	int root_level = root->role.level;
 
 	WARN_ON(root_level < 1);
-	WARN_ON(root_level > PT64_ROOT_MAX_LEVEL);
+	WARN_ON(root_level > TDP_ROOT_MAX_LEVEL);
 
 	iter->next_last_level_gfn = next_last_level_gfn;
 	iter->root_level = root_level;
@@ -116,11 +117,10 @@ static bool try_step_side(struct tdp_iter *iter)
 	 * Check if the iterator is already at the end of the current page
 	 * table.
 	 */
-	if (SPTE_INDEX(iter->gfn << PAGE_SHIFT, iter->level) ==
-	    (SPTE_ENT_PER_PAGE - 1))
+	if (TDP_PTE_INDEX(iter->gfn, iter->level) == (TDP_PTES_PER_PAGE - 1))
 		return false;
 
-	iter->gfn += KVM_PAGES_PER_HPAGE(iter->level);
+	iter->gfn += TDP_PAGES_PER_LEVEL(iter->level);
 	iter->next_last_level_gfn = iter->gfn;
 	iter->sptep++;
 	iter->old_spte = kvm_tdp_mmu_read_spte(iter->sptep);
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index 892c078aab58..bfac83ab52db 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -4,6 +4,7 @@
 #define __KVM_X86_MMU_TDP_ITER_H
 
 #include <linux/kvm_host.h>
+#include <kvm/tdp_pgtable.h>
 
 #include "mmu.h"
 #include "spte.h"
@@ -68,7 +69,7 @@ struct tdp_iter {
 	 */
 	gfn_t yielded_gfn;
 	/* Pointers to the page tables traversed to reach the current SPTE */
-	tdp_ptep_t pt_path[PT64_ROOT_MAX_LEVEL];
+	tdp_ptep_t pt_path[TDP_ROOT_MAX_LEVEL];
 	/* A pointer to the current SPTE */
 	tdp_ptep_t sptep;
 	/* The lowest GFN mapped by the current SPTE */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index bce0566f2d94..a6d6e393c009 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -7,6 +7,8 @@
 #include "tdp_mmu.h"
 #include "spte.h"
 
+#include <kvm/tdp_pgtable.h>
+
 #include <asm/cmpxchg.h>
 #include <trace/events/kvm.h>
 
@@ -428,9 +430,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 
 	tdp_mmu_unlink_sp(kvm, sp, shared);
 
-	for (i = 0; i < SPTE_ENT_PER_PAGE; i++) {
+	for (i = 0; i < TDP_PTES_PER_PAGE; i++) {
 		tdp_ptep_t sptep = pt + i;
-		gfn_t gfn = base_gfn + i * KVM_PAGES_PER_HPAGE(level);
+		gfn_t gfn = base_gfn + i * TDP_PAGES_PER_LEVEL(level);
 		u64 old_spte;
 
 		if (shared) {
@@ -525,9 +527,9 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
 
-	WARN_ON(level > PT64_ROOT_MAX_LEVEL);
+	WARN_ON(level > TDP_ROOT_MAX_LEVEL);
 	WARN_ON(level < PG_LEVEL_PTE);
-	WARN_ON(gfn & (KVM_PAGES_PER_HPAGE(level) - 1));
+	WARN_ON(gfn & (TDP_PAGES_PER_LEVEL(level) - 1));
 
 	/*
 	 * If this warning were to trigger it would indicate that there was a
@@ -677,7 +679,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
 		return ret;
 
 	kvm_flush_remote_tlbs_with_address(kvm, iter->gfn,
-					   KVM_PAGES_PER_HPAGE(iter->level));
+					   TDP_PAGES_PER_LEVEL(iter->level));
 
 	/*
 	 * No other thread can overwrite the removed SPTE as they must either
@@ -1075,7 +1077,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 	else if (is_shadow_present_pte(iter->old_spte) &&
 		 !is_last_spte(iter->old_spte, iter->level))
 		kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
-						   KVM_PAGES_PER_HPAGE(iter->level + 1));
+						   TDP_PAGES_PER_LEVEL(iter->level + 1));
 
 	/*
 	 * If the page fault was caused by a write but the page is write
@@ -1355,7 +1357,7 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 
 	rcu_read_lock();
 
-	BUG_ON(min_level > KVM_MAX_HUGEPAGE_LEVEL);
+	BUG_ON(min_level > TDP_MAX_HUGEPAGE_LEVEL);
 
 	for_each_tdp_pte_min_level(iter, root, min_level, start, end) {
 retry:
@@ -1469,7 +1471,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
 	 * No need for atomics when writing to sp->spt since the page table has
 	 * not been linked in yet and thus is not reachable from any other CPU.
 	 */
-	for (i = 0; i < SPTE_ENT_PER_PAGE; i++)
+	for (i = 0; i < TDP_PTES_PER_PAGE; i++)
 		sp->spt[i] = make_huge_page_split_spte(kvm, huge_spte, sp->role, i);
 
 	/*
@@ -1489,7 +1491,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
 	 * are overwriting from the page stats. But we have to manually update
 	 * the page stats with the new present child pages.
 	 */
-	kvm_update_page_stats(kvm, level - 1, SPTE_ENT_PER_PAGE);
+	kvm_update_page_stats(kvm, level - 1, TDP_PTES_PER_PAGE);
 
 out:
 	trace_kvm_mmu_split_huge_page(iter->gfn, huge_spte, level, ret);
@@ -1731,7 +1733,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
 
-		if (iter.level > KVM_MAX_HUGEPAGE_LEVEL ||
+		if (iter.level > TDP_MAX_HUGEPAGE_LEVEL ||
 		    !is_shadow_present_pte(iter.old_spte))
 			continue;
 
@@ -1793,7 +1795,7 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 	u64 new_spte;
 	bool spte_set = false;
 
-	BUG_ON(min_level > KVM_MAX_HUGEPAGE_LEVEL);
+	BUG_ON(min_level > TDP_MAX_HUGEPAGE_LEVEL);
 
 	rcu_read_lock();
 
diff --git a/include/kvm/tdp_pgtable.h b/include/kvm/tdp_pgtable.h
new file mode 100644
index 000000000000..968be8d92350
--- /dev/null
+++ b/include/kvm/tdp_pgtable.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_TDP_PGTABLE_H
+#define __KVM_TDP_PGTABLE_H
+
+#include <linux/log2.h>
+#include <linux/mm_types.h>
+
+#define TDP_ROOT_MAX_LEVEL	5
+#define TDP_MAX_HUGEPAGE_LEVEL	PG_LEVEL_PUD
+#define TDP_PTES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
+#define TDP_LEVEL_BITS		ilog2(TDP_PTES_PER_PAGE)
+#define TDP_LEVEL_MASK		((1UL << TDP_LEVEL_BITS) - 1)
+
+#define TDP_LEVEL_SHIFT(level) (((level) - 1) * TDP_LEVEL_BITS)
+
+#define TDP_PAGES_PER_LEVEL(level) (1UL << TDP_LEVEL_SHIFT(level))
+
+#define TDP_PTE_INDEX(gfn, level) \
+	(((gfn) >> TDP_LEVEL_SHIFT(level)) & TDP_LEVEL_MASK)
+
+#endif /* !__KVM_TDP_PGTABLE_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog
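
To make the macro arithmetic concrete, a worked example assuming a 4 KiB
PAGE_SIZE; the values follow directly from the definitions in
include/kvm/tdp_pgtable.h above.

  /*
   * With PAGE_SIZE == 4096 and 8-byte PTEs:
   *
   *   TDP_PTES_PER_PAGE      = 4096 / 8       = 512
   *   TDP_LEVEL_BITS         = ilog2(512)     = 9
   *   TDP_LEVEL_MASK         = (1 << 9) - 1   = 0x1ff
   *   TDP_LEVEL_SHIFT(1)     = 0
   *   TDP_LEVEL_SHIFT(2)     = 9
   *   TDP_LEVEL_SHIFT(3)     = 18
   *   TDP_PAGES_PER_LEVEL(2) = 1 << 9         = 512     (4K pages per 2M entry)
   *   TDP_PAGES_PER_LEVEL(3) = 1 << 18        = 262144  (4K pages per 1G entry)
   *
   * So the index of the PTE mapping a given gfn in a level-2 (PMD) table is
   * TDP_PTE_INDEX(gfn, 2) == (gfn >> 9) & 0x1ff, the same value the old
   * SPTE_INDEX(gfn << PAGE_SHIFT, level) produced on x86.
   */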


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 15/37] KVM: x86/mmu: Add a common API for inspecting/modifying TDP PTEs
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Introduce an API for inspecting and modifying TDP PTEs from common code.
This will be used in future commits to move the TDP MMU itself to common
code.

Specifically, introduce the following helpers, all usable from common
code:

  /* Inspection API */
  tdp_pte_is_present()
  tdp_pte_is_writable()
  tdp_pte_is_huge()
  tdp_pte_is_leaf()
  tdp_pte_is_accessed()
  tdp_pte_is_dirty()
  tdp_pte_is_mmio()
  tdp_pte_is_volatile()
  tdp_pte_to_pfn()
  tdp_pte_check_leaf_invariants()

  /* Modification API */
  tdp_pte_clear_writable()
  tdp_pte_clear_mmu_writable()
  tdp_pte_clear_dirty()
  tdp_pte_clear_accessed()

Note that this does not cover constructing PTEs from scratch (e.g.
during page fault handling). This will be added in a subsequent commit.
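
As a rough illustration of the intended consumer, a common-code leaf walk
could be written against only the names above. The helper and the exact
signatures here are assumptions for illustration, not part of the series.

  /* Hypothetical: return the pfn behind a present leaf PTE, or an error. */
  static kvm_pfn_t example_leaf_pfn(struct tdp_iter *iter)
  {
  	u64 pte = iter->old_spte;

  	if (!tdp_pte_is_present(pte) || !tdp_pte_is_leaf(pte, iter->level))
  		return KVM_PFN_ERR_FAULT;

  	return tdp_pte_to_pfn(pte);
  }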

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/tdp_pgtable.h |  58 +++++++++
 arch/x86/kvm/Makefile                  |   2 +-
 arch/x86/kvm/mmu/spte.c                |   3 +-
 arch/x86/kvm/mmu/spte.h                |  22 ----
 arch/x86/kvm/mmu/tdp_iter.c            |   4 +-
 arch/x86/kvm/mmu/tdp_iter.h            |   5 +-
 arch/x86/kvm/mmu/tdp_mmu.c             | 171 +++++++++++--------------
 arch/x86/kvm/mmu/tdp_pgtable.c         |  72 +++++++++++
 include/kvm/tdp_pgtable.h              |  18 +++
 9 files changed, 231 insertions(+), 124 deletions(-)
 create mode 100644 arch/x86/include/asm/kvm/tdp_pgtable.h
 create mode 100644 arch/x86/kvm/mmu/tdp_pgtable.c

diff --git a/arch/x86/include/asm/kvm/tdp_pgtable.h b/arch/x86/include/asm/kvm/tdp_pgtable.h
new file mode 100644
index 000000000000..cebc4bc44b49
--- /dev/null
+++ b/arch/x86/include/asm/kvm/tdp_pgtable.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KVM_TDP_PGTABLE_H
+#define __ASM_KVM_TDP_PGTABLE_H
+
+#include <linux/types.h>
+#include <linux/kvm_types.h>
+
+/*
+ * Use a semi-arbitrary value that doesn't set RWX bits, i.e. is not-present on
+ * both AMD and Intel CPUs, and doesn't set PFN bits, i.e. doesn't create a L1TF
+ * vulnerability.  Use only low bits to avoid 64-bit immediates.
+ */
+#define REMOVED_TDP_PTE		0x5a0ULL
+
+#define TDP_PTE_WRITABLE_MASK	BIT_ULL(1)
+#define TDP_PTE_HUGE_PAGE_MASK	BIT_ULL(7)
+#define TDP_PTE_PRESENT_MASK	BIT_ULL(11)
+
+static inline bool tdp_pte_is_writable(u64 pte)
+{
+	return pte & TDP_PTE_WRITABLE_MASK;
+}
+
+static inline bool tdp_pte_is_huge(u64 pte)
+{
+	return pte & TDP_PTE_HUGE_PAGE_MASK;
+}
+
+static inline bool tdp_pte_is_present(u64 pte)
+{
+	return pte & TDP_PTE_PRESENT_MASK;
+}
+
+bool tdp_pte_is_accessed(u64 pte);
+bool tdp_pte_is_dirty(u64 pte);
+bool tdp_pte_is_mmio(u64 pte);
+bool tdp_pte_is_volatile(u64 pte);
+
+static inline u64 tdp_pte_clear_writable(u64 pte)
+{
+	return pte & ~TDP_PTE_WRITABLE_MASK;
+}
+
+static inline u64 tdp_pte_clear_mmu_writable(u64 pte)
+{
+	extern u64 __read_mostly shadow_mmu_writable_mask;
+
+	return pte & ~(TDP_PTE_WRITABLE_MASK | shadow_mmu_writable_mask);
+}
+
+u64 tdp_pte_clear_dirty(u64 pte, bool force_wrprot);
+u64 tdp_pte_clear_accessed(u64 pte);
+
+kvm_pfn_t tdp_pte_to_pfn(u64 pte);
+
+void tdp_pte_check_leaf_invariants(u64 pte);
+
+#endif /* !__ASM_KVM_TDP_PGTABLE_H */
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 80e3fe184d17..c294ae51caba 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -18,7 +18,7 @@ ifdef CONFIG_HYPERV
 kvm-y			+= kvm_onhyperv.o
 endif
 
-kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
+kvm-$(CONFIG_X86_64) += mmu/tdp_pgtable.o mmu/tdp_iter.o mmu/tdp_mmu.o
 kvm-$(CONFIG_KVM_XEN)	+= xen.o
 kvm-$(CONFIG_KVM_SMM)	+= smm.o
 
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index fe4b626cb431..493e109f1105 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -10,6 +10,7 @@
 
 
 #include <linux/kvm_host.h>
+#include <kvm/tdp_pgtable.h>
 #include "mmu.h"
 #include "mmu_internal.h"
 #include "x86.h"
@@ -401,7 +402,7 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
 	 * not set any RWX bits.
 	 */
 	if (WARN_ON((mmio_value & mmio_mask) != mmio_value) ||
-	    WARN_ON(mmio_value && (REMOVED_SPTE & mmio_mask) == mmio_value))
+	    WARN_ON(mmio_value && (REMOVED_TDP_PTE & mmio_mask) == mmio_value))
 		mmio_value = 0;
 
 	if (!mmio_value)
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 4c5d518e3ac6..a1b7d7730583 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -183,28 +183,6 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
  */
 #define SHADOW_NONPRESENT_OR_RSVD_MASK_LEN 5
 
-/*
- * If a thread running without exclusive control of the MMU lock must perform a
- * multi-part operation on an SPTE, it can set the SPTE to REMOVED_SPTE as a
- * non-present intermediate value. Other threads which encounter this value
- * should not modify the SPTE.
- *
- * Use a semi-arbitrary value that doesn't set RWX bits, i.e. is not-present on
- * both AMD and Intel CPUs, and doesn't set PFN bits, i.e. doesn't create a L1TF
- * vulnerability.  Use only low bits to avoid 64-bit immediates.
- *
- * Only used by the TDP MMU.
- */
-#define REMOVED_SPTE	0x5a0ULL
-
-/* Removed SPTEs must not be misconstrued as shadow present PTEs. */
-static_assert(!(REMOVED_SPTE & SPTE_MMU_PRESENT_MASK));
-
-static inline bool is_removed_spte(u64 spte)
-{
-	return spte == REMOVED_SPTE;
-}
-
 /* Get an SPTE's index into its parent's page table (and the spt array). */
 static inline int spte_index(u64 *sptep)
 {
diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index d6328dac9cd3..d5f024b7f6e4 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -69,10 +69,10 @@ tdp_ptep_t spte_to_child_pt(u64 spte, int level)
 	 * There's no child entry if this entry isn't present or is a
 	 * last-level entry.
 	 */
-	if (!is_shadow_present_pte(spte) || is_last_spte(spte, level))
+	if (!tdp_pte_is_present(spte) || tdp_pte_is_leaf(spte, level))
 		return NULL;
 
-	return (tdp_ptep_t)__va(spte_to_pfn(spte) << PAGE_SHIFT);
+	return (tdp_ptep_t)__va(tdp_pte_to_pfn(spte) << PAGE_SHIFT);
 }
 
 /*
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index bfac83ab52db..6e3c38532d1d 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -45,8 +45,9 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
 	 * logic needs to be reassessed if KVM were to use non-leaf Accessed
 	 * bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
 	 */
-	if (is_shadow_present_pte(old_spte) && is_last_spte(old_spte, level) &&
-	    spte_has_volatile_bits(old_spte))
+	if (tdp_pte_is_present(old_spte) &&
+	    tdp_pte_is_leaf(old_spte, level) &&
+	    tdp_pte_is_volatile(old_spte))
 		return kvm_tdp_mmu_write_spte_atomic(sptep, new_spte);
 
 	__kvm_tdp_mmu_write_spte(sptep, new_spte);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index a6d6e393c009..fea42bbac984 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -334,13 +334,13 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 
 static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
 {
-	if (!is_shadow_present_pte(old_spte) || !is_last_spte(old_spte, level))
+	if (!tdp_pte_is_present(old_spte) || !tdp_pte_is_leaf(old_spte, level))
 		return;
 
-	if (is_accessed_spte(old_spte) &&
-	    (!is_shadow_present_pte(new_spte) || !is_accessed_spte(new_spte) ||
-	     spte_to_pfn(old_spte) != spte_to_pfn(new_spte)))
-		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
+	if (tdp_pte_is_accessed(old_spte) &&
+	    (!tdp_pte_is_present(new_spte) || !tdp_pte_is_accessed(new_spte) ||
+	     tdp_pte_to_pfn(old_spte) != tdp_pte_to_pfn(new_spte)))
+		kvm_set_pfn_accessed(tdp_pte_to_pfn(old_spte));
 }
 
 static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
@@ -352,10 +352,10 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 	if (level > PG_LEVEL_PTE)
 		return;
 
-	pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
+	pfn_changed = tdp_pte_to_pfn(old_spte) != tdp_pte_to_pfn(new_spte);
 
-	if ((!is_writable_pte(old_spte) || pfn_changed) &&
-	    is_writable_pte(new_spte)) {
+	if ((!tdp_pte_is_writable(old_spte) || pfn_changed) &&
+	    tdp_pte_is_writable(new_spte)) {
 		slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn);
 		mark_page_dirty_in_slot(kvm, slot, gfn);
 	}
@@ -445,8 +445,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * value to the removed SPTE value.
 			 */
 			for (;;) {
-				old_spte = kvm_tdp_mmu_write_spte_atomic(sptep, REMOVED_SPTE);
-				if (!is_removed_spte(old_spte))
+				old_spte = kvm_tdp_mmu_write_spte_atomic(sptep, REMOVED_TDP_PTE);
+				if (!tdp_pte_is_removed(old_spte))
 					break;
 				cpu_relax();
 			}
@@ -461,7 +461,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * unreachable.
 			 */
 			old_spte = kvm_tdp_mmu_read_spte(sptep);
-			if (!is_shadow_present_pte(old_spte))
+			if (!tdp_pte_is_present(old_spte))
 				continue;
 
 			/*
@@ -481,7 +481,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * strictly necessary for the same reason, but using
 			 * the remove SPTE value keeps the shared/exclusive
 			 * paths consistent and allows the handle_changed_spte()
-			 * call below to hardcode the new value to REMOVED_SPTE.
+			 * call below to hardcode the new value to
+			 * REMOVED_TDP_PTE.
 			 *
 			 * Note, even though dropping a Dirty bit is the only
 			 * scenario where a non-atomic update could result in a
@@ -493,10 +494,11 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * it here.
 			 */
 			old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte,
-							  REMOVED_SPTE, level);
+							  REMOVED_TDP_PTE,
+							  level);
 		}
 		handle_changed_spte(kvm, sp->role.as_id, gfn, old_spte,
-				    REMOVED_SPTE, level, shared);
+				    REMOVED_TDP_PTE, level, shared);
 	}
 
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
@@ -521,11 +523,11 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 				  u64 old_spte, u64 new_spte, int level,
 				  bool shared)
 {
-	bool was_present = is_shadow_present_pte(old_spte);
-	bool is_present = is_shadow_present_pte(new_spte);
-	bool was_leaf = was_present && is_last_spte(old_spte, level);
-	bool is_leaf = is_present && is_last_spte(new_spte, level);
-	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
+	bool was_present = tdp_pte_is_present(old_spte);
+	bool is_present = tdp_pte_is_present(new_spte);
+	bool was_leaf = was_present && tdp_pte_is_leaf(old_spte, level);
+	bool is_leaf = is_present && tdp_pte_is_leaf(new_spte, level);
+	bool pfn_changed = tdp_pte_to_pfn(old_spte) != tdp_pte_to_pfn(new_spte);
 
 	WARN_ON(level > TDP_ROOT_MAX_LEVEL);
 	WARN_ON(level < PG_LEVEL_PTE);
@@ -560,7 +562,7 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
 
 	if (is_leaf)
-		check_spte_writable_invariants(new_spte);
+		tdp_pte_check_leaf_invariants(new_spte);
 
 	/*
 	 * The only times a SPTE should be changed from a non-present to
@@ -574,9 +576,9 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 		 * impact the guest since both the former and current SPTEs
 		 * are nonpresent.
 		 */
-		if (WARN_ON(!is_mmio_spte(old_spte) &&
-			    !is_mmio_spte(new_spte) &&
-			    !is_removed_spte(new_spte)))
+		if (WARN_ON(!tdp_pte_is_mmio(old_spte) &&
+			    !tdp_pte_is_mmio(new_spte) &&
+			    !tdp_pte_is_removed(new_spte)))
 			pr_err("Unexpected SPTE change! Nonpresent SPTEs\n"
 			       "should not be replaced with another,\n"
 			       "different nonpresent SPTE, unless one or both\n"
@@ -590,9 +592,9 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	if (is_leaf != was_leaf)
 		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
 
-	if (was_leaf && is_dirty_spte(old_spte) &&
-	    (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
-		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
+	if (was_leaf && tdp_pte_is_dirty(old_spte) &&
+	    (!is_present || !tdp_pte_is_dirty(new_spte) || pfn_changed))
+		kvm_set_pfn_dirty(tdp_pte_to_pfn(old_spte));
 
 	/*
 	 * Recursively handle child PTs if the change removed a subtree from
@@ -645,7 +647,7 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	 * and pre-checking before inserting a new SPTE is advantageous as it
 	 * avoids unnecessary work.
 	 */
-	WARN_ON_ONCE(iter->yielded || is_removed_spte(iter->old_spte));
+	WARN_ON_ONCE(iter->yielded || tdp_pte_is_removed(iter->old_spte));
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
@@ -674,7 +676,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
 	 * immediately installing a present entry in its place
 	 * before the TLBs are flushed.
 	 */
-	ret = tdp_mmu_set_spte_atomic(kvm, iter, REMOVED_SPTE);
+	ret = tdp_mmu_set_spte_atomic(kvm, iter, REMOVED_TDP_PTE);
 	if (ret)
 		return ret;
 
@@ -730,7 +732,7 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 	 * should be used. If operating under the MMU lock in write mode, the
 	 * use of the removed SPTE should not be necessary.
 	 */
-	WARN_ON(is_removed_spte(old_spte) || is_removed_spte(new_spte));
+	WARN_ON(tdp_pte_is_removed(old_spte) || tdp_pte_is_removed(new_spte));
 
 	old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
 
@@ -781,8 +783,8 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
 
 #define tdp_root_for_each_leaf_pte(_iter, _root, _start, _end)	\
 	tdp_root_for_each_pte(_iter, _root, _start, _end)		\
-		if (!is_shadow_present_pte(_iter.old_spte) ||		\
-		    !is_last_spte(_iter.old_spte, _iter.level))		\
+		if (!tdp_pte_is_present(_iter.old_spte) ||		\
+		    !tdp_pte_is_leaf(_iter.old_spte, _iter.level))		\
 			continue;					\
 		else
 
@@ -858,7 +860,7 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte))
+		if (!tdp_pte_is_present(iter.old_spte))
 			continue;
 
 		if (iter.level > zap_level)
@@ -919,7 +921,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 		return false;
 
 	old_spte = kvm_tdp_mmu_read_spte(sp->ptep);
-	if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte)))
+	if (WARN_ON_ONCE(!tdp_pte_is_present(old_spte)))
 		return false;
 
 	__tdp_mmu_set_spte(kvm, sp->role.as_id, sp->ptep, old_spte, 0,
@@ -953,8 +955,8 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 			continue;
 		}
 
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    !is_last_spte(iter.old_spte, iter.level))
+		if (!tdp_pte_is_present(iter.old_spte) ||
+		    !tdp_pte_is_leaf(iter.old_spte, iter.level))
 			continue;
 
 		tdp_mmu_set_spte(kvm, &iter, 0);
@@ -1074,8 +1076,8 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 		ret = RET_PF_SPURIOUS;
 	else if (tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
 		return RET_PF_RETRY;
-	else if (is_shadow_present_pte(iter->old_spte) &&
-		 !is_last_spte(iter->old_spte, iter->level))
+	else if (tdp_pte_is_present(iter->old_spte) &&
+		 !tdp_pte_is_leaf(iter->old_spte, iter->level))
 		kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
 						   TDP_PAGES_PER_LEVEL(iter->level + 1));
 
@@ -1090,7 +1092,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 	}
 
 	/* If a MMIO SPTE is installed, the MMIO will need to be emulated. */
-	if (unlikely(is_mmio_spte(new_spte))) {
+	if (unlikely(tdp_pte_is_mmio(new_spte))) {
 		vcpu->stat.pf_mmio_spte_created++;
 		trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn,
 				     new_spte);
@@ -1168,12 +1170,12 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 * If SPTE has been frozen by another thread, just give up and
 		 * retry, avoiding unnecessary page table allocation and free.
 		 */
-		if (is_removed_spte(iter.old_spte))
+		if (tdp_pte_is_removed(iter.old_spte))
 			goto retry;
 
 		/* Step down into the lower level page table if it exists. */
-		if (is_shadow_present_pte(iter.old_spte) &&
-		    !is_large_pte(iter.old_spte))
+		if (tdp_pte_is_present(iter.old_spte) &&
+		    !tdp_pte_is_huge(iter.old_spte))
 			continue;
 
 		/*
@@ -1185,7 +1187,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 
 		sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
 
-		if (is_shadow_present_pte(iter.old_spte))
+		if (tdp_pte_is_present(iter.old_spte))
 			r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
 		else
 			r = tdp_mmu_link_sp(kvm, &iter, sp, true);
@@ -1207,6 +1209,15 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		}
 	}
 
+	/*
+	 * Force the guest to retry the access if the upper level SPTEs aren't
+	 * in place, or if the target leaf SPTE is frozen by another CPU.
+	 */
+	if (iter.level != fault->goal_level || tdp_pte_is_removed(iter.old_spte)) {
+		rcu_read_unlock();
+		return RET_PF_RETRY;
+	}
+
 	ret = tdp_mmu_map_handle_target_level(vcpu, fault, &iter);
 
 retry:
@@ -1255,27 +1266,13 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
 static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
 			  struct kvm_gfn_range *range)
 {
-	u64 new_spte = 0;
+	u64 new_spte;
 
 	/* If we have a non-accessed entry we don't need to change the pte. */
-	if (!is_accessed_spte(iter->old_spte))
+	if (!tdp_pte_is_accessed(iter->old_spte))
 		return false;
 
-	new_spte = iter->old_spte;
-
-	if (spte_ad_enabled(new_spte)) {
-		new_spte &= ~shadow_accessed_mask;
-	} else {
-		/*
-		 * Capture the dirty status of the page, so that it doesn't get
-		 * lost when the SPTE is marked for access tracking.
-		 */
-		if (is_writable_pte(new_spte))
-			kvm_set_pfn_dirty(spte_to_pfn(new_spte));
-
-		new_spte = mark_spte_for_access_track(new_spte);
-	}
-
+	new_spte = tdp_pte_clear_accessed(iter->old_spte);
 	tdp_mmu_set_spte_no_acc_track(kvm, iter, new_spte);
 
 	return true;
@@ -1289,7 +1286,7 @@ bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter,
 			 struct kvm_gfn_range *range)
 {
-	return is_accessed_spte(iter->old_spte);
+	return tdp_pte_is_accessed(iter->old_spte);
 }
 
 bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
@@ -1306,7 +1303,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 	WARN_ON(pte_huge(range->pte) || range->start + 1 != range->end);
 
 	if (iter->level != PG_LEVEL_PTE ||
-	    !is_shadow_present_pte(iter->old_spte))
+	    !tdp_pte_is_present(iter->old_spte))
 		return false;
 
 	/*
@@ -1364,12 +1361,12 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    !is_last_spte(iter.old_spte, iter.level) ||
-		    !(iter.old_spte & PT_WRITABLE_MASK))
+		if (!tdp_pte_is_present(iter.old_spte) ||
+		    !tdp_pte_is_leaf(iter.old_spte, iter.level) ||
+		    !tdp_pte_is_writable(iter.old_spte))
 			continue;
 
-		new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
+		new_spte = tdp_pte_clear_writable(iter.old_spte);
 
 		if (tdp_mmu_set_spte_atomic(kvm, &iter, new_spte))
 			goto retry;
@@ -1525,7 +1522,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte) || !is_large_pte(iter.old_spte))
+		if (!tdp_pte_is_present(iter.old_spte) || !tdp_pte_is_huge(iter.old_spte))
 			continue;
 
 		if (!sp) {
@@ -1607,20 +1604,12 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte))
+		if (!tdp_pte_is_present(iter.old_spte))
 			continue;
 
-		if (spte_ad_need_write_protect(iter.old_spte)) {
-			if (is_writable_pte(iter.old_spte))
-				new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
-			else
-				continue;
-		} else {
-			if (iter.old_spte & shadow_dirty_mask)
-				new_spte = iter.old_spte & ~shadow_dirty_mask;
-			else
-				continue;
-		}
+		new_spte = tdp_pte_clear_dirty(iter.old_spte, false);
+		if (new_spte == iter.old_spte)
+			continue;
 
 		if (tdp_mmu_set_spte_atomic(kvm, &iter, new_spte))
 			goto retry;
@@ -1680,17 +1669,9 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
 
 		mask &= ~(1UL << (iter.gfn - gfn));
 
-		if (wrprot || spte_ad_need_write_protect(iter.old_spte)) {
-			if (is_writable_pte(iter.old_spte))
-				new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
-			else
-				continue;
-		} else {
-			if (iter.old_spte & shadow_dirty_mask)
-				new_spte = iter.old_spte & ~shadow_dirty_mask;
-			else
-				continue;
-		}
+		new_spte = tdp_pte_clear_dirty(iter.old_spte, wrprot);
+		if (new_spte == iter.old_spte)
+			continue;
 
 		tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
 	}
@@ -1734,7 +1715,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 			continue;
 
 		if (iter.level > TDP_MAX_HUGEPAGE_LEVEL ||
-		    !is_shadow_present_pte(iter.old_spte))
+		    !tdp_pte_is_present(iter.old_spte))
 			continue;
 
 		/*
@@ -1742,7 +1723,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 		 * a large page size, then its parent would have been zapped
 		 * instead of stepping down.
 		 */
-		if (is_last_spte(iter.old_spte, iter.level))
+		if (tdp_pte_is_leaf(iter.old_spte, iter.level))
 			continue;
 
 		/*
@@ -1800,13 +1781,11 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 	rcu_read_lock();
 
 	for_each_tdp_pte_min_level(iter, root, min_level, gfn, gfn + 1) {
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    !is_last_spte(iter.old_spte, iter.level))
+		if (!tdp_pte_is_present(iter.old_spte) ||
+		    !tdp_pte_is_leaf(iter.old_spte, iter.level))
 			continue;
 
-		new_spte = iter.old_spte &
-			~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
-
+		new_spte = tdp_pte_clear_mmu_writable(iter.old_spte);
 		if (new_spte == iter.old_spte)
 			break;
 
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
new file mode 100644
index 000000000000..cf3b692d8e21
--- /dev/null
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/kvm_types.h>
+#include <kvm/tdp_pgtable.h>
+
+#include "mmu.h"
+#include "spte.h"
+
+/* Removed SPTEs must not be misconstrued as shadow present PTEs. */
+static_assert(!(REMOVED_TDP_PTE & SPTE_MMU_PRESENT_MASK));
+
+static_assert(TDP_PTE_WRITABLE_MASK == PT_WRITABLE_MASK);
+static_assert(TDP_PTE_HUGE_PAGE_MASK == PT_PAGE_SIZE_MASK);
+static_assert(TDP_PTE_PRESENT_MASK == SPTE_MMU_PRESENT_MASK);
+
+bool tdp_pte_is_accessed(u64 pte)
+{
+	return is_accessed_spte(pte);
+}
+
+bool tdp_pte_is_dirty(u64 pte)
+{
+	return is_dirty_spte(pte);
+}
+
+bool tdp_pte_is_mmio(u64 pte)
+{
+	return is_mmio_spte(pte);
+}
+
+bool tdp_pte_is_volatile(u64 pte)
+{
+	return spte_has_volatile_bits(pte);
+}
+
+u64 tdp_pte_clear_dirty(u64 pte, bool force_wrprot)
+{
+	if (force_wrprot || spte_ad_need_write_protect(pte)) {
+		if (tdp_pte_is_writable(pte))
+			pte &= ~PT_WRITABLE_MASK;
+	} else if (pte & shadow_dirty_mask) {
+		pte &= ~shadow_dirty_mask;
+	}
+
+	return pte;
+}
+
+u64 tdp_pte_clear_accessed(u64 old_spte)
+{
+	if (spte_ad_enabled(old_spte))
+		return old_spte & ~shadow_accessed_mask;
+
+	/*
+	 * Capture the dirty status of the page, so that it doesn't get lost
+	 * when the SPTE is marked for access tracking.
+	 */
+	if (tdp_pte_is_writable(old_spte))
+		kvm_set_pfn_dirty(tdp_pte_to_pfn(old_spte));
+
+	return mark_spte_for_access_track(old_spte);
+}
+
+kvm_pfn_t tdp_pte_to_pfn(u64 pte)
+{
+	return spte_to_pfn(pte);
+}
+
+void tdp_pte_check_leaf_invariants(u64 pte)
+{
+	check_spte_writable_invariants(pte);
+}
+
diff --git a/include/kvm/tdp_pgtable.h b/include/kvm/tdp_pgtable.h
index 968be8d92350..a24c45ac7765 100644
--- a/include/kvm/tdp_pgtable.h
+++ b/include/kvm/tdp_pgtable.h
@@ -5,6 +5,8 @@
 #include <linux/log2.h>
 #include <linux/mm_types.h>
 
+#include <asm/kvm/tdp_pgtable.h>
+
 #define TDP_ROOT_MAX_LEVEL	5
 #define TDP_MAX_HUGEPAGE_LEVEL	PG_LEVEL_PUD
 #define TDP_PTES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
@@ -18,4 +20,20 @@
 #define TDP_PTE_INDEX(gfn, level) \
 	(((gfn) >> TDP_LEVEL_SHIFT(level)) & TDP_LEVEL_MASK)
 
+/*
+ * If a thread running without exclusive control of the MMU lock must perform a
+ * multi-part operation on a PTE, it can set the PTE to REMOVED_TDP_PTE as a
+ * non-present intermediate value. Other threads which encounter this value
+ * should not modify the PTE.
+ */
+static inline bool tdp_pte_is_removed(u64 pte)
+{
+	return pte == REMOVED_TDP_PTE;
+}
+
+static inline bool tdp_pte_is_leaf(u64 pte, int level)
+{
+	return tdp_pte_is_huge(pte) || level == PG_LEVEL_PTE;
+}
+
 #endif /* !__KVM_TDP_PGTABLE_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog
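
The removed-PTE convention documented in the new include/kvm/tdp_pgtable.h boils down to the retry loop already present in handle_removed_pt(). A condensed restatement is below; example_freeze_pte() itself is illustrative, but the calls it makes are the ones used in this patch:

  /*
   * Illustrative restatement of the removed-PTE protocol: a writer parks
   * the entry at REMOVED_TDP_PTE while performing a multi-part update;
   * any other thread that reads that value must not modify the entry and
   * backs off until the writer is done.
   */
  static u64 example_freeze_pte(tdp_ptep_t sptep)
  {
  	u64 old_spte;

  	for (;;) {
  		old_spte = kvm_tdp_mmu_write_spte_atomic(sptep, REMOVED_TDP_PTE);
  		if (!tdp_pte_is_removed(old_spte))
  			return old_spte;	/* entry is now frozen by us */
  		cpu_relax();			/* another thread froze it first */
  	}
  }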


^ permalink raw reply related	[flat|nested] 317+ messages in thread


* [RFC PATCH 15/37] KVM: x86/mmu: Add a common API for inspecting/modifying TDP PTEs
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Introduce an API for inspecting and modifying TDP PTEs from common code.
This will be used in future commits to move the TDP MMU to common code.

Specifically, introduce the following API that can be used in common
code:

  /* Inspection API */
  tdp_pte_is_present()
  tdp_pte_is_writable()
  tdp_pte_is_huge()
  tdp_pte_is_leaf()
  tdp_pte_is_accessed()
  tdp_pte_is_dirty()
  tdp_pte_is_mmio()
  tdp_pte_is_volatile()
  tdp_pte_to_pfn()
  tdp_pte_check_leaf_invariants()

  /* Modification API */
  tdp_pte_clear_writable()
  tdp_pte_clear_mmu_writable()
  tdp_pte_clear_dirty()
  tdp_pte_clear_accessed()

Note that this does not cover constructing PTEs from scratch (e.g.
during page fault handling). This will be added in a subsequent commit.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/tdp_pgtable.h |  58 +++++++++
 arch/x86/kvm/Makefile                  |   2 +-
 arch/x86/kvm/mmu/spte.c                |   3 +-
 arch/x86/kvm/mmu/spte.h                |  22 ----
 arch/x86/kvm/mmu/tdp_iter.c            |   4 +-
 arch/x86/kvm/mmu/tdp_iter.h            |   5 +-
 arch/x86/kvm/mmu/tdp_mmu.c             | 171 +++++++++++--------------
 arch/x86/kvm/mmu/tdp_pgtable.c         |  72 +++++++++++
 include/kvm/tdp_pgtable.h              |  18 +++
 9 files changed, 231 insertions(+), 124 deletions(-)
 create mode 100644 arch/x86/include/asm/kvm/tdp_pgtable.h
 create mode 100644 arch/x86/kvm/mmu/tdp_pgtable.c

diff --git a/arch/x86/include/asm/kvm/tdp_pgtable.h b/arch/x86/include/asm/kvm/tdp_pgtable.h
new file mode 100644
index 000000000000..cebc4bc44b49
--- /dev/null
+++ b/arch/x86/include/asm/kvm/tdp_pgtable.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KVM_TDP_PGTABLE_H
+#define __ASM_KVM_TDP_PGTABLE_H
+
+#include <linux/types.h>
+#include <linux/kvm_types.h>
+
+/*
+ * Use a semi-arbitrary value that doesn't set RWX bits, i.e. is not-present on
+ * both AMD and Intel CPUs, and doesn't set PFN bits, i.e. doesn't create a L1TF
+ * vulnerability.  Use only low bits to avoid 64-bit immediates.
+ */
+#define REMOVED_TDP_PTE		0x5a0ULL
+
+#define TDP_PTE_WRITABLE_MASK	BIT_ULL(1)
+#define TDP_PTE_HUGE_PAGE_MASK	BIT_ULL(7)
+#define TDP_PTE_PRESENT_MASK	BIT_ULL(11)
+
+static inline bool tdp_pte_is_writable(u64 pte)
+{
+	return pte & TDP_PTE_WRITABLE_MASK;
+}
+
+static inline bool tdp_pte_is_huge(u64 pte)
+{
+	return pte & TDP_PTE_HUGE_PAGE_MASK;
+}
+
+static inline bool tdp_pte_is_present(u64 pte)
+{
+	return pte & TDP_PTE_PRESENT_MASK;
+}
+
+bool tdp_pte_is_accessed(u64 pte);
+bool tdp_pte_is_dirty(u64 pte);
+bool tdp_pte_is_mmio(u64 pte);
+bool tdp_pte_is_volatile(u64 pte);
+
+static inline u64 tdp_pte_clear_writable(u64 pte)
+{
+	return pte & ~TDP_PTE_WRITABLE_MASK;
+}
+
+static inline u64 tdp_pte_clear_mmu_writable(u64 pte)
+{
+	extern u64 __read_mostly shadow_mmu_writable_mask;
+
+	return pte & ~(TDP_PTE_WRITABLE_MASK | shadow_mmu_writable_mask);
+}
+
+u64 tdp_pte_clear_dirty(u64 pte, bool force_wrprot);
+u64 tdp_pte_clear_accessed(u64 pte);
+
+kvm_pfn_t tdp_pte_to_pfn(u64 pte);
+
+void tdp_pte_check_leaf_invariants(u64 pte);
+
+#endif /* !__ASM_KVM_TDP_PGTABLE_H */
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 80e3fe184d17..c294ae51caba 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -18,7 +18,7 @@ ifdef CONFIG_HYPERV
 kvm-y			+= kvm_onhyperv.o
 endif
 
-kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
+kvm-$(CONFIG_X86_64) += mmu/tdp_pgtable.o mmu/tdp_iter.o mmu/tdp_mmu.o
 kvm-$(CONFIG_KVM_XEN)	+= xen.o
 kvm-$(CONFIG_KVM_SMM)	+= smm.o
 
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index fe4b626cb431..493e109f1105 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -10,6 +10,7 @@
 
 
 #include <linux/kvm_host.h>
+#include <kvm/tdp_pgtable.h>
 #include "mmu.h"
 #include "mmu_internal.h"
 #include "x86.h"
@@ -401,7 +402,7 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
 	 * not set any RWX bits.
 	 */
 	if (WARN_ON((mmio_value & mmio_mask) != mmio_value) ||
-	    WARN_ON(mmio_value && (REMOVED_SPTE & mmio_mask) == mmio_value))
+	    WARN_ON(mmio_value && (REMOVED_TDP_PTE & mmio_mask) == mmio_value))
 		mmio_value = 0;
 
 	if (!mmio_value)
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 4c5d518e3ac6..a1b7d7730583 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -183,28 +183,6 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
  */
 #define SHADOW_NONPRESENT_OR_RSVD_MASK_LEN 5
 
-/*
- * If a thread running without exclusive control of the MMU lock must perform a
- * multi-part operation on an SPTE, it can set the SPTE to REMOVED_SPTE as a
- * non-present intermediate value. Other threads which encounter this value
- * should not modify the SPTE.
- *
- * Use a semi-arbitrary value that doesn't set RWX bits, i.e. is not-present on
- * both AMD and Intel CPUs, and doesn't set PFN bits, i.e. doesn't create a L1TF
- * vulnerability.  Use only low bits to avoid 64-bit immediates.
- *
- * Only used by the TDP MMU.
- */
-#define REMOVED_SPTE	0x5a0ULL
-
-/* Removed SPTEs must not be misconstrued as shadow present PTEs. */
-static_assert(!(REMOVED_SPTE & SPTE_MMU_PRESENT_MASK));
-
-static inline bool is_removed_spte(u64 spte)
-{
-	return spte == REMOVED_SPTE;
-}
-
 /* Get an SPTE's index into its parent's page table (and the spt array). */
 static inline int spte_index(u64 *sptep)
 {
diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index d6328dac9cd3..d5f024b7f6e4 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -69,10 +69,10 @@ tdp_ptep_t spte_to_child_pt(u64 spte, int level)
 	 * There's no child entry if this entry isn't present or is a
 	 * last-level entry.
 	 */
-	if (!is_shadow_present_pte(spte) || is_last_spte(spte, level))
+	if (!tdp_pte_is_present(spte) || tdp_pte_is_leaf(spte, level))
 		return NULL;
 
-	return (tdp_ptep_t)__va(spte_to_pfn(spte) << PAGE_SHIFT);
+	return (tdp_ptep_t)__va(tdp_pte_to_pfn(spte) << PAGE_SHIFT);
 }
 
 /*
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index bfac83ab52db..6e3c38532d1d 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -45,8 +45,9 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
 	 * logic needs to be reassessed if KVM were to use non-leaf Accessed
 	 * bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
 	 */
-	if (is_shadow_present_pte(old_spte) && is_last_spte(old_spte, level) &&
-	    spte_has_volatile_bits(old_spte))
+	if (tdp_pte_is_present(old_spte) &&
+	    tdp_pte_is_leaf(old_spte, level) &&
+	    tdp_pte_is_volatile(old_spte))
 		return kvm_tdp_mmu_write_spte_atomic(sptep, new_spte);
 
 	__kvm_tdp_mmu_write_spte(sptep, new_spte);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index a6d6e393c009..fea42bbac984 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -334,13 +334,13 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 
 static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
 {
-	if (!is_shadow_present_pte(old_spte) || !is_last_spte(old_spte, level))
+	if (!tdp_pte_is_present(old_spte) || !tdp_pte_is_leaf(old_spte, level))
 		return;
 
-	if (is_accessed_spte(old_spte) &&
-	    (!is_shadow_present_pte(new_spte) || !is_accessed_spte(new_spte) ||
-	     spte_to_pfn(old_spte) != spte_to_pfn(new_spte)))
-		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
+	if (tdp_pte_is_accessed(old_spte) &&
+	    (!tdp_pte_is_present(new_spte) || !tdp_pte_is_accessed(new_spte) ||
+	     tdp_pte_to_pfn(old_spte) != tdp_pte_to_pfn(new_spte)))
+		kvm_set_pfn_accessed(tdp_pte_to_pfn(old_spte));
 }
 
 static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
@@ -352,10 +352,10 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 	if (level > PG_LEVEL_PTE)
 		return;
 
-	pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
+	pfn_changed = tdp_pte_to_pfn(old_spte) != tdp_pte_to_pfn(new_spte);
 
-	if ((!is_writable_pte(old_spte) || pfn_changed) &&
-	    is_writable_pte(new_spte)) {
+	if ((!tdp_pte_is_writable(old_spte) || pfn_changed) &&
+	    tdp_pte_is_writable(new_spte)) {
 		slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn);
 		mark_page_dirty_in_slot(kvm, slot, gfn);
 	}
@@ -445,8 +445,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * value to the removed SPTE value.
 			 */
 			for (;;) {
-				old_spte = kvm_tdp_mmu_write_spte_atomic(sptep, REMOVED_SPTE);
-				if (!is_removed_spte(old_spte))
+				old_spte = kvm_tdp_mmu_write_spte_atomic(sptep, REMOVED_TDP_PTE);
+				if (!tdp_pte_is_removed(old_spte))
 					break;
 				cpu_relax();
 			}
@@ -461,7 +461,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * unreachable.
 			 */
 			old_spte = kvm_tdp_mmu_read_spte(sptep);
-			if (!is_shadow_present_pte(old_spte))
+			if (!tdp_pte_is_present(old_spte))
 				continue;
 
 			/*
@@ -481,7 +481,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * strictly necessary for the same reason, but using
 			 * the remove SPTE value keeps the shared/exclusive
 			 * paths consistent and allows the handle_changed_spte()
-			 * call below to hardcode the new value to REMOVED_SPTE.
+			 * call below to hardcode the new value to
+			 * REMOVED_TDP_PTE.
 			 *
 			 * Note, even though dropping a Dirty bit is the only
 			 * scenario where a non-atomic update could result in a
@@ -493,10 +494,11 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * it here.
 			 */
 			old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte,
-							  REMOVED_SPTE, level);
+							  REMOVED_TDP_PTE,
+							  level);
 		}
 		handle_changed_spte(kvm, sp->role.as_id, gfn, old_spte,
-				    REMOVED_SPTE, level, shared);
+				    REMOVED_TDP_PTE, level, shared);
 	}
 
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
@@ -521,11 +523,11 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 				  u64 old_spte, u64 new_spte, int level,
 				  bool shared)
 {
-	bool was_present = is_shadow_present_pte(old_spte);
-	bool is_present = is_shadow_present_pte(new_spte);
-	bool was_leaf = was_present && is_last_spte(old_spte, level);
-	bool is_leaf = is_present && is_last_spte(new_spte, level);
-	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
+	bool was_present = tdp_pte_is_present(old_spte);
+	bool is_present = tdp_pte_is_present(new_spte);
+	bool was_leaf = was_present && tdp_pte_is_leaf(old_spte, level);
+	bool is_leaf = is_present && tdp_pte_is_leaf(new_spte, level);
+	bool pfn_changed = tdp_pte_to_pfn(old_spte) != tdp_pte_to_pfn(new_spte);
 
 	WARN_ON(level > TDP_ROOT_MAX_LEVEL);
 	WARN_ON(level < PG_LEVEL_PTE);
@@ -560,7 +562,7 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
 
 	if (is_leaf)
-		check_spte_writable_invariants(new_spte);
+		tdp_pte_check_leaf_invariants(new_spte);
 
 	/*
 	 * The only times a SPTE should be changed from a non-present to
@@ -574,9 +576,9 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 		 * impact the guest since both the former and current SPTEs
 		 * are nonpresent.
 		 */
-		if (WARN_ON(!is_mmio_spte(old_spte) &&
-			    !is_mmio_spte(new_spte) &&
-			    !is_removed_spte(new_spte)))
+		if (WARN_ON(!tdp_pte_is_mmio(old_spte) &&
+			    !tdp_pte_is_mmio(new_spte) &&
+			    !tdp_pte_is_removed(new_spte)))
 			pr_err("Unexpected SPTE change! Nonpresent SPTEs\n"
 			       "should not be replaced with another,\n"
 			       "different nonpresent SPTE, unless one or both\n"
@@ -590,9 +592,9 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	if (is_leaf != was_leaf)
 		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
 
-	if (was_leaf && is_dirty_spte(old_spte) &&
-	    (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
-		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
+	if (was_leaf && tdp_pte_is_dirty(old_spte) &&
+	    (!is_present || !tdp_pte_is_dirty(new_spte) || pfn_changed))
+		kvm_set_pfn_dirty(tdp_pte_to_pfn(old_spte));
 
 	/*
 	 * Recursively handle child PTs if the change removed a subtree from
@@ -645,7 +647,7 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	 * and pre-checking before inserting a new SPTE is advantageous as it
 	 * avoids unnecessary work.
 	 */
-	WARN_ON_ONCE(iter->yielded || is_removed_spte(iter->old_spte));
+	WARN_ON_ONCE(iter->yielded || tdp_pte_is_removed(iter->old_spte));
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
@@ -674,7 +676,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
 	 * immediately installing a present entry in its place
 	 * before the TLBs are flushed.
 	 */
-	ret = tdp_mmu_set_spte_atomic(kvm, iter, REMOVED_SPTE);
+	ret = tdp_mmu_set_spte_atomic(kvm, iter, REMOVED_TDP_PTE);
 	if (ret)
 		return ret;
 
@@ -730,7 +732,7 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 	 * should be used. If operating under the MMU lock in write mode, the
 	 * use of the removed SPTE should not be necessary.
 	 */
-	WARN_ON(is_removed_spte(old_spte) || is_removed_spte(new_spte));
+	WARN_ON(tdp_pte_is_removed(old_spte) || tdp_pte_is_removed(new_spte));
 
 	old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
 
@@ -781,8 +783,8 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
 
 #define tdp_root_for_each_leaf_pte(_iter, _root, _start, _end)	\
 	tdp_root_for_each_pte(_iter, _root, _start, _end)		\
-		if (!is_shadow_present_pte(_iter.old_spte) ||		\
-		    !is_last_spte(_iter.old_spte, _iter.level))		\
+		if (!tdp_pte_is_present(_iter.old_spte) ||		\
+		    !tdp_pte_is_leaf(_iter.old_spte, _iter.level))		\
 			continue;					\
 		else
 
@@ -858,7 +860,7 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte))
+		if (!tdp_pte_is_present(iter.old_spte))
 			continue;
 
 		if (iter.level > zap_level)
@@ -919,7 +921,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 		return false;
 
 	old_spte = kvm_tdp_mmu_read_spte(sp->ptep);
-	if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte)))
+	if (WARN_ON_ONCE(!tdp_pte_is_present(old_spte)))
 		return false;
 
 	__tdp_mmu_set_spte(kvm, sp->role.as_id, sp->ptep, old_spte, 0,
@@ -953,8 +955,8 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 			continue;
 		}
 
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    !is_last_spte(iter.old_spte, iter.level))
+		if (!tdp_pte_is_present(iter.old_spte) ||
+		    !tdp_pte_is_leaf(iter.old_spte, iter.level))
 			continue;
 
 		tdp_mmu_set_spte(kvm, &iter, 0);
@@ -1074,8 +1076,8 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 		ret = RET_PF_SPURIOUS;
 	else if (tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
 		return RET_PF_RETRY;
-	else if (is_shadow_present_pte(iter->old_spte) &&
-		 !is_last_spte(iter->old_spte, iter->level))
+	else if (tdp_pte_is_present(iter->old_spte) &&
+		 !tdp_pte_is_leaf(iter->old_spte, iter->level))
 		kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
 						   TDP_PAGES_PER_LEVEL(iter->level + 1));
 
@@ -1090,7 +1092,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 	}
 
 	/* If a MMIO SPTE is installed, the MMIO will need to be emulated. */
-	if (unlikely(is_mmio_spte(new_spte))) {
+	if (unlikely(tdp_pte_is_mmio(new_spte))) {
 		vcpu->stat.pf_mmio_spte_created++;
 		trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn,
 				     new_spte);
@@ -1168,12 +1170,12 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 * If SPTE has been frozen by another thread, just give up and
 		 * retry, avoiding unnecessary page table allocation and free.
 		 */
-		if (is_removed_spte(iter.old_spte))
+		if (tdp_pte_is_removed(iter.old_spte))
 			goto retry;
 
 		/* Step down into the lower level page table if it exists. */
-		if (is_shadow_present_pte(iter.old_spte) &&
-		    !is_large_pte(iter.old_spte))
+		if (tdp_pte_is_present(iter.old_spte) &&
+		    !tdp_pte_is_huge(iter.old_spte))
 			continue;
 
 		/*
@@ -1185,7 +1187,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 
 		sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
 
-		if (is_shadow_present_pte(iter.old_spte))
+		if (tdp_pte_is_present(iter.old_spte))
 			r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
 		else
 			r = tdp_mmu_link_sp(kvm, &iter, sp, true);
@@ -1207,6 +1209,15 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		}
 	}
 
+	/*
+	 * Force the guest to retry the access if the upper level SPTEs aren't
+	 * in place, or if the target leaf SPTE is frozen by another CPU.
+	 */
+	if (iter.level != fault->goal_level || tdp_pte_is_removed(iter.old_spte)) {
+		rcu_read_unlock();
+		return RET_PF_RETRY;
+	}
+
 	ret = tdp_mmu_map_handle_target_level(vcpu, fault, &iter);
 
 retry:
@@ -1255,27 +1266,13 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
 static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
 			  struct kvm_gfn_range *range)
 {
-	u64 new_spte = 0;
+	u64 new_spte;
 
 	/* If we have a non-accessed entry we don't need to change the pte. */
-	if (!is_accessed_spte(iter->old_spte))
+	if (!tdp_pte_is_accessed(iter->old_spte))
 		return false;
 
-	new_spte = iter->old_spte;
-
-	if (spte_ad_enabled(new_spte)) {
-		new_spte &= ~shadow_accessed_mask;
-	} else {
-		/*
-		 * Capture the dirty status of the page, so that it doesn't get
-		 * lost when the SPTE is marked for access tracking.
-		 */
-		if (is_writable_pte(new_spte))
-			kvm_set_pfn_dirty(spte_to_pfn(new_spte));
-
-		new_spte = mark_spte_for_access_track(new_spte);
-	}
-
+	new_spte = tdp_pte_clear_accessed(iter->old_spte);
 	tdp_mmu_set_spte_no_acc_track(kvm, iter, new_spte);
 
 	return true;
@@ -1289,7 +1286,7 @@ bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter,
 			 struct kvm_gfn_range *range)
 {
-	return is_accessed_spte(iter->old_spte);
+	return tdp_pte_is_accessed(iter->old_spte);
 }
 
 bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
@@ -1306,7 +1303,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 	WARN_ON(pte_huge(range->pte) || range->start + 1 != range->end);
 
 	if (iter->level != PG_LEVEL_PTE ||
-	    !is_shadow_present_pte(iter->old_spte))
+	    !tdp_pte_is_present(iter->old_spte))
 		return false;
 
 	/*
@@ -1364,12 +1361,12 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    !is_last_spte(iter.old_spte, iter.level) ||
-		    !(iter.old_spte & PT_WRITABLE_MASK))
+		if (!tdp_pte_is_present(iter.old_spte) ||
+		    !tdp_pte_is_leaf(iter.old_spte, iter.level) ||
+		    !tdp_pte_is_writable(iter.old_spte))
 			continue;
 
-		new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
+		new_spte = tdp_pte_clear_writable(iter.old_spte);
 
 		if (tdp_mmu_set_spte_atomic(kvm, &iter, new_spte))
 			goto retry;
@@ -1525,7 +1522,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte) || !is_large_pte(iter.old_spte))
+		if (!tdp_pte_is_present(iter.old_spte) || !tdp_pte_is_huge(iter.old_spte))
 			continue;
 
 		if (!sp) {
@@ -1607,20 +1604,12 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte))
+		if (!tdp_pte_is_present(iter.old_spte))
 			continue;
 
-		if (spte_ad_need_write_protect(iter.old_spte)) {
-			if (is_writable_pte(iter.old_spte))
-				new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
-			else
-				continue;
-		} else {
-			if (iter.old_spte & shadow_dirty_mask)
-				new_spte = iter.old_spte & ~shadow_dirty_mask;
-			else
-				continue;
-		}
+		new_spte = tdp_pte_clear_dirty(iter.old_spte, false);
+		if (new_spte == iter.old_spte)
+			continue;
 
 		if (tdp_mmu_set_spte_atomic(kvm, &iter, new_spte))
 			goto retry;
@@ -1680,17 +1669,9 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
 
 		mask &= ~(1UL << (iter.gfn - gfn));
 
-		if (wrprot || spte_ad_need_write_protect(iter.old_spte)) {
-			if (is_writable_pte(iter.old_spte))
-				new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
-			else
-				continue;
-		} else {
-			if (iter.old_spte & shadow_dirty_mask)
-				new_spte = iter.old_spte & ~shadow_dirty_mask;
-			else
-				continue;
-		}
+		new_spte = tdp_pte_clear_dirty(iter.old_spte, wrprot);
+		if (new_spte == iter.old_spte)
+			continue;
 
 		tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
 	}
@@ -1734,7 +1715,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 			continue;
 
 		if (iter.level > TDP_MAX_HUGEPAGE_LEVEL ||
-		    !is_shadow_present_pte(iter.old_spte))
+		    !tdp_pte_is_present(iter.old_spte))
 			continue;
 
 		/*
@@ -1742,7 +1723,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 		 * a large page size, then its parent would have been zapped
 		 * instead of stepping down.
 		 */
-		if (is_last_spte(iter.old_spte, iter.level))
+		if (tdp_pte_is_leaf(iter.old_spte, iter.level))
 			continue;
 
 		/*
@@ -1800,13 +1781,11 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 	rcu_read_lock();
 
 	for_each_tdp_pte_min_level(iter, root, min_level, gfn, gfn + 1) {
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    !is_last_spte(iter.old_spte, iter.level))
+		if (!tdp_pte_is_present(iter.old_spte) ||
+		    !tdp_pte_is_leaf(iter.old_spte, iter.level))
 			continue;
 
-		new_spte = iter.old_spte &
-			~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
-
+		new_spte = tdp_pte_clear_mmu_writable(iter.old_spte);
 		if (new_spte == iter.old_spte)
 			break;
 
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
new file mode 100644
index 000000000000..cf3b692d8e21
--- /dev/null
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/kvm_types.h>
+#include <kvm/tdp_pgtable.h>
+
+#include "mmu.h"
+#include "spte.h"
+
+/* Removed SPTEs must not be misconstrued as shadow present PTEs. */
+static_assert(!(REMOVED_TDP_PTE & SPTE_MMU_PRESENT_MASK));
+
+static_assert(TDP_PTE_WRITABLE_MASK == PT_WRITABLE_MASK);
+static_assert(TDP_PTE_HUGE_PAGE_MASK == PT_PAGE_SIZE_MASK);
+static_assert(TDP_PTE_PRESENT_MASK == SPTE_MMU_PRESENT_MASK);
+
+bool tdp_pte_is_accessed(u64 pte)
+{
+	return is_accessed_spte(pte);
+}
+
+bool tdp_pte_is_dirty(u64 pte)
+{
+	return is_dirty_spte(pte);
+}
+
+bool tdp_pte_is_mmio(u64 pte)
+{
+	return is_mmio_spte(pte);
+}
+
+bool tdp_pte_is_volatile(u64 pte)
+{
+	return spte_has_volatile_bits(pte);
+}
+
+u64 tdp_pte_clear_dirty(u64 pte, bool force_wrprot)
+{
+	if (force_wrprot || spte_ad_need_write_protect(pte)) {
+		if (tdp_pte_is_writable(pte))
+			pte &= ~PT_WRITABLE_MASK;
+	} else if (pte & shadow_dirty_mask) {
+		pte &= ~shadow_dirty_mask;
+	}
+
+	return pte;
+}
+
+u64 tdp_pte_clear_accessed(u64 old_spte)
+{
+	if (spte_ad_enabled(old_spte))
+		return old_spte & ~shadow_accessed_mask;
+
+	/*
+	 * Capture the dirty status of the page, so that it doesn't get lost
+	 * when the SPTE is marked for access tracking.
+	 */
+	if (tdp_pte_is_writable(old_spte))
+		kvm_set_pfn_dirty(tdp_pte_to_pfn(old_spte));
+
+	return mark_spte_for_access_track(old_spte);
+}
+
+kvm_pfn_t tdp_pte_to_pfn(u64 pte)
+{
+	return spte_to_pfn(pte);
+}
+
+void tdp_pte_check_leaf_invariants(u64 pte)
+{
+	check_spte_writable_invariants(pte);
+}
+
diff --git a/include/kvm/tdp_pgtable.h b/include/kvm/tdp_pgtable.h
index 968be8d92350..a24c45ac7765 100644
--- a/include/kvm/tdp_pgtable.h
+++ b/include/kvm/tdp_pgtable.h
@@ -5,6 +5,8 @@
 #include <linux/log2.h>
 #include <linux/mm_types.h>
 
+#include <asm/kvm/tdp_pgtable.h>
+
 #define TDP_ROOT_MAX_LEVEL	5
 #define TDP_MAX_HUGEPAGE_LEVEL	PG_LEVEL_PUD
 #define TDP_PTES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
@@ -18,4 +20,20 @@
 #define TDP_PTE_INDEX(gfn, level) \
 	(((gfn) >> TDP_LEVEL_SHIFT(level)) & TDP_LEVEL_MASK)
 
+/*
+ * If a thread running without exclusive control of the MMU lock must perform a
+ * multi-part operation on a PTE, it can set the PTE to REMOVED_TDP_PTE as a
+ * non-present intermediate value. Other threads which encounter this value
+ * should not modify the PTE.
+ */
+static inline bool tdp_pte_is_removed(u64 pte)
+{
+	return pte == REMOVED_TDP_PTE;
+}
+
+static inline bool tdp_pte_is_leaf(u64 pte, int level)
+{
+	return tdp_pte_is_huge(pte) || level == PG_LEVEL_PTE;
+}
+
 #endif /* !__KVM_TDP_PGTABLE_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog



* [RFC PATCH 15/37] KVM: x86/mmu: Add a common API for inspecting/modifying TDP PTEs
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Introduce an API for inspecting and modifying TDP PTEs so that these
operations can be performed from common code. Future commits will build
on this to move the TDP MMU itself into common code.

Specifically, introduce the following API that can be used in common
code:

  /* Inspection API */
  tdp_pte_is_present()
  tdp_pte_is_writable()
  tdp_pte_is_huge()
  tdp_pte_is_leaf()
  tdp_pte_is_accessed()
  tdp_pte_is_dirty()
  tdp_pte_is_mmio()
  tdp_pte_is_volatile()
  tdp_pte_to_pfn()
  tdp_pte_check_leaf_invariants()

  /* Modification API */
  tdp_pte_clear_writable()
  tdp_pte_clear_mmu_writable()
  tdp_pte_clear_dirty()
  tdp_pte_clear_accessed()

Note that this does not cover constructing PTEs from scratch (e.g.
during page fault handling). This will be added in a subsequent commit.
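
As a purely illustrative sketch (not part of this patch, and the helper
name below is made up), common code could combine these accessors to
write-protect a present leaf PTE without knowing the x86 bit layout:

  static u64 tdp_pte_try_wrprot(u64 pte, int level)
  {
  	/* Only present leaf PTEs map guest memory directly. */
  	if (!tdp_pte_is_present(pte) || !tdp_pte_is_leaf(pte, level))
  		return pte;

  	if (!tdp_pte_is_writable(pte))
  		return pte;

  	/* Clear only the hardware-writable bit; PFN bits are untouched. */
  	return tdp_pte_clear_writable(pte);
  }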

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/tdp_pgtable.h |  58 +++++++++
 arch/x86/kvm/Makefile                  |   2 +-
 arch/x86/kvm/mmu/spte.c                |   3 +-
 arch/x86/kvm/mmu/spte.h                |  22 ----
 arch/x86/kvm/mmu/tdp_iter.c            |   4 +-
 arch/x86/kvm/mmu/tdp_iter.h            |   5 +-
 arch/x86/kvm/mmu/tdp_mmu.c             | 171 +++++++++++--------------
 arch/x86/kvm/mmu/tdp_pgtable.c         |  72 +++++++++++
 include/kvm/tdp_pgtable.h              |  18 +++
 9 files changed, 231 insertions(+), 124 deletions(-)
 create mode 100644 arch/x86/include/asm/kvm/tdp_pgtable.h
 create mode 100644 arch/x86/kvm/mmu/tdp_pgtable.c

diff --git a/arch/x86/include/asm/kvm/tdp_pgtable.h b/arch/x86/include/asm/kvm/tdp_pgtable.h
new file mode 100644
index 000000000000..cebc4bc44b49
--- /dev/null
+++ b/arch/x86/include/asm/kvm/tdp_pgtable.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KVM_TDP_PGTABLE_H
+#define __ASM_KVM_TDP_PGTABLE_H
+
+#include <linux/types.h>
+#include <linux/kvm_types.h>
+
+/*
+ * Use a semi-arbitrary value that doesn't set RWX bits, i.e. is not-present on
+ * both AMD and Intel CPUs, and doesn't set PFN bits, i.e. doesn't create a L1TF
+ * vulnerability.  Use only low bits to avoid 64-bit immediates.
+ */
+#define REMOVED_TDP_PTE		0x5a0ULL
+
+#define TDP_PTE_WRITABLE_MASK	BIT_ULL(1)
+#define TDP_PTE_HUGE_PAGE_MASK	BIT_ULL(7)
+#define TDP_PTE_PRESENT_MASK	BIT_ULL(11)
+
+static inline bool tdp_pte_is_writable(u64 pte)
+{
+	return pte & TDP_PTE_WRITABLE_MASK;
+}
+
+static inline bool tdp_pte_is_huge(u64 pte)
+{
+	return pte & TDP_PTE_HUGE_PAGE_MASK;
+}
+
+static inline bool tdp_pte_is_present(u64 pte)
+{
+	return pte & TDP_PTE_PRESENT_MASK;
+}
+
+bool tdp_pte_is_accessed(u64 pte);
+bool tdp_pte_is_dirty(u64 pte);
+bool tdp_pte_is_mmio(u64 pte);
+bool tdp_pte_is_volatile(u64 pte);
+
+static inline u64 tdp_pte_clear_writable(u64 pte)
+{
+	return pte & ~TDP_PTE_WRITABLE_MASK;
+}
+
+static inline u64 tdp_pte_clear_mmu_writable(u64 pte)
+{
+	extern u64 __read_mostly shadow_mmu_writable_mask;
+
+	return pte & ~(TDP_PTE_WRITABLE_MASK | shadow_mmu_writable_mask);
+}
+
+u64 tdp_pte_clear_dirty(u64 pte, bool force_wrprot);
+u64 tdp_pte_clear_accessed(u64 pte);
+
+kvm_pfn_t tdp_pte_to_pfn(u64 pte);
+
+void tdp_pte_check_leaf_invariants(u64 pte);
+
+#endif /* !__ASM_KVM_TDP_PGTABLE_H */
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 80e3fe184d17..c294ae51caba 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -18,7 +18,7 @@ ifdef CONFIG_HYPERV
 kvm-y			+= kvm_onhyperv.o
 endif
 
-kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
+kvm-$(CONFIG_X86_64) += mmu/tdp_pgtable.o mmu/tdp_iter.o mmu/tdp_mmu.o
 kvm-$(CONFIG_KVM_XEN)	+= xen.o
 kvm-$(CONFIG_KVM_SMM)	+= smm.o
 
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index fe4b626cb431..493e109f1105 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -10,6 +10,7 @@
 
 
 #include <linux/kvm_host.h>
+#include <kvm/tdp_pgtable.h>
 #include "mmu.h"
 #include "mmu_internal.h"
 #include "x86.h"
@@ -401,7 +402,7 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
 	 * not set any RWX bits.
 	 */
 	if (WARN_ON((mmio_value & mmio_mask) != mmio_value) ||
-	    WARN_ON(mmio_value && (REMOVED_SPTE & mmio_mask) == mmio_value))
+	    WARN_ON(mmio_value && (REMOVED_TDP_PTE & mmio_mask) == mmio_value))
 		mmio_value = 0;
 
 	if (!mmio_value)
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 4c5d518e3ac6..a1b7d7730583 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -183,28 +183,6 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
  */
 #define SHADOW_NONPRESENT_OR_RSVD_MASK_LEN 5
 
-/*
- * If a thread running without exclusive control of the MMU lock must perform a
- * multi-part operation on an SPTE, it can set the SPTE to REMOVED_SPTE as a
- * non-present intermediate value. Other threads which encounter this value
- * should not modify the SPTE.
- *
- * Use a semi-arbitrary value that doesn't set RWX bits, i.e. is not-present on
- * both AMD and Intel CPUs, and doesn't set PFN bits, i.e. doesn't create a L1TF
- * vulnerability.  Use only low bits to avoid 64-bit immediates.
- *
- * Only used by the TDP MMU.
- */
-#define REMOVED_SPTE	0x5a0ULL
-
-/* Removed SPTEs must not be misconstrued as shadow present PTEs. */
-static_assert(!(REMOVED_SPTE & SPTE_MMU_PRESENT_MASK));
-
-static inline bool is_removed_spte(u64 spte)
-{
-	return spte == REMOVED_SPTE;
-}
-
 /* Get an SPTE's index into its parent's page table (and the spt array). */
 static inline int spte_index(u64 *sptep)
 {
diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index d6328dac9cd3..d5f024b7f6e4 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -69,10 +69,10 @@ tdp_ptep_t spte_to_child_pt(u64 spte, int level)
 	 * There's no child entry if this entry isn't present or is a
 	 * last-level entry.
 	 */
-	if (!is_shadow_present_pte(spte) || is_last_spte(spte, level))
+	if (!tdp_pte_is_present(spte) || tdp_pte_is_leaf(spte, level))
 		return NULL;
 
-	return (tdp_ptep_t)__va(spte_to_pfn(spte) << PAGE_SHIFT);
+	return (tdp_ptep_t)__va(tdp_pte_to_pfn(spte) << PAGE_SHIFT);
 }
 
 /*
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index bfac83ab52db..6e3c38532d1d 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -45,8 +45,9 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
 	 * logic needs to be reassessed if KVM were to use non-leaf Accessed
 	 * bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
 	 */
-	if (is_shadow_present_pte(old_spte) && is_last_spte(old_spte, level) &&
-	    spte_has_volatile_bits(old_spte))
+	if (tdp_pte_is_present(old_spte) &&
+	    tdp_pte_is_leaf(old_spte, level) &&
+	    tdp_pte_is_volatile(old_spte))
 		return kvm_tdp_mmu_write_spte_atomic(sptep, new_spte);
 
 	__kvm_tdp_mmu_write_spte(sptep, new_spte);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index a6d6e393c009..fea42bbac984 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -334,13 +334,13 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 
 static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
 {
-	if (!is_shadow_present_pte(old_spte) || !is_last_spte(old_spte, level))
+	if (!tdp_pte_is_present(old_spte) || !tdp_pte_is_leaf(old_spte, level))
 		return;
 
-	if (is_accessed_spte(old_spte) &&
-	    (!is_shadow_present_pte(new_spte) || !is_accessed_spte(new_spte) ||
-	     spte_to_pfn(old_spte) != spte_to_pfn(new_spte)))
-		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
+	if (tdp_pte_is_accessed(old_spte) &&
+	    (!tdp_pte_is_present(new_spte) || !tdp_pte_is_accessed(new_spte) ||
+	     tdp_pte_to_pfn(old_spte) != tdp_pte_to_pfn(new_spte)))
+		kvm_set_pfn_accessed(tdp_pte_to_pfn(old_spte));
 }
 
 static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
@@ -352,10 +352,10 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 	if (level > PG_LEVEL_PTE)
 		return;
 
-	pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
+	pfn_changed = tdp_pte_to_pfn(old_spte) != tdp_pte_to_pfn(new_spte);
 
-	if ((!is_writable_pte(old_spte) || pfn_changed) &&
-	    is_writable_pte(new_spte)) {
+	if ((!tdp_pte_is_writable(old_spte) || pfn_changed) &&
+	    tdp_pte_is_writable(new_spte)) {
 		slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn);
 		mark_page_dirty_in_slot(kvm, slot, gfn);
 	}
@@ -445,8 +445,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * value to the removed SPTE value.
 			 */
 			for (;;) {
-				old_spte = kvm_tdp_mmu_write_spte_atomic(sptep, REMOVED_SPTE);
-				if (!is_removed_spte(old_spte))
+				old_spte = kvm_tdp_mmu_write_spte_atomic(sptep, REMOVED_TDP_PTE);
+				if (!tdp_pte_is_removed(old_spte))
 					break;
 				cpu_relax();
 			}
@@ -461,7 +461,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * unreachable.
 			 */
 			old_spte = kvm_tdp_mmu_read_spte(sptep);
-			if (!is_shadow_present_pte(old_spte))
+			if (!tdp_pte_is_present(old_spte))
 				continue;
 
 			/*
@@ -481,7 +481,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * strictly necessary for the same reason, but using
 			 * the remove SPTE value keeps the shared/exclusive
 			 * paths consistent and allows the handle_changed_spte()
-			 * call below to hardcode the new value to REMOVED_SPTE.
+			 * call below to hardcode the new value to
+			 * REMOVED_TDP_PTE.
 			 *
 			 * Note, even though dropping a Dirty bit is the only
 			 * scenario where a non-atomic update could result in a
@@ -493,10 +494,11 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			 * it here.
 			 */
 			old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte,
-							  REMOVED_SPTE, level);
+							  REMOVED_TDP_PTE,
+							  level);
 		}
 		handle_changed_spte(kvm, sp->role.as_id, gfn, old_spte,
-				    REMOVED_SPTE, level, shared);
+				    REMOVED_TDP_PTE, level, shared);
 	}
 
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
@@ -521,11 +523,11 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 				  u64 old_spte, u64 new_spte, int level,
 				  bool shared)
 {
-	bool was_present = is_shadow_present_pte(old_spte);
-	bool is_present = is_shadow_present_pte(new_spte);
-	bool was_leaf = was_present && is_last_spte(old_spte, level);
-	bool is_leaf = is_present && is_last_spte(new_spte, level);
-	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
+	bool was_present = tdp_pte_is_present(old_spte);
+	bool is_present = tdp_pte_is_present(new_spte);
+	bool was_leaf = was_present && tdp_pte_is_leaf(old_spte, level);
+	bool is_leaf = is_present && tdp_pte_is_leaf(new_spte, level);
+	bool pfn_changed = tdp_pte_to_pfn(old_spte) != tdp_pte_to_pfn(new_spte);
 
 	WARN_ON(level > TDP_ROOT_MAX_LEVEL);
 	WARN_ON(level < PG_LEVEL_PTE);
@@ -560,7 +562,7 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
 
 	if (is_leaf)
-		check_spte_writable_invariants(new_spte);
+		tdp_pte_check_leaf_invariants(new_spte);
 
 	/*
 	 * The only times a SPTE should be changed from a non-present to
@@ -574,9 +576,9 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 		 * impact the guest since both the former and current SPTEs
 		 * are nonpresent.
 		 */
-		if (WARN_ON(!is_mmio_spte(old_spte) &&
-			    !is_mmio_spte(new_spte) &&
-			    !is_removed_spte(new_spte)))
+		if (WARN_ON(!tdp_pte_is_mmio(old_spte) &&
+			    !tdp_pte_is_mmio(new_spte) &&
+			    !tdp_pte_is_removed(new_spte)))
 			pr_err("Unexpected SPTE change! Nonpresent SPTEs\n"
 			       "should not be replaced with another,\n"
 			       "different nonpresent SPTE, unless one or both\n"
@@ -590,9 +592,9 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	if (is_leaf != was_leaf)
 		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
 
-	if (was_leaf && is_dirty_spte(old_spte) &&
-	    (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
-		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
+	if (was_leaf && tdp_pte_is_dirty(old_spte) &&
+	    (!is_present || !tdp_pte_is_dirty(new_spte) || pfn_changed))
+		kvm_set_pfn_dirty(tdp_pte_to_pfn(old_spte));
 
 	/*
 	 * Recursively handle child PTs if the change removed a subtree from
@@ -645,7 +647,7 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	 * and pre-checking before inserting a new SPTE is advantageous as it
 	 * avoids unnecessary work.
 	 */
-	WARN_ON_ONCE(iter->yielded || is_removed_spte(iter->old_spte));
+	WARN_ON_ONCE(iter->yielded || tdp_pte_is_removed(iter->old_spte));
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
@@ -674,7 +676,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
 	 * immediately installing a present entry in its place
 	 * before the TLBs are flushed.
 	 */
-	ret = tdp_mmu_set_spte_atomic(kvm, iter, REMOVED_SPTE);
+	ret = tdp_mmu_set_spte_atomic(kvm, iter, REMOVED_TDP_PTE);
 	if (ret)
 		return ret;
 
@@ -730,7 +732,7 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 	 * should be used. If operating under the MMU lock in write mode, the
 	 * use of the removed SPTE should not be necessary.
 	 */
-	WARN_ON(is_removed_spte(old_spte) || is_removed_spte(new_spte));
+	WARN_ON(tdp_pte_is_removed(old_spte) || tdp_pte_is_removed(new_spte));
 
 	old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
 
@@ -781,8 +783,8 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
 
 #define tdp_root_for_each_leaf_pte(_iter, _root, _start, _end)	\
 	tdp_root_for_each_pte(_iter, _root, _start, _end)		\
-		if (!is_shadow_present_pte(_iter.old_spte) ||		\
-		    !is_last_spte(_iter.old_spte, _iter.level))		\
+		if (!tdp_pte_is_present(_iter.old_spte) ||		\
+		    !tdp_pte_is_leaf(_iter.old_spte, _iter.level))		\
 			continue;					\
 		else
 
@@ -858,7 +860,7 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte))
+		if (!tdp_pte_is_present(iter.old_spte))
 			continue;
 
 		if (iter.level > zap_level)
@@ -919,7 +921,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 		return false;
 
 	old_spte = kvm_tdp_mmu_read_spte(sp->ptep);
-	if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte)))
+	if (WARN_ON_ONCE(!tdp_pte_is_present(old_spte)))
 		return false;
 
 	__tdp_mmu_set_spte(kvm, sp->role.as_id, sp->ptep, old_spte, 0,
@@ -953,8 +955,8 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 			continue;
 		}
 
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    !is_last_spte(iter.old_spte, iter.level))
+		if (!tdp_pte_is_present(iter.old_spte) ||
+		    !tdp_pte_is_leaf(iter.old_spte, iter.level))
 			continue;
 
 		tdp_mmu_set_spte(kvm, &iter, 0);
@@ -1074,8 +1076,8 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 		ret = RET_PF_SPURIOUS;
 	else if (tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
 		return RET_PF_RETRY;
-	else if (is_shadow_present_pte(iter->old_spte) &&
-		 !is_last_spte(iter->old_spte, iter->level))
+	else if (tdp_pte_is_present(iter->old_spte) &&
+		 !tdp_pte_is_leaf(iter->old_spte, iter->level))
 		kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
 						   TDP_PAGES_PER_LEVEL(iter->level + 1));
 
@@ -1090,7 +1092,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 	}
 
 	/* If a MMIO SPTE is installed, the MMIO will need to be emulated. */
-	if (unlikely(is_mmio_spte(new_spte))) {
+	if (unlikely(tdp_pte_is_mmio(new_spte))) {
 		vcpu->stat.pf_mmio_spte_created++;
 		trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn,
 				     new_spte);
@@ -1168,12 +1170,12 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 * If SPTE has been frozen by another thread, just give up and
 		 * retry, avoiding unnecessary page table allocation and free.
 		 */
-		if (is_removed_spte(iter.old_spte))
+		if (tdp_pte_is_removed(iter.old_spte))
 			goto retry;
 
 		/* Step down into the lower level page table if it exists. */
-		if (is_shadow_present_pte(iter.old_spte) &&
-		    !is_large_pte(iter.old_spte))
+		if (tdp_pte_is_present(iter.old_spte) &&
+		    !tdp_pte_is_huge(iter.old_spte))
 			continue;
 
 		/*
@@ -1185,7 +1187,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 
 		sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
 
-		if (is_shadow_present_pte(iter.old_spte))
+		if (tdp_pte_is_present(iter.old_spte))
 			r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
 		else
 			r = tdp_mmu_link_sp(kvm, &iter, sp, true);
@@ -1207,6 +1209,15 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		}
 	}
 
+	/*
+	 * Force the guest to retry the access if the upper level SPTEs aren't
+	 * in place, or if the target leaf SPTE is frozen by another CPU.
+	 */
+	if (iter.level != fault->goal_level || tdp_pte_is_removed(iter.old_spte)) {
+		rcu_read_unlock();
+		return RET_PF_RETRY;
+	}
+
 	ret = tdp_mmu_map_handle_target_level(vcpu, fault, &iter);
 
 retry:
@@ -1255,27 +1266,13 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
 static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
 			  struct kvm_gfn_range *range)
 {
-	u64 new_spte = 0;
+	u64 new_spte;
 
 	/* If we have a non-accessed entry we don't need to change the pte. */
-	if (!is_accessed_spte(iter->old_spte))
+	if (!tdp_pte_is_accessed(iter->old_spte))
 		return false;
 
-	new_spte = iter->old_spte;
-
-	if (spte_ad_enabled(new_spte)) {
-		new_spte &= ~shadow_accessed_mask;
-	} else {
-		/*
-		 * Capture the dirty status of the page, so that it doesn't get
-		 * lost when the SPTE is marked for access tracking.
-		 */
-		if (is_writable_pte(new_spte))
-			kvm_set_pfn_dirty(spte_to_pfn(new_spte));
-
-		new_spte = mark_spte_for_access_track(new_spte);
-	}
-
+	new_spte = tdp_pte_clear_accessed(iter->old_spte);
 	tdp_mmu_set_spte_no_acc_track(kvm, iter, new_spte);
 
 	return true;
@@ -1289,7 +1286,7 @@ bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter,
 			 struct kvm_gfn_range *range)
 {
-	return is_accessed_spte(iter->old_spte);
+	return tdp_pte_is_accessed(iter->old_spte);
 }
 
 bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
@@ -1306,7 +1303,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 	WARN_ON(pte_huge(range->pte) || range->start + 1 != range->end);
 
 	if (iter->level != PG_LEVEL_PTE ||
-	    !is_shadow_present_pte(iter->old_spte))
+	    !tdp_pte_is_present(iter->old_spte))
 		return false;
 
 	/*
@@ -1364,12 +1361,12 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    !is_last_spte(iter.old_spte, iter.level) ||
-		    !(iter.old_spte & PT_WRITABLE_MASK))
+		if (!tdp_pte_is_present(iter.old_spte) ||
+		    !tdp_pte_is_leaf(iter.old_spte, iter.level) ||
+		    !tdp_pte_is_writable(iter.old_spte))
 			continue;
 
-		new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
+		new_spte = tdp_pte_clear_writable(iter.old_spte);
 
 		if (tdp_mmu_set_spte_atomic(kvm, &iter, new_spte))
 			goto retry;
@@ -1525,7 +1522,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte) || !is_large_pte(iter.old_spte))
+		if (!tdp_pte_is_present(iter.old_spte) || !tdp_pte_is_huge(iter.old_spte))
 			continue;
 
 		if (!sp) {
@@ -1607,20 +1604,12 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
 			continue;
 
-		if (!is_shadow_present_pte(iter.old_spte))
+		if (!tdp_pte_is_present(iter.old_spte))
 			continue;
 
-		if (spte_ad_need_write_protect(iter.old_spte)) {
-			if (is_writable_pte(iter.old_spte))
-				new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
-			else
-				continue;
-		} else {
-			if (iter.old_spte & shadow_dirty_mask)
-				new_spte = iter.old_spte & ~shadow_dirty_mask;
-			else
-				continue;
-		}
+		new_spte = tdp_pte_clear_dirty(iter.old_spte, false);
+		if (new_spte == iter.old_spte)
+			continue;
 
 		if (tdp_mmu_set_spte_atomic(kvm, &iter, new_spte))
 			goto retry;
@@ -1680,17 +1669,9 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
 
 		mask &= ~(1UL << (iter.gfn - gfn));
 
-		if (wrprot || spte_ad_need_write_protect(iter.old_spte)) {
-			if (is_writable_pte(iter.old_spte))
-				new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
-			else
-				continue;
-		} else {
-			if (iter.old_spte & shadow_dirty_mask)
-				new_spte = iter.old_spte & ~shadow_dirty_mask;
-			else
-				continue;
-		}
+		new_spte = tdp_pte_clear_dirty(iter.old_spte, wrprot);
+		if (new_spte == iter.old_spte)
+			continue;
 
 		tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
 	}
@@ -1734,7 +1715,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 			continue;
 
 		if (iter.level > TDP_MAX_HUGEPAGE_LEVEL ||
-		    !is_shadow_present_pte(iter.old_spte))
+		    !tdp_pte_is_present(iter.old_spte))
 			continue;
 
 		/*
@@ -1742,7 +1723,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 		 * a large page size, then its parent would have been zapped
 		 * instead of stepping down.
 		 */
-		if (is_last_spte(iter.old_spte, iter.level))
+		if (tdp_pte_is_leaf(iter.old_spte, iter.level))
 			continue;
 
 		/*
@@ -1800,13 +1781,11 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 	rcu_read_lock();
 
 	for_each_tdp_pte_min_level(iter, root, min_level, gfn, gfn + 1) {
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    !is_last_spte(iter.old_spte, iter.level))
+		if (!tdp_pte_is_present(iter.old_spte) ||
+		    !tdp_pte_is_leaf(iter.old_spte, iter.level))
 			continue;
 
-		new_spte = iter.old_spte &
-			~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
-
+		new_spte = tdp_pte_clear_mmu_writable(iter.old_spte);
 		if (new_spte == iter.old_spte)
 			break;
 
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
new file mode 100644
index 000000000000..cf3b692d8e21
--- /dev/null
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/kvm_types.h>
+#include <kvm/tdp_pgtable.h>
+
+#include "mmu.h"
+#include "spte.h"
+
+/* Removed SPTEs must not be misconstrued as shadow present PTEs. */
+static_assert(!(REMOVED_TDP_PTE & SPTE_MMU_PRESENT_MASK));
+
+static_assert(TDP_PTE_WRITABLE_MASK == PT_WRITABLE_MASK);
+static_assert(TDP_PTE_HUGE_PAGE_MASK == PT_PAGE_SIZE_MASK);
+static_assert(TDP_PTE_PRESENT_MASK == SPTE_MMU_PRESENT_MASK);
+
+bool tdp_pte_is_accessed(u64 pte)
+{
+	return is_accessed_spte(pte);
+}
+
+bool tdp_pte_is_dirty(u64 pte)
+{
+	return is_dirty_spte(pte);
+}
+
+bool tdp_pte_is_mmio(u64 pte)
+{
+	return is_mmio_spte(pte);
+}
+
+bool tdp_pte_is_volatile(u64 pte)
+{
+	return spte_has_volatile_bits(pte);
+}
+
+u64 tdp_pte_clear_dirty(u64 pte, bool force_wrprot)
+{
+	if (force_wrprot || spte_ad_need_write_protect(pte)) {
+		if (tdp_pte_is_writable(pte))
+			pte &= ~PT_WRITABLE_MASK;
+	} else if (pte & shadow_dirty_mask) {
+		pte &= ~shadow_dirty_mask;
+	}
+
+	return pte;
+}
+
+u64 tdp_pte_clear_accessed(u64 old_spte)
+{
+	if (spte_ad_enabled(old_spte))
+		return old_spte & ~shadow_accessed_mask;
+
+	/*
+	 * Capture the dirty status of the page, so that it doesn't get lost
+	 * when the SPTE is marked for access tracking.
+	 */
+	if (tdp_pte_is_writable(old_spte))
+		kvm_set_pfn_dirty(tdp_pte_to_pfn(old_spte));
+
+	return mark_spte_for_access_track(old_spte);
+}
+
+kvm_pfn_t tdp_pte_to_pfn(u64 pte)
+{
+	return spte_to_pfn(pte);
+}
+
+void tdp_pte_check_leaf_invariants(u64 pte)
+{
+	check_spte_writable_invariants(pte);
+}
+
diff --git a/include/kvm/tdp_pgtable.h b/include/kvm/tdp_pgtable.h
index 968be8d92350..a24c45ac7765 100644
--- a/include/kvm/tdp_pgtable.h
+++ b/include/kvm/tdp_pgtable.h
@@ -5,6 +5,8 @@
 #include <linux/log2.h>
 #include <linux/mm_types.h>
 
+#include <asm/kvm/tdp_pgtable.h>
+
 #define TDP_ROOT_MAX_LEVEL	5
 #define TDP_MAX_HUGEPAGE_LEVEL	PG_LEVEL_PUD
 #define TDP_PTES_PER_PAGE	(PAGE_SIZE / sizeof(u64))
@@ -18,4 +20,20 @@
 #define TDP_PTE_INDEX(gfn, level) \
 	(((gfn) >> TDP_LEVEL_SHIFT(level)) & TDP_LEVEL_MASK)
 
+/*
+ * If a thread running without exclusive control of the MMU lock must perform a
+ * multi-part operation on a PTE, it can set the PTE to REMOVED_TDP_PTE as a
+ * non-present intermediate value. Other threads which encounter this value
+ * should not modify the PTE.
+ */
+static inline bool tdp_pte_is_removed(u64 pte)
+{
+	return pte == REMOVED_TDP_PTE;
+}
+
+static inline bool tdp_pte_is_leaf(u64 pte, int level)
+{
+	return tdp_pte_is_huge(pte) || level == PG_LEVEL_PTE;
+}
+
 #endif /* !__KVM_TDP_PGTABLE_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog



* [RFC PATCH 16/37] KVM: x86/mmu: Abstract away TDP MMU root lookup
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Abstract the code that looks up the TDP MMU root from vcpu->arch.mmu
behind a function, tdp_mmu_root(). This will be used in a future commit
to allow the TDP MMU to be moved to common code, where vcpu->arch.mmu
cannot be accessed directly.
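
For illustration only (this snippet is not part of the patch), the call
pattern this enables in common code, given a gfn from the caller, looks
roughly like:

  struct kvm_mmu_page *root = tdp_mmu_root(vcpu);
  struct tdp_iter iter;

  /* Walk one gfn without dereferencing vcpu->arch.mmu directly. */
  for_each_tdp_pte(iter, root, gfn, gfn + 1) {
  	if (!tdp_pte_is_present(iter.old_spte))
  		continue;
  	/* ... */
  }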

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/tdp_pgtable.h |  2 ++
 arch/x86/kvm/mmu/tdp_mmu.c             | 17 +++++++----------
 arch/x86/kvm/mmu/tdp_pgtable.c         |  5 +++++
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm/tdp_pgtable.h b/arch/x86/include/asm/kvm/tdp_pgtable.h
index cebc4bc44b49..c5c4e4cab24a 100644
--- a/arch/x86/include/asm/kvm/tdp_pgtable.h
+++ b/arch/x86/include/asm/kvm/tdp_pgtable.h
@@ -5,6 +5,8 @@
 #include <linux/types.h>
 #include <linux/kvm_types.h>
 
+struct kvm_mmu_page *tdp_mmu_root(struct kvm_vcpu *vcpu);
+
 /*
  * Use a semi-arbitrary value that doesn't set RWX bits, i.e. is not-present on
  * both AMD and Intel CPUs, and doesn't set PFN bits, i.e. doesn't create a L1TF
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index fea42bbac984..8155a9e79203 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -788,9 +788,6 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
 			continue;					\
 		else
 
-#define tdp_mmu_for_each_pte(_iter, _mmu, _start, _end)		\
-	for_each_tdp_pte(_iter, to_shadow_page(_mmu->root.hpa), _start, _end)
-
 /*
  * Yield if the MMU lock is contended or this thread needs to return control
  * to the scheduler.
@@ -1145,7 +1142,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
  */
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	struct kvm_mmu_page *root = tdp_mmu_root(vcpu);
 	struct kvm *kvm = vcpu->kvm;
 	struct tdp_iter iter;
 	struct kvm_mmu_page *sp;
@@ -1157,7 +1154,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 
 	rcu_read_lock();
 
-	tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
+	for_each_tdp_pte(iter, root, fault->gfn, fault->gfn + 1) {
 		int r;
 
 		if (fault->arch.nx_huge_page_workaround_enabled)
@@ -1826,14 +1823,14 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
 int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 			 int *root_level)
 {
+	struct kvm_mmu_page *root = tdp_mmu_root(vcpu);
 	struct tdp_iter iter;
-	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	gfn_t gfn = addr >> PAGE_SHIFT;
 	int leaf = -1;
 
-	*root_level = vcpu->arch.mmu->root_role.level;
+	*root_level = root->role.level;
 
-	tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
+	for_each_tdp_pte(iter, root, gfn, gfn + 1) {
 		leaf = iter.level;
 		sptes[leaf] = iter.old_spte;
 	}
@@ -1855,12 +1852,12 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
 					u64 *spte)
 {
+	struct kvm_mmu_page *root = tdp_mmu_root(vcpu);
 	struct tdp_iter iter;
-	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	gfn_t gfn = addr >> PAGE_SHIFT;
 	tdp_ptep_t sptep = NULL;
 
-	tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
+	for_each_tdp_pte(iter, root, gfn, gfn + 1) {
 		*spte = iter.old_spte;
 		sptep = iter.sptep;
 	}
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
index cf3b692d8e21..97cc900e8818 100644
--- a/arch/x86/kvm/mmu/tdp_pgtable.c
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -13,6 +13,11 @@ static_assert(TDP_PTE_WRITABLE_MASK == PT_WRITABLE_MASK);
 static_assert(TDP_PTE_HUGE_PAGE_MASK == PT_PAGE_SIZE_MASK);
 static_assert(TDP_PTE_PRESENT_MASK == SPTE_MMU_PRESENT_MASK);
 
+struct kvm_mmu_page *tdp_mmu_root(struct kvm_vcpu *vcpu)
+{
+	return to_shadow_page(vcpu->arch.mmu->root.hpa);
+}
+
 bool tdp_pte_is_accessed(u64 pte)
 {
 	return is_accessed_spte(pte);
-- 
2.39.0.rc1.256.g54fd8350bd-goog


* [RFC PATCH 16/37] KVM: x86/mmu: Abstract away TDP MMU root lookup
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Abstract the code that looks up the TDP MMU root from vcpu->arch.mmu
behind a function, tdp_mmu_root(). This will be used in a future commit
to allow the TDP MMU to be moved to common code, where vcpu->arch.mmu
cannot be accessed directly.
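
As a hedged sketch of where this is headed (nothing below is from this
series; the field name is an assumption made up for illustration), a
non-x86 architecture could back this hook with whatever tracks its own
stage-2 root:

  struct kvm_mmu_page *tdp_mmu_root(struct kvm_vcpu *vcpu)
  {
  	/* Hypothetical: return this arch's cached TDP root page. */
  	return vcpu->arch.tdp_root;	/* assumed field */
  }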

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/tdp_pgtable.h |  2 ++
 arch/x86/kvm/mmu/tdp_mmu.c             | 17 +++++++----------
 arch/x86/kvm/mmu/tdp_pgtable.c         |  5 +++++
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm/tdp_pgtable.h b/arch/x86/include/asm/kvm/tdp_pgtable.h
index cebc4bc44b49..c5c4e4cab24a 100644
--- a/arch/x86/include/asm/kvm/tdp_pgtable.h
+++ b/arch/x86/include/asm/kvm/tdp_pgtable.h
@@ -5,6 +5,8 @@
 #include <linux/types.h>
 #include <linux/kvm_types.h>
 
+struct kvm_mmu_page *tdp_mmu_root(struct kvm_vcpu *vcpu);
+
 /*
  * Use a semi-arbitrary value that doesn't set RWX bits, i.e. is not-present on
  * both AMD and Intel CPUs, and doesn't set PFN bits, i.e. doesn't create a L1TF
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index fea42bbac984..8155a9e79203 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -788,9 +788,6 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
 			continue;					\
 		else
 
-#define tdp_mmu_for_each_pte(_iter, _mmu, _start, _end)		\
-	for_each_tdp_pte(_iter, to_shadow_page(_mmu->root.hpa), _start, _end)
-
 /*
  * Yield if the MMU lock is contended or this thread needs to return control
  * to the scheduler.
@@ -1145,7 +1142,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
  */
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	struct kvm_mmu_page *root = tdp_mmu_root(vcpu);
 	struct kvm *kvm = vcpu->kvm;
 	struct tdp_iter iter;
 	struct kvm_mmu_page *sp;
@@ -1157,7 +1154,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 
 	rcu_read_lock();
 
-	tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
+	for_each_tdp_pte(iter, root, fault->gfn, fault->gfn + 1) {
 		int r;
 
 		if (fault->arch.nx_huge_page_workaround_enabled)
@@ -1826,14 +1823,14 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
 int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 			 int *root_level)
 {
+	struct kvm_mmu_page *root = tdp_mmu_root(vcpu);
 	struct tdp_iter iter;
-	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	gfn_t gfn = addr >> PAGE_SHIFT;
 	int leaf = -1;
 
-	*root_level = vcpu->arch.mmu->root_role.level;
+	*root_level = root->role.level;
 
-	tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
+	for_each_tdp_pte(iter, root, gfn, gfn + 1) {
 		leaf = iter.level;
 		sptes[leaf] = iter.old_spte;
 	}
@@ -1855,12 +1852,12 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
 					u64 *spte)
 {
+	struct kvm_mmu_page *root = tdp_mmu_root(vcpu);
 	struct tdp_iter iter;
-	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	gfn_t gfn = addr >> PAGE_SHIFT;
 	tdp_ptep_t sptep = NULL;
 
-	tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
+	for_each_tdp_pte(iter, root, gfn, gfn + 1) {
 		*spte = iter.old_spte;
 		sptep = iter.sptep;
 	}
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
index cf3b692d8e21..97cc900e8818 100644
--- a/arch/x86/kvm/mmu/tdp_pgtable.c
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -13,6 +13,11 @@ static_assert(TDP_PTE_WRITABLE_MASK == PT_WRITABLE_MASK);
 static_assert(TDP_PTE_HUGE_PAGE_MASK == PT_PAGE_SIZE_MASK);
 static_assert(TDP_PTE_PRESENT_MASK == SPTE_MMU_PRESENT_MASK);
 
+struct kvm_mmu_page *tdp_mmu_root(struct kvm_vcpu *vcpu)
+{
+	return to_shadow_page(vcpu->arch.mmu->root.hpa);
+}
+
 bool tdp_pte_is_accessed(u64 pte)
 {
 	return is_accessed_spte(pte);
-- 
2.39.0.rc1.256.g54fd8350bd-goog



* [RFC PATCH 17/37] KVM: Move struct kvm_gfn_range to kvm_types.h
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move struct kvm_gfn_range to kvm_types.h so that its definition can be
accessed in a future commit from arch/x86/include/asm/kvm/tdp_pgtable.h
without needing to include the mega-header kvm_host.h.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 include/linux/kvm_host.h  | 7 -------
 include/linux/kvm_types.h | 8 ++++++++
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 22ecb7ce4d31..469ff4202a0d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -256,13 +256,6 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
 #ifdef KVM_ARCH_WANT_MMU_NOTIFIER
-struct kvm_gfn_range {
-	struct kvm_memory_slot *slot;
-	gfn_t start;
-	gfn_t end;
-	pte_t pte;
-	bool may_block;
-};
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 59cf958d69df..001aad9ea987 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -132,4 +132,12 @@ struct kvm_vcpu_stat_generic {
 
 #define KVM_STATS_NAME_SIZE	48
 
+struct kvm_gfn_range {
+	struct kvm_memory_slot *slot;
+	gfn_t start;
+	gfn_t end;
+	pte_t pte;
+	bool may_block;
+};
+
 #endif /* __KVM_TYPES_H__ */
-- 
2.39.0.rc1.256.g54fd8350bd-goog
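
[ For illustration only: the kind of prototype this move enables. With
  struct kvm_gfn_range living in kvm_types.h, an arch header such as
  arch/x86/include/asm/kvm/tdp_pgtable.h can name it with only the
  lightweight includes below, no kvm_host.h needed. The prototype shown
  mirrors the one added to that header later in the series. ]

  /* arch/x86/include/asm/kvm/tdp_pgtable.h (illustrative excerpt) */
  #include <linux/types.h>
  #include <linux/kvm_types.h>    /* now provides struct kvm_gfn_range */

  struct tdp_iter;

  u64 tdp_mmu_make_changed_pte_notifier_pte(struct tdp_iter *iter,
                                            struct kvm_gfn_range *range);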


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 18/37] KVM: x86/mmu: Add common API for creating TDP PTEs
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Introduce an API for construction of TDP PTEs.

- tdp_mmu_make_leaf_pte()
- tdp_mmu_make_nonleaf_pte()
- tdp_mmu_make_huge_page_split_pte()
- tdp_mmu_make_changed_pte_notifier_pte()

This will be used in a future commit to move the TDP MMU to common code,
while PTE construction will stay in the architecture-specific code.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/tdp_pgtable.h | 10 +++++++
 arch/x86/kvm/mmu/tdp_mmu.c             | 18 +++++--------
 arch/x86/kvm/mmu/tdp_pgtable.c         | 36 ++++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm/tdp_pgtable.h b/arch/x86/include/asm/kvm/tdp_pgtable.h
index c5c4e4cab24a..ff2691ced38b 100644
--- a/arch/x86/include/asm/kvm/tdp_pgtable.h
+++ b/arch/x86/include/asm/kvm/tdp_pgtable.h
@@ -4,6 +4,7 @@
 
 #include <linux/types.h>
 #include <linux/kvm_types.h>
+#include <kvm/mmu_types.h>
 
 struct kvm_mmu_page *tdp_mmu_root(struct kvm_vcpu *vcpu);
 
@@ -57,4 +58,13 @@ kvm_pfn_t tdp_pte_to_pfn(u64 pte);
 
 void tdp_pte_check_leaf_invariants(u64 pte);
 
+struct tdp_iter;
+
+u64 tdp_mmu_make_leaf_pte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+			  struct tdp_iter *iter, bool *wrprot);
+u64 tdp_mmu_make_nonleaf_pte(struct kvm_mmu_page *sp);
+u64 tdp_mmu_make_changed_pte_notifier_pte(struct tdp_iter *iter,
+					  struct kvm_gfn_range *range);
+u64 tdp_mmu_make_huge_page_split_pte(struct kvm *kvm, u64 huge_spte,
+				     struct kvm_mmu_page *sp, int index);
 #endif /* !__ASM_KVM_TDP_PGTABLE_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 8155a9e79203..0172b0e44817 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1057,17 +1057,13 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 					  struct tdp_iter *iter)
 {
 	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
-	u64 new_spte;
 	int ret = RET_PF_FIXED;
 	bool wrprot = false;
+	u64 new_spte;
 
 	WARN_ON(sp->role.level != fault->goal_level);
-	if (unlikely(!fault->slot))
-		new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
-	else
-		wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
-					 fault->pfn, iter->old_spte, fault->prefetch, true,
-					 fault->map_writable, &new_spte);
+
+	new_spte = tdp_mmu_make_leaf_pte(vcpu, fault, iter, &wrprot);
 
 	if (new_spte == iter->old_spte)
 		ret = RET_PF_SPURIOUS;
@@ -1117,7 +1113,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 			   struct kvm_mmu_page *sp, bool shared)
 {
-	u64 spte = make_nonleaf_spte(sp->spt, !kvm_ad_enabled());
+	u64 spte = tdp_mmu_make_nonleaf_pte(sp);
 	int ret = 0;
 
 	if (shared) {
@@ -1312,9 +1308,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 	tdp_mmu_set_spte(kvm, iter, 0);
 
 	if (!pte_write(range->pte)) {
-		new_spte = kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
-								  pte_pfn(range->pte));
-
+		new_spte = tdp_mmu_make_changed_pte_notifier_pte(iter, range);
 		tdp_mmu_set_spte(kvm, iter, new_spte);
 	}
 
@@ -1466,7 +1460,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
 	 * not been linked in yet and thus is not reachable from any other CPU.
 	 */
 	for (i = 0; i < TDP_PTES_PER_PAGE; i++)
-		sp->spt[i] = make_huge_page_split_spte(kvm, huge_spte, sp->role, i);
+		sp->spt[i] = tdp_mmu_make_huge_page_split_pte(kvm, huge_spte, sp, i);
 
 	/*
 	 * Replace the huge spte with a pointer to the populated lower level
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
index 97cc900e8818..e036ba0c6bee 100644
--- a/arch/x86/kvm/mmu/tdp_pgtable.c
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -5,6 +5,7 @@
 
 #include "mmu.h"
 #include "spte.h"
+#include "tdp_iter.h"
 
 /* Removed SPTEs must not be misconstrued as shadow present PTEs. */
 static_assert(!(REMOVED_TDP_PTE & SPTE_MMU_PRESENT_MASK));
@@ -75,3 +76,38 @@ void tdp_pte_check_leaf_invariants(u64 pte)
 	check_spte_writable_invariants(pte);
 }
 
+u64 tdp_mmu_make_leaf_pte(struct kvm_vcpu *vcpu,
+			  struct kvm_page_fault *fault,
+			  struct tdp_iter *iter,
+			  bool *wrprot)
+{
+	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
+	u64 new_spte;
+
+	if (unlikely(!fault->slot))
+		return make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
+
+	*wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
+			    fault->pfn, iter->old_spte, fault->prefetch, true,
+			    fault->map_writable, &new_spte);
+
+	return new_spte;
+}
+
+u64 tdp_mmu_make_nonleaf_pte(struct kvm_mmu_page *sp)
+{
+	return make_nonleaf_spte(sp->spt, !kvm_ad_enabled());
+}
+
+u64 tdp_mmu_make_changed_pte_notifier_pte(struct tdp_iter *iter,
+					  struct kvm_gfn_range *range)
+{
+	return kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
+						      pte_pfn(range->pte));
+}
+
+u64 tdp_mmu_make_huge_page_split_pte(struct kvm *kvm, u64 huge_spte,
+				     struct kvm_mmu_page *sp, int index)
+{
+	return make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
+}
-- 
2.39.0.rc1.256.g54fd8350bd-goog
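
[ Not part of the patch: a condensed sketch of the contract this API
  creates. The arch helper decides what the leaf PTE contains; the
  caller, which becomes common code later in the series, decides when
  and how to install it. example_install_leaf() is a made-up name and
  elides the wrprot/emulation and huge-page details handled by the real
  tdp_mmu_map_handle_target_level() above. ]

  static int example_install_leaf(struct kvm_vcpu *vcpu,
                                  struct kvm_page_fault *fault,
                                  struct tdp_iter *iter)
  {
          bool wrprot = false;
          u64 new_spte;

          /* Arch-specific: MMIO vs. normal SPTE, A/D bits, etc. */
          new_spte = tdp_mmu_make_leaf_pte(vcpu, fault, iter, &wrprot);

          if (new_spte == iter->old_spte)
                  return RET_PF_SPURIOUS;

          /* Common: publish the new PTE, retrying if a racing update won. */
          if (tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
                  return RET_PF_RETRY;

          return RET_PF_FIXED;
  }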


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 18/37] KVM: x86/mmu: Add common API for creating TDP PTEs
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Introduce an API for construction of TDP PTEs.

- tdp_mmu_make_leaf_pte()
- tdp_mmu_make_nonleaf_pte()
- tdp_mmu_make_huge_page_split_pte()
- tdp_mmu_make_changed_pte_notifier_pte()

This will be used in a future commit to move the TDP MMU to common code,
while PTE construction will stay in the architecture-specific code.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/tdp_pgtable.h | 10 +++++++
 arch/x86/kvm/mmu/tdp_mmu.c             | 18 +++++--------
 arch/x86/kvm/mmu/tdp_pgtable.c         | 36 ++++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm/tdp_pgtable.h b/arch/x86/include/asm/kvm/tdp_pgtable.h
index c5c4e4cab24a..ff2691ced38b 100644
--- a/arch/x86/include/asm/kvm/tdp_pgtable.h
+++ b/arch/x86/include/asm/kvm/tdp_pgtable.h
@@ -4,6 +4,7 @@
 
 #include <linux/types.h>
 #include <linux/kvm_types.h>
+#include <kvm/mmu_types.h>
 
 struct kvm_mmu_page *tdp_mmu_root(struct kvm_vcpu *vcpu);
 
@@ -57,4 +58,13 @@ kvm_pfn_t tdp_pte_to_pfn(u64 pte);
 
 void tdp_pte_check_leaf_invariants(u64 pte);
 
+struct tdp_iter;
+
+u64 tdp_mmu_make_leaf_pte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+			  struct tdp_iter *iter, bool *wrprot);
+u64 tdp_mmu_make_nonleaf_pte(struct kvm_mmu_page *sp);
+u64 tdp_mmu_make_changed_pte_notifier_pte(struct tdp_iter *iter,
+					  struct kvm_gfn_range *range);
+u64 tdp_mmu_make_huge_page_split_pte(struct kvm *kvm, u64 huge_spte,
+				     struct kvm_mmu_page *sp, int index);
 #endif /* !__ASM_KVM_TDP_PGTABLE_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 8155a9e79203..0172b0e44817 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1057,17 +1057,13 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 					  struct tdp_iter *iter)
 {
 	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
-	u64 new_spte;
 	int ret = RET_PF_FIXED;
 	bool wrprot = false;
+	u64 new_spte;
 
 	WARN_ON(sp->role.level != fault->goal_level);
-	if (unlikely(!fault->slot))
-		new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
-	else
-		wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
-					 fault->pfn, iter->old_spte, fault->prefetch, true,
-					 fault->map_writable, &new_spte);
+
+	new_spte = tdp_mmu_make_leaf_pte(vcpu, fault, iter, &wrprot);
 
 	if (new_spte == iter->old_spte)
 		ret = RET_PF_SPURIOUS;
@@ -1117,7 +1113,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 			   struct kvm_mmu_page *sp, bool shared)
 {
-	u64 spte = make_nonleaf_spte(sp->spt, !kvm_ad_enabled());
+	u64 spte = tdp_mmu_make_nonleaf_pte(sp);
 	int ret = 0;
 
 	if (shared) {
@@ -1312,9 +1308,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 	tdp_mmu_set_spte(kvm, iter, 0);
 
 	if (!pte_write(range->pte)) {
-		new_spte = kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
-								  pte_pfn(range->pte));
-
+		new_spte = tdp_mmu_make_changed_pte_notifier_pte(iter, range);
 		tdp_mmu_set_spte(kvm, iter, new_spte);
 	}
 
@@ -1466,7 +1460,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
 	 * not been linked in yet and thus is not reachable from any other CPU.
 	 */
 	for (i = 0; i < TDP_PTES_PER_PAGE; i++)
-		sp->spt[i] = make_huge_page_split_spte(kvm, huge_spte, sp->role, i);
+		sp->spt[i] = tdp_mmu_make_huge_page_split_pte(kvm, huge_spte, sp, i);
 
 	/*
 	 * Replace the huge spte with a pointer to the populated lower level
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
index 97cc900e8818..e036ba0c6bee 100644
--- a/arch/x86/kvm/mmu/tdp_pgtable.c
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -5,6 +5,7 @@
 
 #include "mmu.h"
 #include "spte.h"
+#include "tdp_iter.h"
 
 /* Removed SPTEs must not be misconstrued as shadow present PTEs. */
 static_assert(!(REMOVED_TDP_PTE & SPTE_MMU_PRESENT_MASK));
@@ -75,3 +76,38 @@ void tdp_pte_check_leaf_invariants(u64 pte)
 	check_spte_writable_invariants(pte);
 }
 
+u64 tdp_mmu_make_leaf_pte(struct kvm_vcpu *vcpu,
+			  struct kvm_page_fault *fault,
+			  struct tdp_iter *iter,
+			  bool *wrprot)
+{
+	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
+	u64 new_spte;
+
+	if (unlikely(!fault->slot))
+		return make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
+
+	*wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
+			    fault->pfn, iter->old_spte, fault->prefetch, true,
+			    fault->map_writable, &new_spte);
+
+	return new_spte;
+}
+
+u64 tdp_mmu_make_nonleaf_pte(struct kvm_mmu_page *sp)
+{
+	return make_nonleaf_spte(sp->spt, !kvm_ad_enabled());
+}
+
+u64 tdp_mmu_make_changed_pte_notifier_pte(struct tdp_iter *iter,
+					  struct kvm_gfn_range *range)
+{
+	return kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
+						      pte_pfn(range->pte));
+}
+
+u64 tdp_mmu_make_huge_page_split_pte(struct kvm *kvm, u64 huge_spte,
+				     struct kvm_mmu_page *sp, int index)
+{
+	return make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
+}
-- 
2.39.0.rc1.256.g54fd8350bd-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 18/37] KVM: x86/mmu: Add common API for creating TDP PTEs
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Introduce an API for construction of TDP PTEs.

- tdp_mmu_make_leaf_pte()
- tdp_mmu_make_nonleaf_pte()
- tdp_mmu_make_huge_page_split_pte()
- tdp_mmu_make_changed_pte_notifier_pte()

This will be used in a future commit to move the TDP MMU to common code,
while PTE construction will stay in the architecture-specific code.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/tdp_pgtable.h | 10 +++++++
 arch/x86/kvm/mmu/tdp_mmu.c             | 18 +++++--------
 arch/x86/kvm/mmu/tdp_pgtable.c         | 36 ++++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm/tdp_pgtable.h b/arch/x86/include/asm/kvm/tdp_pgtable.h
index c5c4e4cab24a..ff2691ced38b 100644
--- a/arch/x86/include/asm/kvm/tdp_pgtable.h
+++ b/arch/x86/include/asm/kvm/tdp_pgtable.h
@@ -4,6 +4,7 @@
 
 #include <linux/types.h>
 #include <linux/kvm_types.h>
+#include <kvm/mmu_types.h>
 
 struct kvm_mmu_page *tdp_mmu_root(struct kvm_vcpu *vcpu);
 
@@ -57,4 +58,13 @@ kvm_pfn_t tdp_pte_to_pfn(u64 pte);
 
 void tdp_pte_check_leaf_invariants(u64 pte);
 
+struct tdp_iter;
+
+u64 tdp_mmu_make_leaf_pte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+			  struct tdp_iter *iter, bool *wrprot);
+u64 tdp_mmu_make_nonleaf_pte(struct kvm_mmu_page *sp);
+u64 tdp_mmu_make_changed_pte_notifier_pte(struct tdp_iter *iter,
+					  struct kvm_gfn_range *range);
+u64 tdp_mmu_make_huge_page_split_pte(struct kvm *kvm, u64 huge_spte,
+				     struct kvm_mmu_page *sp, int index);
 #endif /* !__ASM_KVM_TDP_PGTABLE_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 8155a9e79203..0172b0e44817 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1057,17 +1057,13 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 					  struct tdp_iter *iter)
 {
 	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
-	u64 new_spte;
 	int ret = RET_PF_FIXED;
 	bool wrprot = false;
+	u64 new_spte;
 
 	WARN_ON(sp->role.level != fault->goal_level);
-	if (unlikely(!fault->slot))
-		new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
-	else
-		wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
-					 fault->pfn, iter->old_spte, fault->prefetch, true,
-					 fault->map_writable, &new_spte);
+
+	new_spte = tdp_mmu_make_leaf_pte(vcpu, fault, iter, &wrprot);
 
 	if (new_spte == iter->old_spte)
 		ret = RET_PF_SPURIOUS;
@@ -1117,7 +1113,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 			   struct kvm_mmu_page *sp, bool shared)
 {
-	u64 spte = make_nonleaf_spte(sp->spt, !kvm_ad_enabled());
+	u64 spte = tdp_mmu_make_nonleaf_pte(sp);
 	int ret = 0;
 
 	if (shared) {
@@ -1312,9 +1308,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 	tdp_mmu_set_spte(kvm, iter, 0);
 
 	if (!pte_write(range->pte)) {
-		new_spte = kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
-								  pte_pfn(range->pte));
-
+		new_spte = tdp_mmu_make_changed_pte_notifier_pte(iter, range);
 		tdp_mmu_set_spte(kvm, iter, new_spte);
 	}
 
@@ -1466,7 +1460,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
 	 * not been linked in yet and thus is not reachable from any other CPU.
 	 */
 	for (i = 0; i < TDP_PTES_PER_PAGE; i++)
-		sp->spt[i] = make_huge_page_split_spte(kvm, huge_spte, sp->role, i);
+		sp->spt[i] = tdp_mmu_make_huge_page_split_pte(kvm, huge_spte, sp, i);
 
 	/*
 	 * Replace the huge spte with a pointer to the populated lower level
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
index 97cc900e8818..e036ba0c6bee 100644
--- a/arch/x86/kvm/mmu/tdp_pgtable.c
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -5,6 +5,7 @@
 
 #include "mmu.h"
 #include "spte.h"
+#include "tdp_iter.h"
 
 /* Removed SPTEs must not be misconstrued as shadow present PTEs. */
 static_assert(!(REMOVED_TDP_PTE & SPTE_MMU_PRESENT_MASK));
@@ -75,3 +76,38 @@ void tdp_pte_check_leaf_invariants(u64 pte)
 	check_spte_writable_invariants(pte);
 }
 
+u64 tdp_mmu_make_leaf_pte(struct kvm_vcpu *vcpu,
+			  struct kvm_page_fault *fault,
+			  struct tdp_iter *iter,
+			  bool *wrprot)
+{
+	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
+	u64 new_spte;
+
+	if (unlikely(!fault->slot))
+		return make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
+
+	*wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
+			    fault->pfn, iter->old_spte, fault->prefetch, true,
+			    fault->map_writable, &new_spte);
+
+	return new_spte;
+}
+
+u64 tdp_mmu_make_nonleaf_pte(struct kvm_mmu_page *sp)
+{
+	return make_nonleaf_spte(sp->spt, !kvm_ad_enabled());
+}
+
+u64 tdp_mmu_make_changed_pte_notifier_pte(struct tdp_iter *iter,
+					  struct kvm_gfn_range *range)
+{
+	return kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
+						      pte_pfn(range->pte));
+}
+
+u64 tdp_mmu_make_huge_page_split_pte(struct kvm *kvm, u64 huge_spte,
+				     struct kvm_mmu_page *sp, int index)
+{
+	return make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
+}
-- 
2.39.0.rc1.256.g54fd8350bd-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 18/37] KVM: x86/mmu: Add common API for creating TDP PTEs
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Introduce an API for construction of TDP PTEs.

- tdp_mmu_make_leaf_pte()
- tdp_mmu_make_nonleaf_pte()
- tdp_mmu_make_huge_page_split_pte()
- tdp_mmu_make_changed_pte_notifier_pte()

This will be used in a future commit to move the TDP MMU to common code,
while PTE construction will stay in the architecture-specific code.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/tdp_pgtable.h | 10 +++++++
 arch/x86/kvm/mmu/tdp_mmu.c             | 18 +++++--------
 arch/x86/kvm/mmu/tdp_pgtable.c         | 36 ++++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm/tdp_pgtable.h b/arch/x86/include/asm/kvm/tdp_pgtable.h
index c5c4e4cab24a..ff2691ced38b 100644
--- a/arch/x86/include/asm/kvm/tdp_pgtable.h
+++ b/arch/x86/include/asm/kvm/tdp_pgtable.h
@@ -4,6 +4,7 @@
 
 #include <linux/types.h>
 #include <linux/kvm_types.h>
+#include <kvm/mmu_types.h>
 
 struct kvm_mmu_page *tdp_mmu_root(struct kvm_vcpu *vcpu);
 
@@ -57,4 +58,13 @@ kvm_pfn_t tdp_pte_to_pfn(u64 pte);
 
 void tdp_pte_check_leaf_invariants(u64 pte);
 
+struct tdp_iter;
+
+u64 tdp_mmu_make_leaf_pte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+			  struct tdp_iter *iter, bool *wrprot);
+u64 tdp_mmu_make_nonleaf_pte(struct kvm_mmu_page *sp);
+u64 tdp_mmu_make_changed_pte_notifier_pte(struct tdp_iter *iter,
+					  struct kvm_gfn_range *range);
+u64 tdp_mmu_make_huge_page_split_pte(struct kvm *kvm, u64 huge_spte,
+				     struct kvm_mmu_page *sp, int index);
 #endif /* !__ASM_KVM_TDP_PGTABLE_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 8155a9e79203..0172b0e44817 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1057,17 +1057,13 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 					  struct tdp_iter *iter)
 {
 	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
-	u64 new_spte;
 	int ret = RET_PF_FIXED;
 	bool wrprot = false;
+	u64 new_spte;
 
 	WARN_ON(sp->role.level != fault->goal_level);
-	if (unlikely(!fault->slot))
-		new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
-	else
-		wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
-					 fault->pfn, iter->old_spte, fault->prefetch, true,
-					 fault->map_writable, &new_spte);
+
+	new_spte = tdp_mmu_make_leaf_pte(vcpu, fault, iter, &wrprot);
 
 	if (new_spte == iter->old_spte)
 		ret = RET_PF_SPURIOUS;
@@ -1117,7 +1113,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 			   struct kvm_mmu_page *sp, bool shared)
 {
-	u64 spte = make_nonleaf_spte(sp->spt, !kvm_ad_enabled());
+	u64 spte = tdp_mmu_make_nonleaf_pte(sp);
 	int ret = 0;
 
 	if (shared) {
@@ -1312,9 +1308,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 	tdp_mmu_set_spte(kvm, iter, 0);
 
 	if (!pte_write(range->pte)) {
-		new_spte = kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
-								  pte_pfn(range->pte));
-
+		new_spte = tdp_mmu_make_changed_pte_notifier_pte(iter, range);
 		tdp_mmu_set_spte(kvm, iter, new_spte);
 	}
 
@@ -1466,7 +1460,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
 	 * not been linked in yet and thus is not reachable from any other CPU.
 	 */
 	for (i = 0; i < TDP_PTES_PER_PAGE; i++)
-		sp->spt[i] = make_huge_page_split_spte(kvm, huge_spte, sp->role, i);
+		sp->spt[i] = tdp_mmu_make_huge_page_split_pte(kvm, huge_spte, sp, i);
 
 	/*
 	 * Replace the huge spte with a pointer to the populated lower level
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
index 97cc900e8818..e036ba0c6bee 100644
--- a/arch/x86/kvm/mmu/tdp_pgtable.c
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -5,6 +5,7 @@
 
 #include "mmu.h"
 #include "spte.h"
+#include "tdp_iter.h"
 
 /* Removed SPTEs must not be misconstrued as shadow present PTEs. */
 static_assert(!(REMOVED_TDP_PTE & SPTE_MMU_PRESENT_MASK));
@@ -75,3 +76,38 @@ void tdp_pte_check_leaf_invariants(u64 pte)
 	check_spte_writable_invariants(pte);
 }
 
+u64 tdp_mmu_make_leaf_pte(struct kvm_vcpu *vcpu,
+			  struct kvm_page_fault *fault,
+			  struct tdp_iter *iter,
+			  bool *wrprot)
+{
+	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
+	u64 new_spte;
+
+	if (unlikely(!fault->slot))
+		return make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
+
+	*wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
+			    fault->pfn, iter->old_spte, fault->prefetch, true,
+			    fault->map_writable, &new_spte);
+
+	return new_spte;
+}
+
+u64 tdp_mmu_make_nonleaf_pte(struct kvm_mmu_page *sp)
+{
+	return make_nonleaf_spte(sp->spt, !kvm_ad_enabled());
+}
+
+u64 tdp_mmu_make_changed_pte_notifier_pte(struct tdp_iter *iter,
+					  struct kvm_gfn_range *range)
+{
+	return kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
+						      pte_pfn(range->pte));
+}
+
+u64 tdp_mmu_make_huge_page_split_pte(struct kvm *kvm, u64 huge_spte,
+				     struct kvm_mmu_page *sp, int index)
+{
+	return make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
+}
-- 
2.39.0.rc1.256.g54fd8350bd-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 19/37] KVM: x86/mmu: Add arch hooks for NX Huge Pages
  2022-12-08 19:38 ` David Matlack
  (?)
  (?)
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Abstract away the handling for NX Huge Pages down to arch-specific
hooks. This will be used in a future commit to move the TDP MMU to
common code despite NX Huge Pages, which is x86-specific.

NX Huge Pages is by far the most disruptive feature in terms of needing
the most arch hooks in the TDP MMU.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c     | 57 +++++++++++++++++++---------------
 arch/x86/kvm/mmu/tdp_pgtable.c | 52 +++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0172b0e44817..7670fbd8e72d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -269,17 +269,21 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 	return sp;
 }
 
+__weak void tdp_mmu_arch_init_sp(struct kvm_mmu_page *sp)
+{
+}
+
 static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
 			    gfn_t gfn, union kvm_mmu_page_role role)
 {
-	INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
-
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
 	sp->role = role;
 	sp->gfn = gfn;
 	sp->ptep = sptep;
 
+	tdp_mmu_arch_init_sp(sp);
+
 	trace_kvm_mmu_get_page(sp, true);
 }
 
@@ -373,6 +377,11 @@ static void tdp_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	atomic64_dec(&kvm->arch.tdp_mmu_pages);
 }
 
+__weak void tdp_mmu_arch_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
+				   bool shared)
+{
+}
+
 /**
  * tdp_mmu_unlink_sp() - Remove a shadow page from the list of used pages
  *
@@ -386,20 +395,7 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 			      bool shared)
 {
 	tdp_unaccount_mmu_page(kvm, sp);
-
-	if (!sp->arch.nx_huge_page_disallowed)
-		return;
-
-	if (shared)
-		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
-	else
-		lockdep_assert_held_write(&kvm->mmu_lock);
-
-	sp->arch.nx_huge_page_disallowed = false;
-	untrack_possible_nx_huge_page(kvm, sp);
-
-	if (shared)
-		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+	tdp_mmu_arch_unlink_sp(kvm, sp, shared);
 }
 
 /**
@@ -1129,6 +1125,23 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 	return 0;
 }
 
+__weak void tdp_mmu_arch_adjust_map_level(struct kvm_page_fault *fault,
+					  struct tdp_iter *iter)
+{
+}
+
+__weak void tdp_mmu_arch_pre_link_sp(struct kvm *kvm,
+				     struct kvm_mmu_page *sp,
+				     struct kvm_page_fault *fault)
+{
+}
+
+__weak void tdp_mmu_arch_post_link_sp(struct kvm *kvm,
+				      struct kvm_mmu_page *sp,
+				      struct kvm_page_fault *fault)
+{
+}
+
 static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
 				   struct kvm_mmu_page *sp, bool shared);
 
@@ -1153,8 +1166,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	for_each_tdp_pte(iter, root, fault->gfn, fault->gfn + 1) {
 		int r;
 
-		if (fault->arch.nx_huge_page_workaround_enabled)
-			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
+		tdp_mmu_arch_adjust_map_level(fault, &iter);
 
 		if (iter.level == fault->goal_level)
 			break;
@@ -1178,7 +1190,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		sp = tdp_mmu_alloc_sp(vcpu);
 		tdp_mmu_init_child_sp(sp, &iter);
 
-		sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
+		tdp_mmu_arch_pre_link_sp(kvm, sp, fault);
 
 		if (tdp_pte_is_present(iter.old_spte))
 			r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
@@ -1194,12 +1206,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			goto retry;
 		}
 
-		if (fault->arch.huge_page_disallowed &&
-		    fault->req_level >= iter.level) {
-			spin_lock(&kvm->arch.tdp_mmu_pages_lock);
-			track_possible_nx_huge_page(kvm, sp);
-			spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
-		}
+		tdp_mmu_arch_post_link_sp(kvm, sp, fault);
 	}
 
 	/*
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
index e036ba0c6bee..b07ed99b4ab1 100644
--- a/arch/x86/kvm/mmu/tdp_pgtable.c
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -111,3 +111,55 @@ u64 tdp_mmu_make_huge_page_split_pte(struct kvm *kvm, u64 huge_spte,
 {
 	return make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
 }
+
+void tdp_mmu_arch_adjust_map_level(struct kvm_page_fault *fault,
+				   struct tdp_iter *iter)
+{
+	if (fault->arch.nx_huge_page_workaround_enabled)
+		disallowed_hugepage_adjust(fault, iter->old_spte, iter->level);
+}
+
+void tdp_mmu_arch_init_sp(struct kvm_mmu_page *sp)
+{
+	INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
+}
+
+void tdp_mmu_arch_pre_link_sp(struct kvm *kvm,
+			      struct kvm_mmu_page *sp,
+			      struct kvm_page_fault *fault)
+{
+	sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
+}
+
+void tdp_mmu_arch_post_link_sp(struct kvm *kvm,
+			       struct kvm_mmu_page *sp,
+			       struct kvm_page_fault *fault)
+{
+	if (!fault->arch.huge_page_disallowed)
+		return;
+
+	if (fault->req_level < sp->role.level)
+		return;
+
+	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
+	track_possible_nx_huge_page(kvm, sp);
+	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+}
+
+void tdp_mmu_arch_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
+			    bool shared)
+{
+	if (!sp->arch.nx_huge_page_disallowed)
+		return;
+
+	if (shared)
+		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
+	else
+		lockdep_assert_held_write(&kvm->mmu_lock);
+
+	sp->arch.nx_huge_page_disallowed = false;
+	untrack_possible_nx_huge_page(kvm, sp);
+
+	if (shared)
+		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+}
-- 
2.39.0.rc1.256.g54fd8350bd-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 19/37] KVM: x86/mmu: Add arch hooks for NX Huge Pages
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Abstract away the handling for NX Huge Pages down to arch-specific
hooks. This will be used in a future commit to move the TDP MMU to
common code despite NX Huge Pages, which is x86-specific.

NX Huge Pages is by far the most disruptive feature in terms of needing
the most arch hooks in the TDP MMU.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c     | 57 +++++++++++++++++++---------------
 arch/x86/kvm/mmu/tdp_pgtable.c | 52 +++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0172b0e44817..7670fbd8e72d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -269,17 +269,21 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 	return sp;
 }
 
+__weak void tdp_mmu_arch_init_sp(struct kvm_mmu_page *sp)
+{
+}
+
 static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
 			    gfn_t gfn, union kvm_mmu_page_role role)
 {
-	INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
-
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
 	sp->role = role;
 	sp->gfn = gfn;
 	sp->ptep = sptep;
 
+	tdp_mmu_arch_init_sp(sp);
+
 	trace_kvm_mmu_get_page(sp, true);
 }
 
@@ -373,6 +377,11 @@ static void tdp_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	atomic64_dec(&kvm->arch.tdp_mmu_pages);
 }
 
+__weak void tdp_mmu_arch_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
+				   bool shared)
+{
+}
+
 /**
  * tdp_mmu_unlink_sp() - Remove a shadow page from the list of used pages
  *
@@ -386,20 +395,7 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 			      bool shared)
 {
 	tdp_unaccount_mmu_page(kvm, sp);
-
-	if (!sp->arch.nx_huge_page_disallowed)
-		return;
-
-	if (shared)
-		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
-	else
-		lockdep_assert_held_write(&kvm->mmu_lock);
-
-	sp->arch.nx_huge_page_disallowed = false;
-	untrack_possible_nx_huge_page(kvm, sp);
-
-	if (shared)
-		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+	tdp_mmu_arch_unlink_sp(kvm, sp, shared);
 }
 
 /**
@@ -1129,6 +1125,23 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 	return 0;
 }
 
+__weak void tdp_mmu_arch_adjust_map_level(struct kvm_page_fault *fault,
+					  struct tdp_iter *iter)
+{
+}
+
+__weak void tdp_mmu_arch_pre_link_sp(struct kvm *kvm,
+				     struct kvm_mmu_page *sp,
+				     struct kvm_page_fault *fault)
+{
+}
+
+__weak void tdp_mmu_arch_post_link_sp(struct kvm *kvm,
+				      struct kvm_mmu_page *sp,
+				      struct kvm_page_fault *fault)
+{
+}
+
 static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
 				   struct kvm_mmu_page *sp, bool shared);
 
@@ -1153,8 +1166,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	for_each_tdp_pte(iter, root, fault->gfn, fault->gfn + 1) {
 		int r;
 
-		if (fault->arch.nx_huge_page_workaround_enabled)
-			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
+		tdp_mmu_arch_adjust_map_level(fault, &iter);
 
 		if (iter.level == fault->goal_level)
 			break;
@@ -1178,7 +1190,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		sp = tdp_mmu_alloc_sp(vcpu);
 		tdp_mmu_init_child_sp(sp, &iter);
 
-		sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
+		tdp_mmu_arch_pre_link_sp(kvm, sp, fault);
 
 		if (tdp_pte_is_present(iter.old_spte))
 			r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
@@ -1194,12 +1206,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			goto retry;
 		}
 
-		if (fault->arch.huge_page_disallowed &&
-		    fault->req_level >= iter.level) {
-			spin_lock(&kvm->arch.tdp_mmu_pages_lock);
-			track_possible_nx_huge_page(kvm, sp);
-			spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
-		}
+		tdp_mmu_arch_post_link_sp(kvm, sp, fault);
 	}
 
 	/*
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
index e036ba0c6bee..b07ed99b4ab1 100644
--- a/arch/x86/kvm/mmu/tdp_pgtable.c
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -111,3 +111,55 @@ u64 tdp_mmu_make_huge_page_split_pte(struct kvm *kvm, u64 huge_spte,
 {
 	return make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
 }
+
+void tdp_mmu_arch_adjust_map_level(struct kvm_page_fault *fault,
+				   struct tdp_iter *iter)
+{
+	if (fault->arch.nx_huge_page_workaround_enabled)
+		disallowed_hugepage_adjust(fault, iter->old_spte, iter->level);
+}
+
+void tdp_mmu_arch_init_sp(struct kvm_mmu_page *sp)
+{
+	INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
+}
+
+void tdp_mmu_arch_pre_link_sp(struct kvm *kvm,
+			      struct kvm_mmu_page *sp,
+			      struct kvm_page_fault *fault)
+{
+	sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
+}
+
+void tdp_mmu_arch_post_link_sp(struct kvm *kvm,
+			       struct kvm_mmu_page *sp,
+			       struct kvm_page_fault *fault)
+{
+	if (!fault->arch.huge_page_disallowed)
+		return;
+
+	if (fault->req_level < sp->role.level)
+		return;
+
+	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
+	track_possible_nx_huge_page(kvm, sp);
+	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+}
+
+void tdp_mmu_arch_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
+			    bool shared)
+{
+	if (!sp->arch.nx_huge_page_disallowed)
+		return;
+
+	if (shared)
+		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
+	else
+		lockdep_assert_held_write(&kvm->mmu_lock);
+
+	sp->arch.nx_huge_page_disallowed = false;
+	untrack_possible_nx_huge_page(kvm, sp);
+
+	if (shared)
+		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+}
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread
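
A note on the mechanism the hooks above rely on: the empty
tdp_mmu_arch_*() bodies in tdp_mmu.c are declared __weak, so the strong
definitions that x86 supplies in tdp_pgtable.c replace them at link
time, while an architecture that supplies nothing gets the no-op
defaults. Below is a minimal stand-alone sketch of that weak/strong
linkage; the file and function names are invented for illustration, and
__attribute__((weak)) is essentially what the kernel's __weak macro
expands to:

  /* common.c: "common code" provides a weak, no-op default. */
  __attribute__((weak)) void arch_hook(void)
  {
  }

  void common_path(void)
  {
  	arch_hook();	/* calls whichever definition the linker picked */
  }

  /* arch.c: an architecture that wants the hook provides a strong
   * definition, which overrides the weak default at link time. */
  #include <stdio.h>

  void arch_hook(void)
  {
  	printf("arch-specific hook ran\n");
  }

  /* main.c */
  void common_path(void);

  int main(void)
  {
  	common_path();
  	return 0;
  }

Building with "gcc common.c arch.c main.c" prints the message; dropping
arch.c from the link quietly falls back to the weak no-op, which is
presumably how an architecture without its own hook implementations
would behave.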

* [RFC PATCH 20/37] KVM: x86/mmu: Abstract away computing the max mapping level
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Abstract away kvm_mmu_max_mapping_level(), an x86-specific function for
computing the maximum level at which a given GFN can be mapped in KVM's
page tables. This will be used in a future commit to enable moving the
TDP MMU to common code.

Provide a default implementation for non-x86 architectures that just
returns the max level. This will result in more zapping than necessary
when disabling dirty logging (i.e. less than optimal performance) but no
correctness issues.
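
To make the trade-off concrete, here is a stand-alone sketch of the
check the caller performs (see zap_collapsible_spte_range() in the diff
below); the level numbering and the TDP_MAX_HUGEPAGE_LEVEL value are
assumptions for illustration, loosely mirroring x86. An over-estimate
from the default can only make the "skip" branch unreachable, i.e.
cause extra zapping, never skip a zap that was actually needed:

  #include <stdio.h>

  /* Assumed level numbering for the sketch (mirrors x86). */
  #define PG_LEVEL_4K		1
  #define PG_LEVEL_2M		2
  #define PG_LEVEL_1G		3
  #define TDP_MAX_HUGEPAGE_LEVEL	PG_LEVEL_1G	/* assumption */

  /* The generic default: no arch knowledge, so claim the maximum. */
  static int default_max_mapping_level(void)
  {
  	return TDP_MAX_HUGEPAGE_LEVEL;
  }

  int main(void)
  {
  	int iter_level = PG_LEVEL_2M;	/* level of the SPTE being considered */

  	if (default_max_mapping_level() < iter_level)
  		printf("skip: this SPTE cannot be made huge\n");
  	else
  		printf("zap it so it can be rebuilt as a huge page\n");

  	return 0;
  }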

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c     | 14 ++++++++++----
 arch/x86/kvm/mmu/tdp_pgtable.c |  7 +++++++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7670fbd8e72d..24d1dbd0a1ec 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1696,6 +1696,13 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 		clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot);
 }
 
+__weak int tdp_mmu_max_mapping_level(struct kvm *kvm,
+				     const struct kvm_memory_slot *slot,
+				     struct tdp_iter *iter)
+{
+	return TDP_MAX_HUGEPAGE_LEVEL;
+}
+
 static void zap_collapsible_spte_range(struct kvm *kvm,
 				       struct kvm_mmu_page *root,
 				       const struct kvm_memory_slot *slot)
@@ -1727,15 +1734,14 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 		/*
 		 * If iter.gfn resides outside of the slot, i.e. the page for
 		 * the current level overlaps but is not contained by the slot,
-		 * then the SPTE can't be made huge.  More importantly, trying
-		 * to query that info from slot->arch.lpage_info will cause an
+		 * then the SPTE can't be made huge. On x86, trying to query
+		 * that info from slot->arch.lpage_info will cause an
 		 * out-of-bounds access.
 		 */
 		if (iter.gfn < start || iter.gfn >= end)
 			continue;
 
-		max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
-							      iter.gfn, PG_LEVEL_NUM);
+		max_mapping_level = tdp_mmu_max_mapping_level(kvm, slot, &iter);
 		if (max_mapping_level < iter.level)
 			continue;
 
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
index b07ed99b4ab1..840d063c45b8 100644
--- a/arch/x86/kvm/mmu/tdp_pgtable.c
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -163,3 +163,10 @@ void tdp_mmu_arch_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 	if (shared)
 		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 }
+
+int tdp_mmu_max_mapping_level(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot,
+			      struct tdp_iter *iter)
+{
+	return kvm_mmu_max_mapping_level(kvm, slot, iter->gfn, PG_LEVEL_NUM);
+}
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 21/37] KVM: Introduce CONFIG_HAVE_TDP_MMU
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Introduce a new config option to gate support for the common TDP MMU.
Future commits will use it to avoid compiling the TDP MMU code, and to
avoid adding fields to common structs (e.g. struct kvm), on
architectures that do not yet support the TDP MMU.

No functional change intended.
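
For context, an architecture opts in via Kconfig "select" (the next
patch does so for 64-bit x86), and the symbol then gates both what gets
compiled and which fields exist in common structs. A rough sketch of
the C-side guard, along the lines of what a later patch in this series
adds to struct kvm (the struct name below is invented for
illustration):

  #ifdef CONFIG_HAVE_TDP_MMU
  #include <kvm/mmu_types.h>
  #endif

  struct kvm_example {
  	/* ... fields every architecture needs ... */
  #ifdef CONFIG_HAVE_TDP_MMU
  	/* Present only when the architecture selects HAVE_TDP_MMU. */
  	struct tdp_mmu tdp_mmu;
  #endif
  };

Presumably the build system is likewise taught to compile the common
TDP MMU objects only when the option is set, but that wiring is not
part of this patch.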

Signed-off-by: David Matlack <dmatlack@google.com>
---
 virt/kvm/Kconfig | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 9fb1ff6f19e5..75d86794d6cf 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -92,3 +92,6 @@ config KVM_XFER_TO_GUEST_WORK
 
 config HAVE_KVM_PM_NOTIFIER
        bool
+
+config HAVE_TDP_MMU
+       bool
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 22/37] KVM: x86: Select HAVE_TDP_MMU if X86_64
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

In preparation for moving the TDP MMU implementation into virt/kvm/,
guarded behind HAVE_TDP_MMU, ensure that HAVE_TDP_MMU is selected in
X86_64 builds that enable KVM. This matches the existing behavior on
x86, where the TDP MMU is only built for 64-bit kernels.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index fbeaa9ddef59..849185d5020d 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -49,6 +49,7 @@ config KVM
 	select SRCU
 	select INTERVAL_TREE
 	select HAVE_KVM_PM_NOTIFIER if PM
+	select HAVE_TDP_MMU if X86_64
 	help
 	  Support hosting fully virtualized guest machines using hardware
 	  virtualization extensions.  You will need a fairly recent
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 23/37] KVM: MMU: Move VM-level TDP MMU state to struct kvm
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move VM-level TDP MMU state to struct kvm so it can be accessed by
common code in a future commit.

No functional change intended.
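
As a quick illustration of what this enables (the helper below is
hypothetical and not part of this patch; the field names match the diff
that follows), common code can now reach VM-wide TDP MMU state through
struct kvm alone, with no kvm->arch.* access:

  /* Hypothetical common-code helper, sketched for illustration only. */
  static inline long tdp_mmu_nr_pages(struct kvm *kvm)
  {
  	/* Reads the per-VM page count moved into struct kvm. */
  	return (long)atomic64_read(&kvm->tdp_mmu.pages);
  }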

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm_host.h | 39 --------------------------------
 arch/x86/kvm/mmu/tdp_mmu.c      | 40 ++++++++++++++++-----------------
 arch/x86/kvm/mmu/tdp_pgtable.c  |  8 +++----
 include/kvm/mmu_types.h         | 40 +++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h        |  8 +++++++
 5 files changed, 72 insertions(+), 63 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9cf8f956bac3..95c731028452 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1272,45 +1272,6 @@ struct kvm_arch {
 	struct kvm_pmu_event_filter __rcu *pmu_event_filter;
 	struct task_struct *nx_huge_page_recovery_thread;
 
-#ifdef CONFIG_X86_64
-	/* The number of TDP MMU pages across all roots. */
-	atomic64_t tdp_mmu_pages;
-
-	/*
-	 * List of struct kvm_mmu_pages being used as roots.
-	 * All struct kvm_mmu_pages in the list should have
-	 * tdp_mmu_page set.
-	 *
-	 * For reads, this list is protected by:
-	 *	the MMU lock in read mode + RCU or
-	 *	the MMU lock in write mode
-	 *
-	 * For writes, this list is protected by:
-	 *	the MMU lock in read mode + the tdp_mmu_pages_lock or
-	 *	the MMU lock in write mode
-	 *
-	 * Roots will remain in the list until their tdp_mmu_root_count
-	 * drops to zero, at which point the thread that decremented the
-	 * count to zero should removed the root from the list and clean
-	 * it up, freeing the root after an RCU grace period.
-	 */
-	struct list_head tdp_mmu_roots;
-
-	/*
-	 * Protects accesses to the following fields when the MMU lock
-	 * is held in read mode:
-	 *  - tdp_mmu_roots (above)
-	 *  - the link field of kvm_mmu_page structs used by the TDP MMU
-	 *  - possible_nx_huge_pages;
-	 *  - the possible_nx_huge_page_link field of kvm_mmu_page structs used
-	 *    by the TDP MMU
-	 * It is acceptable, but not necessary, to acquire this lock when
-	 * the thread holds the MMU lock in write mode.
-	 */
-	spinlock_t tdp_mmu_pages_lock;
-	struct workqueue_struct *tdp_mmu_zap_wq;
-#endif /* CONFIG_X86_64 */
-
 	/*
 	 * If set, at least one shadow root has been allocated. This flag
 	 * is used as one input when determining whether certain memslot
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 24d1dbd0a1ec..b997f84c0ea7 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -21,9 +21,9 @@ int kvm_mmu_init_tdp_mmu(struct kvm *kvm)
 	if (!wq)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
-	spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
-	kvm->arch.tdp_mmu_zap_wq = wq;
+	INIT_LIST_HEAD(&kvm->tdp_mmu.roots);
+	spin_lock_init(&kvm->tdp_mmu.pages_lock);
+	kvm->tdp_mmu.zap_wq = wq;
 	return 1;
 }
 
@@ -42,10 +42,10 @@ static __always_inline bool kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 {
 	/* Also waits for any queued work items.  */
-	destroy_workqueue(kvm->arch.tdp_mmu_zap_wq);
+	destroy_workqueue(kvm->tdp_mmu.zap_wq);
 
-	WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));
-	WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots));
+	WARN_ON(atomic64_read(&kvm->tdp_mmu.pages));
+	WARN_ON(!list_empty(&kvm->tdp_mmu.roots));
 
 	/*
 	 * Ensure that all the outstanding RCU callbacks to free shadow pages
@@ -114,7 +114,7 @@ static void tdp_mmu_schedule_zap_root(struct kvm *kvm, struct kvm_mmu_page *root
 {
 	root->tdp_mmu_async_data = kvm;
 	INIT_WORK(&root->tdp_mmu_async_work, tdp_mmu_zap_root_work);
-	queue_work(kvm->arch.tdp_mmu_zap_wq, &root->tdp_mmu_async_work);
+	queue_work(kvm->tdp_mmu.zap_wq, &root->tdp_mmu_async_work);
 }
 
 static inline bool kvm_tdp_root_mark_invalid(struct kvm_mmu_page *page)
@@ -173,9 +173,9 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 		return;
 	}
 
-	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
+	spin_lock(&kvm->tdp_mmu.pages_lock);
 	list_del_rcu(&root->link);
-	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+	spin_unlock(&kvm->tdp_mmu.pages_lock);
 	call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
 
@@ -198,11 +198,11 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 	rcu_read_lock();
 
 	if (prev_root)
-		next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
+		next_root = list_next_or_null_rcu(&kvm->tdp_mmu.roots,
 						  &prev_root->link,
 						  typeof(*prev_root), link);
 	else
-		next_root = list_first_or_null_rcu(&kvm->arch.tdp_mmu_roots,
+		next_root = list_first_or_null_rcu(&kvm->tdp_mmu.roots,
 						   typeof(*next_root), link);
 
 	while (next_root) {
@@ -210,7 +210,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 		    kvm_tdp_mmu_get_root(next_root))
 			break;
 
-		next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
+		next_root = list_next_or_null_rcu(&kvm->tdp_mmu.roots,
 				&next_root->link, typeof(*next_root), link);
 	}
 
@@ -254,7 +254,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
  * is guaranteed to be stable.
  */
 #define for_each_tdp_mmu_root(_kvm, _root, _as_id)			\
-	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)	\
+	list_for_each_entry(_root, &_kvm->tdp_mmu.roots, link)	\
 		if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) &&	\
 		    _root->role.as_id != _as_id) {		\
 		} else
@@ -324,9 +324,9 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 
 	refcount_set(&root->root_refcount, 1);
 
-	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
-	list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots);
-	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+	spin_lock(&kvm->tdp_mmu.pages_lock);
+	list_add_rcu(&root->link, &kvm->tdp_mmu.roots);
+	spin_unlock(&kvm->tdp_mmu.pages_lock);
 
 out:
 	return __pa(root->spt);
@@ -368,13 +368,13 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	kvm_account_pgtable_pages((void *)sp->spt, +1);
-	atomic64_inc(&kvm->arch.tdp_mmu_pages);
+	atomic64_inc(&kvm->tdp_mmu.pages);
 }
 
 static void tdp_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	kvm_account_pgtable_pages((void *)sp->spt, -1);
-	atomic64_dec(&kvm->arch.tdp_mmu_pages);
+	atomic64_dec(&kvm->tdp_mmu.pages);
 }
 
 __weak void tdp_mmu_arch_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
@@ -1010,7 +1010,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
  */
 void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
 {
-	flush_workqueue(kvm->arch.tdp_mmu_zap_wq);
+	flush_workqueue(kvm->tdp_mmu.zap_wq);
 }
 
 /*
@@ -1035,7 +1035,7 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
 	struct kvm_mmu_page *root;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
-	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
+	list_for_each_entry(root, &kvm->tdp_mmu.roots, link) {
 		if (!root->role.invalid &&
 		    !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) {
 			root->role.invalid = true;
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
index 840d063c45b8..cc7b10f703e1 100644
--- a/arch/x86/kvm/mmu/tdp_pgtable.c
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -141,9 +141,9 @@ void tdp_mmu_arch_post_link_sp(struct kvm *kvm,
 	if (fault->req_level < sp->role.level)
 		return;
 
-	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
+	spin_lock(&kvm->tdp_mmu.pages_lock);
 	track_possible_nx_huge_page(kvm, sp);
-	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+	spin_unlock(&kvm->tdp_mmu.pages_lock);
 }
 
 void tdp_mmu_arch_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
@@ -153,7 +153,7 @@ void tdp_mmu_arch_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 		return;
 
 	if (shared)
-		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
+		spin_lock(&kvm->tdp_mmu.pages_lock);
 	else
 		lockdep_assert_held_write(&kvm->mmu_lock);
 
@@ -161,7 +161,7 @@ void tdp_mmu_arch_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 	untrack_possible_nx_huge_page(kvm, sp);
 
 	if (shared)
-		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+		spin_unlock(&kvm->tdp_mmu.pages_lock);
 }
 
 int tdp_mmu_max_mapping_level(struct kvm *kvm,
diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
index 07c9962f9aea..8ccc48a1cd4c 100644
--- a/include/kvm/mmu_types.h
+++ b/include/kvm/mmu_types.h
@@ -136,4 +136,44 @@ enum {
 	RET_PF_SPURIOUS,
 };
 
+struct tdp_mmu {
+	/* The number of TDP MMU pages across all roots. */
+	atomic64_t pages;
+
+	/*
+	 * List of kvm_mmu_page structs being used as roots.
+	 * All kvm_mmu_page structs in the list should have
+	 * tdp_mmu_page set.
+	 *
+	 * For reads, this list is protected by:
+	 *	the MMU lock in read mode + RCU or
+	 *	the MMU lock in write mode
+	 *
+	 * For writes, this list is protected by:
+	 *	the MMU lock in read mode + the tdp_mmu_pages_lock or
+	 *	the MMU lock in write mode
+	 *
+	 * Roots will remain in the list until their tdp_mmu_root_count
+	 * drops to zero, at which point the thread that decremented the
+	 * count to zero should remove the root from the list and clean
+	 * it up, freeing the root after an RCU grace period.
+	 */
+	struct list_head roots;
+
+	/*
+	 * Protects accesses to the following fields when the MMU lock
+	 * is held in read mode:
+	 *  - roots (above)
+	 *  - the link field of kvm_mmu_page structs used by the TDP MMU
+	 *  - (x86-only) possible_nx_huge_pages;
+	 *  - (x86-only) the arch.possible_nx_huge_page_link field of
+	 *    kvm_mmu_page structs used by the TDP MMU
+	 * It is acceptable, but not necessary, to acquire this lock when
+	 * the thread holds the MMU lock in write mode.
+	 */
+	spinlock_t pages_lock;
+
+	struct workqueue_struct *zap_wq;
+};
+
 #endif /* !__KVM_MMU_TYPES_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 469ff4202a0d..242eaed55320 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -45,6 +45,10 @@
 #include <asm/kvm_host.h>
 #include <linux/kvm_dirty_ring.h>
 
+#ifdef CONFIG_HAVE_TDP_MMU
+#include <kvm/mmu_types.h>
+#endif
+
 #ifndef KVM_MAX_VCPU_IDS
 #define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS
 #endif
@@ -797,6 +801,10 @@ struct kvm {
 	struct notifier_block pm_notifier;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
+
+#ifdef CONFIG_HAVE_TDP_MMU
+	struct tdp_mmu tdp_mmu;
+#endif
 };
 
 #define kvm_err(fmt, ...) \
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 24/37] KVM: x86/mmu: Move kvm_mmu_hugepage_adjust() up to fault handler
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move the call to kvm_mmu_hugepage_adjust() up to the fault handler
rather than calling it from kvm_tdp_mmu_map(). Also make the same change
to direct_map() for consistency. This reduces the TDP MMU's dependency
on an x86-specific function, so that the TDP MMU can be moved into
common code.

No functional change intended.
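
For illustration only (a simplified sketch with locking, retries and
error handling elided), the resulting TDP MMU fault path looks roughly
like:

	static int example_tdp_page_fault(struct kvm_vcpu *vcpu,
					  struct kvm_page_fault *fault)
	{
		/* Hugepage adjustment now happens in the arch fault handler... */
		kvm_mmu_hugepage_adjust(vcpu, fault);

		/* ...so the mapping path itself no longer needs to call it. */
		return kvm_tdp_mmu_map(vcpu, fault);
	}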

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 6 ++++--
 arch/x86/kvm/mmu/tdp_mmu.c | 2 --
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0593d4a60139..9307608ae975 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3151,8 +3151,6 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	int ret;
 	gfn_t base_gfn = fault->gfn;
 
-	kvm_mmu_hugepage_adjust(vcpu, fault);
-
 	trace_kvm_mmu_spte_requested(fault);
 	for_each_shadow_entry(vcpu, fault->addr, it) {
 		/*
@@ -4330,6 +4328,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		goto out_unlock;
 
+	kvm_mmu_hugepage_adjust(vcpu, fault);
+
 	r = direct_map(vcpu, fault);
 
 out_unlock:
@@ -4408,6 +4408,8 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 	if (is_page_fault_stale(vcpu, fault))
 		goto out_unlock;
 
+	kvm_mmu_hugepage_adjust(vcpu, fault);
+
 	r = kvm_tdp_mmu_map(vcpu, fault);
 
 out_unlock:
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index b997f84c0ea7..e6708829714c 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1157,8 +1157,6 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	struct kvm_mmu_page *sp;
 	int ret = RET_PF_RETRY;
 
-	kvm_mmu_hugepage_adjust(vcpu, fault);
-
 	trace_kvm_mmu_spte_requested(fault);
 
 	rcu_read_lock();
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 25/37] KVM: x86/mmu: Pass root role to kvm_tdp_mmu_get_vcpu_root_hpa()
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Pass the root role from the caller rather than grabbing it from
vcpu->arch.mmu. This will enable the TDP MMU to be moved to common code
in a future commit by removing a dependency on vcpu->arch.

No functional change intended.
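
As a sketch (the wrapper is hypothetical, mirroring the
mmu_alloc_direct_roots() hunk below), the role is now supplied
explicitly by the caller:

	static hpa_t example_alloc_tdp_root(struct kvm_vcpu *vcpu,
					    struct kvm_mmu *mmu)
	{
		/* The callee no longer dereferences vcpu->arch.mmu itself. */
		return kvm_tdp_mmu_get_vcpu_root_hpa(vcpu, mmu->root_role);
	}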

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 2 +-
 arch/x86/kvm/mmu/tdp_mmu.c | 4 ++--
 arch/x86/kvm/mmu/tdp_mmu.h | 3 ++-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9307608ae975..aea7df3c2dcb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3612,7 +3612,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 
 	if (tdp_mmu_enabled) {
-		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
+		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu, mmu->root_role);
 		mmu->root.hpa = root;
 	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
 		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index e6708829714c..c5d1c9010d21 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -301,9 +301,9 @@ static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp,
 	tdp_mmu_init_sp(child_sp, iter->sptep, iter->gfn, role);
 }
 
-hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
+hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu,
+				    union kvm_mmu_page_role role)
 {
-	union kvm_mmu_page_role role = vcpu->arch.mmu->root_role;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_page *root;
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index e6a929089715..897608be7f75 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -10,7 +10,8 @@
 int kvm_mmu_init_tdp_mmu(struct kvm *kvm);
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
 
-hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
+hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu,
+				    union kvm_mmu_page_role role);
 
 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
 {
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 26/37] KVM: Move page table cache to struct kvm_vcpu
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Add a kvm_mmu_memory_cache to struct kvm_vcpu for allocating pages for
page tables, replacing existing architecture-specific page table caches.

Purposely do not make management of this cache architecture-neutral,
though, both to reduce churn and because not all architectures configure
it the same way (MIPS does not set __GFP_ZERO).

This eliminates a dependency of the TDP MMU on an architecture-specific
field, paving the way for a future commit to move the TDP MMU to common
code.
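
For illustration, a hedged sketch of how a fault path can use the new
common field (the wrapper is hypothetical; kvm_mmu_topup_memory_cache()
and kvm_mmu_memory_cache_alloc() are the existing cache API):

	/*
	 * Sketch: top up the cache (normally done before taking the MMU
	 * lock), then allocate a page-table page from it.
	 */
	static void *example_alloc_page_table(struct kvm_vcpu *vcpu, int min)
	{
		struct kvm_mmu_memory_cache *cache = &vcpu->mmu_page_table_cache;

		if (kvm_mmu_topup_memory_cache(cache, min))
			return NULL;

		return kvm_mmu_memory_cache_alloc(cache);
	}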

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 3 ---
 arch/arm64/kvm/arm.c              | 4 ++--
 arch/arm64/kvm/mmu.c              | 2 +-
 arch/mips/include/asm/kvm_host.h  | 3 ---
 arch/mips/kvm/mmu.c               | 4 ++--
 arch/riscv/include/asm/kvm_host.h | 3 ---
 arch/riscv/kvm/mmu.c              | 2 +-
 arch/riscv/kvm/vcpu.c             | 4 ++--
 arch/x86/include/asm/kvm_host.h   | 1 -
 arch/x86/kvm/mmu/mmu.c            | 8 ++++----
 arch/x86/kvm/mmu/tdp_mmu.c        | 2 +-
 include/linux/kvm_host.h          | 5 +++++
 12 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 001c8abe87fc..da519d6c09a5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -473,9 +473,6 @@ struct kvm_vcpu_arch {
 	/* vcpu power state */
 	struct kvm_mp_state mp_state;
 
-	/* Cache some mmu pages needed inside spinlock regions */
-	struct kvm_mmu_memory_cache mmu_page_cache;
-
 	/* Target CPU and feature flags */
 	int target;
 	DECLARE_BITMAP(features, KVM_VCPU_MAX_FEATURES);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9c5573bc4614..0e0d4c4f79a2 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -340,7 +340,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.target = -1;
 	bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
 
-	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
+	vcpu->mmu_page_table_cache.gfp_zero = __GFP_ZERO;
 
 	/*
 	 * Default value for the FP state, will be overloaded at load
@@ -375,7 +375,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	if (vcpu_has_run_once(vcpu) && unlikely(!irqchip_in_kernel(vcpu->kvm)))
 		static_branch_dec(&userspace_irqchip_in_use);
 
-	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+	kvm_mmu_free_memory_cache(&vcpu->mmu_page_table_cache);
 	kvm_timer_vcpu_terminate(vcpu);
 	kvm_pmu_vcpu_destroy(vcpu);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 31d7fa4c7c14..d431c5cdb26a 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1196,7 +1196,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	bool device = false;
 	unsigned long mmu_seq;
 	struct kvm *kvm = vcpu->kvm;
-	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+	struct kvm_mmu_memory_cache *memcache = &vcpu->mmu_page_table_cache;
 	struct vm_area_struct *vma;
 	short vma_shift;
 	gfn_t gfn;
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 5cedb28e8a40..b7f276331583 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -344,9 +344,6 @@ struct kvm_vcpu_arch {
 	/* Bitmask of pending exceptions to be cleared */
 	unsigned long pending_exceptions_clr;
 
-	/* Cache some mmu pages needed inside spinlock regions */
-	struct kvm_mmu_memory_cache mmu_page_cache;
-
 	/* vcpu's vzguestid is different on each host cpu in an smp system */
 	u32 vzguestid[NR_CPUS];
 
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 74cd64a24d05..638f728d0bbe 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -27,7 +27,7 @@
 
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
-	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+	kvm_mmu_free_memory_cache(&vcpu->mmu_page_table_cache);
 }
 
 /**
@@ -589,7 +589,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
 			     pte_t *out_entry, pte_t *out_buddy)
 {
 	struct kvm *kvm = vcpu->kvm;
-	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+	struct kvm_mmu_memory_cache *memcache = &vcpu->mmu_page_table_cache;
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	int srcu_idx, err;
 	kvm_pfn_t pfn;
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index dbbf43d52623..82e5d80347cc 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -219,9 +219,6 @@ struct kvm_vcpu_arch {
 	/* SBI context */
 	struct kvm_sbi_context sbi_context;
 
-	/* Cache pages needed to program page tables with spinlock held */
-	struct kvm_mmu_memory_cache mmu_page_cache;
-
 	/* VCPU power-off state */
 	bool power_off;
 
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 3620ecac2fa1..a8281a65cb3d 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -625,7 +625,7 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	struct vm_area_struct *vma;
 	struct kvm *kvm = vcpu->kvm;
-	struct kvm_mmu_memory_cache *pcache = &vcpu->arch.mmu_page_cache;
+	struct kvm_mmu_memory_cache *pcache = &vcpu->mmu_page_table_cache;
 	bool logging = (memslot->dirty_bitmap &&
 			!(memslot->flags & KVM_MEM_READONLY)) ? true : false;
 	unsigned long vma_pagesize, mmu_seq;
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 71ebbc4821f0..9a1001ada936 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -160,7 +160,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	/* Mark this VCPU never ran */
 	vcpu->arch.ran_atleast_once = false;
-	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
+	vcpu->mmu_page_table_cache.gfp_zero = __GFP_ZERO;
 	bitmap_zero(vcpu->arch.isa, RISCV_ISA_EXT_MAX);
 
 	/* Setup ISA features available to VCPU */
@@ -211,7 +211,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvm_riscv_vcpu_timer_deinit(vcpu);
 
 	/* Free unused pages pre-allocated for G-stage page table mappings */
-	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+	kvm_mmu_free_memory_cache(&vcpu->mmu_page_table_cache);
 }
 
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 95c731028452..8cac8ec29324 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -714,7 +714,6 @@ struct kvm_vcpu_arch {
 	struct kvm_mmu *walk_mmu;
 
 	struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
-	struct kvm_mmu_memory_cache mmu_shadow_page_cache;
 	struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
 	struct kvm_mmu_memory_cache mmu_page_header_cache;
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index aea7df3c2dcb..a845e9141ad4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -664,7 +664,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 				       1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
 	if (r)
 		return r;
-	r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
+	r = kvm_mmu_topup_memory_cache(&vcpu->mmu_page_table_cache,
 				       PT64_ROOT_MAX_LEVEL);
 	if (r)
 		return r;
@@ -681,7 +681,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
-	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
+	kvm_mmu_free_memory_cache(&vcpu->mmu_page_table_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
@@ -2218,7 +2218,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
 {
 	struct shadow_page_caches caches = {
 		.page_header_cache = &vcpu->arch.mmu_page_header_cache,
-		.shadow_page_cache = &vcpu->arch.mmu_shadow_page_cache,
+		.shadow_page_cache = &vcpu->mmu_page_table_cache,
 		.shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
 	};
 
@@ -5920,7 +5920,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
 	vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
 
-	vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
+	vcpu->mmu_page_table_cache.gfp_zero = __GFP_ZERO;
 
 	vcpu->arch.mmu = &vcpu->arch.root_mmu;
 	vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index c5d1c9010d21..922815407b7e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -264,7 +264,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 	struct kvm_mmu_page *sp;
 
 	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
-	sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
+	sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->mmu_page_table_cache);
 
 	return sp;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 242eaed55320..0a9baa493760 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -387,6 +387,11 @@ struct kvm_vcpu {
 	 */
 	struct kvm_memory_slot *last_used_slot;
 	u64 last_used_slot_gen;
+
+#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
+	/* Cache used to allocate pages for use as page tables. */
+	struct kvm_mmu_memory_cache mmu_page_table_cache;
+#endif
 };
 
 /*
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 27/37] KVM: MMU: Move mmu_page_header_cache to common code
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move vcpu->arch.mmu_page_header_cache and its backing kmem_cache to common code
in preparation for moving the TDP MMU to common code in a future commit.

The kmem_cache is still only initialized and used on x86 for the time being to
avoid affecting other architectures. A future commit can move the code
that manages this cache to common code.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
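For illustration only, a rough sketch of the end state (the example_*
name and include paths are stand-ins; the real allocation site is
tdp_mmu_alloc_sp()): a TDP MMU shadow page is now built entirely from
the two common per-vCPU caches, and only x86 wires
mmu_page_header_cache up to a backing kmem_cache for the time being.

	#include <linux/kvm_host.h>
	#include <kvm/mmu.h>	/* common MMU header added earlier in this series */

	/* Sketch only: build a kvm_mmu_page entirely from the common caches. */
	static struct kvm_mmu_page *example_alloc_sp(struct kvm_vcpu *vcpu)
	{
		struct kvm_mmu_page *sp;

		/*
		 * Only x86 backs mmu_page_header_cache with a kmem_cache
		 * (set up in kvm_mmu_create()); other architectures leave it
		 * unconfigured for now, as noted above.
		 */
		sp = kvm_mmu_memory_cache_alloc(&vcpu->mmu_page_header_cache);

		/* The page-table page itself comes from the page-table cache. */
		sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->mmu_page_table_cache);

		return sp;
	}
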
 arch/x86/kvm/mmu/mmu.c          | 11 +++++------
 arch/x86/kvm/mmu/mmu_internal.h |  2 --
 arch/x86/kvm/mmu/tdp_mmu.c      |  2 +-
 include/kvm/mmu.h               |  2 ++
 include/linux/kvm_host.h        |  3 +++
 virt/kvm/kvm_main.c             |  1 +
 6 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a845e9141ad4..f01ee01f3509 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -163,7 +163,6 @@ struct kvm_shadow_walk_iterator {
 	     __shadow_walk_next(&(_walker), spte))
 
 static struct kmem_cache *pte_list_desc_cache;
-struct kmem_cache *mmu_page_header_cache;
 static struct percpu_counter kvm_total_used_mmu_pages;
 
 static void mmu_spte_set(u64 *sptep, u64 spte);
@@ -674,7 +673,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 		if (r)
 			return r;
 	}
-	return kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
+	return kvm_mmu_topup_memory_cache(&vcpu->mmu_page_header_cache,
 					  PT64_ROOT_MAX_LEVEL);
 }
 
@@ -683,7 +682,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
 	kvm_mmu_free_memory_cache(&vcpu->mmu_page_table_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
-	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
+	kvm_mmu_free_memory_cache(&vcpu->mmu_page_header_cache);
 }
 
 static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
@@ -2217,7 +2216,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
 						    union kvm_mmu_page_role role)
 {
 	struct shadow_page_caches caches = {
-		.page_header_cache = &vcpu->arch.mmu_page_header_cache,
+		.page_header_cache = &vcpu->mmu_page_header_cache,
 		.shadow_page_cache = &vcpu->mmu_page_table_cache,
 		.shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
 	};
@@ -5917,8 +5916,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu_pte_list_desc_cache.kmem_cache = pte_list_desc_cache;
 	vcpu->arch.mmu_pte_list_desc_cache.gfp_zero = __GFP_ZERO;
 
-	vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
-	vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
+	vcpu->mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
+	vcpu->mmu_page_header_cache.gfp_zero = __GFP_ZERO;
 
 	vcpu->mmu_page_table_cache.gfp_zero = __GFP_ZERO;
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d3c1d08002af..4aa60d5d87b0 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -44,8 +44,6 @@ extern bool dbg;
 #define INVALID_PAE_ROOT	0
 #define IS_VALID_PAE_ROOT(x)	(!!(x))
 
-extern struct kmem_cache *mmu_page_header_cache;
-
 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 {
 	/*
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 922815407b7e..891877a6fb78 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -263,7 +263,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu_page *sp;
 
-	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
+	sp = kvm_mmu_memory_cache_alloc(&vcpu->mmu_page_header_cache);
 	sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->mmu_page_table_cache);
 
 	return sp;
diff --git a/include/kvm/mmu.h b/include/kvm/mmu.h
index 425db8e4f8e9..f1416828f8fe 100644
--- a/include/kvm/mmu.h
+++ b/include/kvm/mmu.h
@@ -4,6 +4,8 @@
 
 #include <kvm/mmu_types.h>
 
+extern struct kmem_cache *mmu_page_header_cache;
+
 static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
 {
 	struct page *page = pfn_to_page((shadow_page) >> PAGE_SHIFT);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0a9baa493760..ec3a6de6d54e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -391,6 +391,9 @@ struct kvm_vcpu {
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
 	/* Cache used to allocate pages for use as page tables. */
 	struct kvm_mmu_memory_cache mmu_page_table_cache;
+
+	/* Cache used to allocate kvm_mmu_page structs. */
+	struct kvm_mmu_memory_cache mmu_page_header_cache;
 #endif
 };
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 954ab969f55e..0f1d48ed7d57 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -108,6 +108,7 @@ static int kvm_usage_count;
 static atomic_t hardware_enable_failed;
 
 static struct kmem_cache *kvm_vcpu_cache;
+struct kmem_cache *mmu_page_header_cache;
 
 static __read_mostly struct preempt_ops kvm_preempt_ops;
 static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 27/37] KVM: MMU: Move mmu_page_header_cache to common code
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Move vcpu->arch.mmu_page_header_cache and its backing kmem_cache to common code
in preparation for moving the TDP MMU to common code in a future commit.

The kmem_cache is still only initialized and used on x86 for the time being to
avoid affecting other architectures. A future commit can move the code to
manage this cache to common code.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 11 +++++------
 arch/x86/kvm/mmu/mmu_internal.h |  2 --
 arch/x86/kvm/mmu/tdp_mmu.c      |  2 +-
 include/kvm/mmu.h               |  2 ++
 include/linux/kvm_host.h        |  3 +++
 virt/kvm/kvm_main.c             |  1 +
 6 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a845e9141ad4..f01ee01f3509 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -163,7 +163,6 @@ struct kvm_shadow_walk_iterator {
 	     __shadow_walk_next(&(_walker), spte))
 
 static struct kmem_cache *pte_list_desc_cache;
-struct kmem_cache *mmu_page_header_cache;
 static struct percpu_counter kvm_total_used_mmu_pages;
 
 static void mmu_spte_set(u64 *sptep, u64 spte);
@@ -674,7 +673,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 		if (r)
 			return r;
 	}
-	return kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
+	return kvm_mmu_topup_memory_cache(&vcpu->mmu_page_header_cache,
 					  PT64_ROOT_MAX_LEVEL);
 }
 
@@ -683,7 +682,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
 	kvm_mmu_free_memory_cache(&vcpu->mmu_page_table_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
-	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
+	kvm_mmu_free_memory_cache(&vcpu->mmu_page_header_cache);
 }
 
 static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
@@ -2217,7 +2216,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
 						    union kvm_mmu_page_role role)
 {
 	struct shadow_page_caches caches = {
-		.page_header_cache = &vcpu->arch.mmu_page_header_cache,
+		.page_header_cache = &vcpu->mmu_page_header_cache,
 		.shadow_page_cache = &vcpu->mmu_page_table_cache,
 		.shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
 	};
@@ -5917,8 +5916,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu_pte_list_desc_cache.kmem_cache = pte_list_desc_cache;
 	vcpu->arch.mmu_pte_list_desc_cache.gfp_zero = __GFP_ZERO;
 
-	vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
-	vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
+	vcpu->mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
+	vcpu->mmu_page_header_cache.gfp_zero = __GFP_ZERO;
 
 	vcpu->mmu_page_table_cache.gfp_zero = __GFP_ZERO;
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d3c1d08002af..4aa60d5d87b0 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -44,8 +44,6 @@ extern bool dbg;
 #define INVALID_PAE_ROOT	0
 #define IS_VALID_PAE_ROOT(x)	(!!(x))
 
-extern struct kmem_cache *mmu_page_header_cache;
-
 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 {
 	/*
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 922815407b7e..891877a6fb78 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -263,7 +263,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu_page *sp;
 
-	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
+	sp = kvm_mmu_memory_cache_alloc(&vcpu->mmu_page_header_cache);
 	sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->mmu_page_table_cache);
 
 	return sp;
diff --git a/include/kvm/mmu.h b/include/kvm/mmu.h
index 425db8e4f8e9..f1416828f8fe 100644
--- a/include/kvm/mmu.h
+++ b/include/kvm/mmu.h
@@ -4,6 +4,8 @@
 
 #include <kvm/mmu_types.h>
 
+extern struct kmem_cache *mmu_page_header_cache;
+
 static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
 {
 	struct page *page = pfn_to_page((shadow_page) >> PAGE_SHIFT);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0a9baa493760..ec3a6de6d54e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -391,6 +391,9 @@ struct kvm_vcpu {
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
 	/* Cache used to allocate pages for use as page tables. */
 	struct kvm_mmu_memory_cache mmu_page_table_cache;
+
+	/* Cache used to allocate kvm_mmu_page structs. */
+	struct kvm_mmu_memory_cache mmu_page_header_cache;
 #endif
 };
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 954ab969f55e..0f1d48ed7d57 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -108,6 +108,7 @@ static int kvm_usage_count;
 static atomic_t hardware_enable_failed;
 
 static struct kmem_cache *kvm_vcpu_cache;
+struct kmem_cache *mmu_page_header_cache;
 
 static __read_mostly struct preempt_ops kvm_preempt_ops;
 static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);
-- 
2.39.0.rc1.256.g54fd8350bd-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 27/37] KVM: MMU: Move mmu_page_header_cache to common code
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move vcpu->arch.mmu_page_header_cache and its backing kmem_cache to common code
in preparation for moving the TDP MMU to common code in a future commit.

The kmem_cache is still only initialized and used on x86 for the time being to
avoid affecting other architectures. A future commit can move the code to
manage this cache to common code.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 11 +++++------
 arch/x86/kvm/mmu/mmu_internal.h |  2 --
 arch/x86/kvm/mmu/tdp_mmu.c      |  2 +-
 include/kvm/mmu.h               |  2 ++
 include/linux/kvm_host.h        |  3 +++
 virt/kvm/kvm_main.c             |  1 +
 6 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a845e9141ad4..f01ee01f3509 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -163,7 +163,6 @@ struct kvm_shadow_walk_iterator {
 	     __shadow_walk_next(&(_walker), spte))
 
 static struct kmem_cache *pte_list_desc_cache;
-struct kmem_cache *mmu_page_header_cache;
 static struct percpu_counter kvm_total_used_mmu_pages;
 
 static void mmu_spte_set(u64 *sptep, u64 spte);
@@ -674,7 +673,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 		if (r)
 			return r;
 	}
-	return kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
+	return kvm_mmu_topup_memory_cache(&vcpu->mmu_page_header_cache,
 					  PT64_ROOT_MAX_LEVEL);
 }
 
@@ -683,7 +682,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
 	kvm_mmu_free_memory_cache(&vcpu->mmu_page_table_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
-	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
+	kvm_mmu_free_memory_cache(&vcpu->mmu_page_header_cache);
 }
 
 static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
@@ -2217,7 +2216,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
 						    union kvm_mmu_page_role role)
 {
 	struct shadow_page_caches caches = {
-		.page_header_cache = &vcpu->arch.mmu_page_header_cache,
+		.page_header_cache = &vcpu->mmu_page_header_cache,
 		.shadow_page_cache = &vcpu->mmu_page_table_cache,
 		.shadowed_info_cache = &vcpu->arch.mmu_shadowed_info_cache,
 	};
@@ -5917,8 +5916,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu_pte_list_desc_cache.kmem_cache = pte_list_desc_cache;
 	vcpu->arch.mmu_pte_list_desc_cache.gfp_zero = __GFP_ZERO;
 
-	vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
-	vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
+	vcpu->mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
+	vcpu->mmu_page_header_cache.gfp_zero = __GFP_ZERO;
 
 	vcpu->mmu_page_table_cache.gfp_zero = __GFP_ZERO;
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d3c1d08002af..4aa60d5d87b0 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -44,8 +44,6 @@ extern bool dbg;
 #define INVALID_PAE_ROOT	0
 #define IS_VALID_PAE_ROOT(x)	(!!(x))
 
-extern struct kmem_cache *mmu_page_header_cache;
-
 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 {
 	/*
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 922815407b7e..891877a6fb78 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -263,7 +263,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu_page *sp;
 
-	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
+	sp = kvm_mmu_memory_cache_alloc(&vcpu->mmu_page_header_cache);
 	sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->mmu_page_table_cache);
 
 	return sp;
diff --git a/include/kvm/mmu.h b/include/kvm/mmu.h
index 425db8e4f8e9..f1416828f8fe 100644
--- a/include/kvm/mmu.h
+++ b/include/kvm/mmu.h
@@ -4,6 +4,8 @@
 
 #include <kvm/mmu_types.h>
 
+extern struct kmem_cache *mmu_page_header_cache;
+
 static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
 {
 	struct page *page = pfn_to_page((shadow_page) >> PAGE_SHIFT);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0a9baa493760..ec3a6de6d54e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -391,6 +391,9 @@ struct kvm_vcpu {
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
 	/* Cache used to allocate pages for use as page tables. */
 	struct kvm_mmu_memory_cache mmu_page_table_cache;
+
+	/* Cache used to allocate kvm_mmu_page structs. */
+	struct kvm_mmu_memory_cache mmu_page_header_cache;
 #endif
 };
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 954ab969f55e..0f1d48ed7d57 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -108,6 +108,7 @@ static int kvm_usage_count;
 static atomic_t hardware_enable_failed;
 
 static struct kmem_cache *kvm_vcpu_cache;
+struct kmem_cache *mmu_page_header_cache;
 
 static __read_mostly struct preempt_ops kvm_preempt_ops;
 static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread
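
[ Illustration, not part of the series: a minimal sketch of the
  kvm_mmu_memory_cache flow this patch builds on, using the now-common
  vcpu fields. The example_* helpers are hypothetical; they mirror
  mmu_topup_memory_caches() and tdp_mmu_alloc_sp() above, with error
  handling simplified. ]

  /* Fill the caches in a sleepable context, before the MMU lock is taken. */
  static int example_topup(struct kvm_vcpu *vcpu)
  {
  	int r;

  	r = kvm_mmu_topup_memory_cache(&vcpu->mmu_page_header_cache,
  				       PT64_ROOT_MAX_LEVEL);
  	if (r)
  		return r;

  	return kvm_mmu_topup_memory_cache(&vcpu->mmu_page_table_cache,
  					  PT64_ROOT_MAX_LEVEL);
  }

  /* Under the MMU lock, allocations are served straight from the caches. */
  static struct kvm_mmu_page *example_alloc_sp(struct kvm_vcpu *vcpu)
  {
  	struct kvm_mmu_page *sp;

  	sp = kvm_mmu_memory_cache_alloc(&vcpu->mmu_page_header_cache);
  	sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->mmu_page_table_cache);

  	return sp;
  }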

* [RFC PATCH 28/37] KVM: MMU: Stub out tracepoints on non-x86 architectures
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Create stub tracepoints for architectures other than x86. The KVM MMU
tracepoints could be moved to common code as well, but that move is deferred
to a future commit.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c |  2 +-
 include/kvm/mmutrace.h     | 17 +++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 include/kvm/mmutrace.h

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 891877a6fb78..72746b645e99 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -2,12 +2,12 @@
 
 #include "mmu.h"
 #include "mmu_internal.h"
-#include "mmutrace.h"
 #include "tdp_iter.h"
 #include "tdp_mmu.h"
 #include "spte.h"
 
 #include <kvm/tdp_pgtable.h>
+#include <kvm/mmutrace.h>
 
 #include <asm/cmpxchg.h>
 #include <trace/events/kvm.h>
diff --git a/include/kvm/mmutrace.h b/include/kvm/mmutrace.h
new file mode 100644
index 000000000000..e95a3cb47479
--- /dev/null
+++ b/include/kvm/mmutrace.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#if !defined(_TRACE_KVM_MMUTRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KVM_MMUTRACE_H
+
+#ifdef CONFIG_X86
+#include "../../arch/x86/kvm/mmu/mmutrace.h"
+#else
+#define trace_mark_mmio_spte(...)		do {} while (0)
+#define trace_kvm_mmu_get_page(...)		do {} while (0)
+#define trace_kvm_mmu_prepare_zap_page(...)	do {} while (0)
+#define trace_kvm_mmu_set_spte(...)		do {} while (0)
+#define trace_kvm_mmu_spte_requested(...)	do {} while (0)
+#define trace_kvm_mmu_split_huge_page(...)	do {} while (0)
+#define trace_kvm_tdp_mmu_spte_changed(...)	do {} while (0)
+#endif
+
+#endif /* _TRACE_KVM_MMUTRACE_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread
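
[ Illustration, not part of the series: a hypothetical common-code call
  site showing how the stubs behave. On CONFIG_X86 the macro expands to
  the real tracepoint from arch/x86/kvm/mmu/mmutrace.h; on every other
  architecture the variadic stub discards its arguments at compile time. ]

  #include <kvm/mmutrace.h>

  static void example_note_spte_change(int as_id, gfn_t gfn, int level,
  				      u64 old_spte, u64 new_spte)
  {
  	/* Real trace event on x86, "do {} while (0)" everywhere else. */
  	trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
  }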

* [RFC PATCH 29/37] KVM: x86/mmu: Collapse kvm_flush_remote_tlbs_with_{range,address}() together
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Collapse kvm_flush_remote_tlbs_with_range() and
kvm_flush_remote_tlbs_with_address() into a single function. This
eliminates some lines of code and a useless NULL check on the range
struct.

Opportunistically switch from ENOTSUPP to EOPNOTSUPP to make checkpatch
happy.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f01ee01f3509..b7bbabac9127 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -244,27 +244,20 @@ static inline bool kvm_available_flush_tlb_with_range(void)
 	return kvm_x86_ops.tlb_remote_flush_with_range;
 }
 
-static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
-		struct kvm_tlb_range *range)
-{
-	int ret = -ENOTSUPP;
-
-	if (range && kvm_x86_ops.tlb_remote_flush_with_range)
-		ret = static_call(kvm_x86_tlb_remote_flush_with_range)(kvm, range);
-
-	if (ret)
-		kvm_flush_remote_tlbs(kvm);
-}
-
 void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
 		u64 start_gfn, u64 pages)
 {
 	struct kvm_tlb_range range;
+	int ret = -EOPNOTSUPP;
 
 	range.start_gfn = start_gfn;
 	range.pages = pages;
 
-	kvm_flush_remote_tlbs_with_range(kvm, &range);
+	if (kvm_x86_ops.tlb_remote_flush_with_range)
+		ret = static_call(kvm_x86_tlb_remote_flush_with_range)(kvm, &range);
+
+	if (ret)
+		kvm_flush_remote_tlbs(kvm);
 }
 
 static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread
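
[ Illustration, not part of the series: a hypothetical caller. After the
  collapse there is a single entry point that takes a GFN range directly
  and falls back to a full remote flush whenever the range-based hook is
  absent or returns an error. ]

  static void example_flush_huge_page(struct kvm *kvm, gfn_t gfn, int level)
  {
  	/* Flushes only this range when supported, otherwise everything. */
  	kvm_flush_remote_tlbs_with_address(kvm, gfn, KVM_PAGES_PER_HPAGE(level));
  }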

* [RFC PATCH 30/37] KVM: x86/mmu: Rename kvm_flush_remote_tlbs_with_address()
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Rename kvm_flush_remote_tlbs_with_address() to
kvm_flush_remote_tlbs_range(). The new name is shorter, which reduces the
number of call sites that need to be broken up across multiple lines, and
more readable, since it conveys that a range of memory is being flushed
rather than a single address.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 32 ++++++++++++++------------------
 arch/x86/kvm/mmu/mmu_internal.h |  3 +--
 arch/x86/kvm/mmu/paging_tmpl.h  |  4 ++--
 arch/x86/kvm/mmu/tdp_mmu.c      |  7 +++----
 4 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b7bbabac9127..4a28adaa92b4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -244,8 +244,7 @@ static inline bool kvm_available_flush_tlb_with_range(void)
 	return kvm_x86_ops.tlb_remote_flush_with_range;
 }
 
-void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
-		u64 start_gfn, u64 pages)
+void kvm_flush_remote_tlbs_range(struct kvm *kvm, u64 start_gfn, u64 pages)
 {
 	struct kvm_tlb_range range;
 	int ret = -EOPNOTSUPP;
@@ -804,7 +803,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
 	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
-		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
+		kvm_flush_remote_tlbs_range(kvm, gfn, 1);
 }
 
 void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
@@ -1178,7 +1177,7 @@ static void drop_large_spte(struct kvm *kvm, u64 *sptep, bool flush)
 	drop_spte(kvm, sptep);
 
 	if (flush)
-		kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
+		kvm_flush_remote_tlbs_range(kvm, sp->gfn,
 			KVM_PAGES_PER_HPAGE(sp->role.level));
 }
 
@@ -1460,7 +1459,7 @@ static bool kvm_set_pte_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 	}
 
 	if (need_flush && kvm_available_flush_tlb_with_range()) {
-		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
+		kvm_flush_remote_tlbs_range(kvm, gfn, 1);
 		return false;
 	}
 
@@ -1630,8 +1629,8 @@ static void __rmap_add(struct kvm *kvm,
 		kvm->stat.max_mmu_rmap_size = rmap_count;
 	if (rmap_count > RMAP_RECYCLE_THRESHOLD) {
 		kvm_zap_all_rmap_sptes(kvm, rmap_head);
-		kvm_flush_remote_tlbs_with_address(
-				kvm, sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
+		kvm_flush_remote_tlbs_range(kvm, sp->gfn,
+					    KVM_PAGES_PER_HPAGE(sp->role.level));
 	}
 }
 
@@ -2389,7 +2388,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			return;
 
 		drop_parent_pte(child, sptep);
-		kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
+		kvm_flush_remote_tlbs_range(vcpu->kvm, child->gfn, 1);
 	}
 }
 
@@ -2873,8 +2872,8 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 	}
 
 	if (flush)
-		kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn,
-				KVM_PAGES_PER_HPAGE(level));
+		kvm_flush_remote_tlbs_range(vcpu->kvm, gfn,
+					    KVM_PAGES_PER_HPAGE(level));
 
 	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
 
@@ -5809,9 +5808,8 @@ slot_handle_level_range(struct kvm *kvm, const struct kvm_memory_slot *memslot,
 
 		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
 			if (flush && flush_on_yield) {
-				kvm_flush_remote_tlbs_with_address(kvm,
-						start_gfn,
-						iterator.gfn - start_gfn + 1);
+				kvm_flush_remote_tlbs_range(kvm, start_gfn,
+							    iterator.gfn - start_gfn + 1);
 				flush = false;
 			}
 			cond_resched_rwlock_write(&kvm->mmu_lock);
@@ -6166,8 +6164,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 	}
 
 	if (flush)
-		kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
-						   gfn_end - gfn_start);
+		kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start);
 
 	kvm_mmu_invalidate_end(kvm, gfn_start, gfn_end);
 
@@ -6506,7 +6503,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
 
 			if (kvm_available_flush_tlb_with_range())
-				kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
+				kvm_flush_remote_tlbs_range(kvm, sp->gfn,
 					KVM_PAGES_PER_HPAGE(sp->role.level));
 			else
 				need_tlb_flush = 1;
@@ -6557,8 +6554,7 @@ void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
 	 * is observed by any other operation on the same memslot.
 	 */
 	lockdep_assert_held(&kvm->slots_lock);
-	kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
-					   memslot->npages);
+	kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
 }
 
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 4aa60d5d87b0..d35a5b408b98 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -65,8 +65,7 @@ void kvm_mmu_gfn_allow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn,
 				    int min_level);
-void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
-					u64 start_gfn, u64 pages);
+void kvm_flush_remote_tlbs_range(struct kvm *kvm, u64 start_gfn, u64 pages);
 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
 
 extern int nx_huge_pages;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index daf9c7731edc..bfee5e0d1ee1 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -929,8 +929,8 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 
 			mmu_page_zap_pte(vcpu->kvm, sp, sptep, NULL);
 			if (is_shadow_present_pte(old_spte))
-				kvm_flush_remote_tlbs_with_address(vcpu->kvm,
-					sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
+				kvm_flush_remote_tlbs_range(vcpu->kvm, sp->gfn,
+							    KVM_PAGES_PER_HPAGE(sp->role.level));
 
 			if (!rmap_can_add(vcpu))
 				break;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 72746b645e99..1f1f511cd1a0 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -676,8 +676,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
 	if (ret)
 		return ret;
 
-	kvm_flush_remote_tlbs_with_address(kvm, iter->gfn,
-					   TDP_PAGES_PER_LEVEL(iter->level));
+	kvm_flush_remote_tlbs_range(kvm, iter->gfn, TDP_PAGES_PER_LEVEL(iter->level));
 
 	/*
 	 * No other thread can overwrite the removed SPTE as they must either
@@ -1067,8 +1066,8 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 		return RET_PF_RETRY;
 	else if (tdp_pte_is_present(iter->old_spte) &&
 		 !tdp_pte_is_leaf(iter->old_spte, iter->level))
-		kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
-						   TDP_PAGES_PER_LEVEL(iter->level + 1));
+		kvm_flush_remote_tlbs_range(vcpu->kvm, sp->gfn,
+					    TDP_PAGES_PER_LEVEL(iter->level + 1));
 
 	/*
 	 * If the page fault was caused by a write but the page is write
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 30/37] KVM: x86/mmu: Rename kvm_flush_remote_tlbs_with_address()
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Rename kvm_flush_remote_tlbs_with_address() to
kvm_flush_remote_tlbs_range(). The new name is shorter, which reduces the
number of call sites that need to be broken up across multiple lines, and
more readable, since it conveys that a range of memory is being flushed
rather than a single address.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 32 ++++++++++++++------------------
 arch/x86/kvm/mmu/mmu_internal.h |  3 +--
 arch/x86/kvm/mmu/paging_tmpl.h  |  4 ++--
 arch/x86/kvm/mmu/tdp_mmu.c      |  7 +++----
 4 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b7bbabac9127..4a28adaa92b4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -244,8 +244,7 @@ static inline bool kvm_available_flush_tlb_with_range(void)
 	return kvm_x86_ops.tlb_remote_flush_with_range;
 }
 
-void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
-		u64 start_gfn, u64 pages)
+void kvm_flush_remote_tlbs_range(struct kvm *kvm, u64 start_gfn, u64 pages)
 {
 	struct kvm_tlb_range range;
 	int ret = -EOPNOTSUPP;
@@ -804,7 +803,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
 	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
-		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
+		kvm_flush_remote_tlbs_range(kvm, gfn, 1);
 }
 
 void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
@@ -1178,7 +1177,7 @@ static void drop_large_spte(struct kvm *kvm, u64 *sptep, bool flush)
 	drop_spte(kvm, sptep);
 
 	if (flush)
-		kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
+		kvm_flush_remote_tlbs_range(kvm, sp->gfn,
 			KVM_PAGES_PER_HPAGE(sp->role.level));
 }
 
@@ -1460,7 +1459,7 @@ static bool kvm_set_pte_rmap(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 	}
 
 	if (need_flush && kvm_available_flush_tlb_with_range()) {
-		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
+		kvm_flush_remote_tlbs_range(kvm, gfn, 1);
 		return false;
 	}
 
@@ -1630,8 +1629,8 @@ static void __rmap_add(struct kvm *kvm,
 		kvm->stat.max_mmu_rmap_size = rmap_count;
 	if (rmap_count > RMAP_RECYCLE_THRESHOLD) {
 		kvm_zap_all_rmap_sptes(kvm, rmap_head);
-		kvm_flush_remote_tlbs_with_address(
-				kvm, sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
+		kvm_flush_remote_tlbs_range(kvm, sp->gfn,
+					    KVM_PAGES_PER_HPAGE(sp->role.level));
 	}
 }
 
@@ -2389,7 +2388,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			return;
 
 		drop_parent_pte(child, sptep);
-		kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
+		kvm_flush_remote_tlbs_range(vcpu->kvm, child->gfn, 1);
 	}
 }
 
@@ -2873,8 +2872,8 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 	}
 
 	if (flush)
-		kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn,
-				KVM_PAGES_PER_HPAGE(level));
+		kvm_flush_remote_tlbs_range(vcpu->kvm, gfn,
+					    KVM_PAGES_PER_HPAGE(level));
 
 	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
 
@@ -5809,9 +5808,8 @@ slot_handle_level_range(struct kvm *kvm, const struct kvm_memory_slot *memslot,
 
 		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
 			if (flush && flush_on_yield) {
-				kvm_flush_remote_tlbs_with_address(kvm,
-						start_gfn,
-						iterator.gfn - start_gfn + 1);
+				kvm_flush_remote_tlbs_range(kvm, start_gfn,
+							    iterator.gfn - start_gfn + 1);
 				flush = false;
 			}
 			cond_resched_rwlock_write(&kvm->mmu_lock);
@@ -6166,8 +6164,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 	}
 
 	if (flush)
-		kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
-						   gfn_end - gfn_start);
+		kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start);
 
 	kvm_mmu_invalidate_end(kvm, gfn_start, gfn_end);
 
@@ -6506,7 +6503,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
 
 			if (kvm_available_flush_tlb_with_range())
-				kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
+				kvm_flush_remote_tlbs_range(kvm, sp->gfn,
 					KVM_PAGES_PER_HPAGE(sp->role.level));
 			else
 				need_tlb_flush = 1;
@@ -6557,8 +6554,7 @@ void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
 	 * is observed by any other operation on the same memslot.
 	 */
 	lockdep_assert_held(&kvm->slots_lock);
-	kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
-					   memslot->npages);
+	kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
 }
 
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 4aa60d5d87b0..d35a5b408b98 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -65,8 +65,7 @@ void kvm_mmu_gfn_allow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn,
 				    int min_level);
-void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
-					u64 start_gfn, u64 pages);
+void kvm_flush_remote_tlbs_range(struct kvm *kvm, u64 start_gfn, u64 pages);
 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
 
 extern int nx_huge_pages;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index daf9c7731edc..bfee5e0d1ee1 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -929,8 +929,8 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 
 			mmu_page_zap_pte(vcpu->kvm, sp, sptep, NULL);
 			if (is_shadow_present_pte(old_spte))
-				kvm_flush_remote_tlbs_with_address(vcpu->kvm,
-					sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
+				kvm_flush_remote_tlbs_range(vcpu->kvm, sp->gfn,
+							    KVM_PAGES_PER_HPAGE(sp->role.level));
 
 			if (!rmap_can_add(vcpu))
 				break;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 72746b645e99..1f1f511cd1a0 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -676,8 +676,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
 	if (ret)
 		return ret;
 
-	kvm_flush_remote_tlbs_with_address(kvm, iter->gfn,
-					   TDP_PAGES_PER_LEVEL(iter->level));
+	kvm_flush_remote_tlbs_range(kvm, iter->gfn, TDP_PAGES_PER_LEVEL(iter->level));
 
 	/*
 	 * No other thread can overwrite the removed SPTE as they must either
@@ -1067,8 +1066,8 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 		return RET_PF_RETRY;
 	else if (tdp_pte_is_present(iter->old_spte) &&
 		 !tdp_pte_is_leaf(iter->old_spte, iter->level))
-		kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
-						   TDP_PAGES_PER_LEVEL(iter->level + 1));
+		kvm_flush_remote_tlbs_range(vcpu->kvm, sp->gfn,
+					    TDP_PAGES_PER_LEVEL(iter->level + 1));
 
 	/*
 	 * If the page fault was caused by a write but the page is write
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 31/37] KVM: x86/MMU: Use gfn_t in kvm_flush_remote_tlbs_range()
  2022-12-08 19:38 ` David Matlack
  (?)
  (?)
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Use gfn_t instead of u64 for the start_gfn parameter to
kvm_flush_remote_tlbs_range(), since that is the standard type for GFNs
throughout KVM.
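
For reference, gfn_t is already an unsigned 64-bit type in KVM's common
headers, so this is purely a type-annotation cleanup (a sketch of the
typedef as I recall it from include/linux/kvm_types.h):

  typedef u64 gfn_t;	/* guest frame number */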

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 2 +-
 arch/x86/kvm/mmu/mmu_internal.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4a28adaa92b4..19963ed83484 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -244,7 +244,7 @@ static inline bool kvm_available_flush_tlb_with_range(void)
 	return kvm_x86_ops.tlb_remote_flush_with_range;
 }
 
-void kvm_flush_remote_tlbs_range(struct kvm *kvm, u64 start_gfn, u64 pages)
+void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
 {
 	struct kvm_tlb_range range;
 	int ret = -EOPNOTSUPP;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d35a5b408b98..e44fe7ad3cfb 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -65,7 +65,7 @@ void kvm_mmu_gfn_allow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn,
 				    int min_level);
-void kvm_flush_remote_tlbs_range(struct kvm *kvm, u64 start_gfn, u64 pages);
+void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
 
 extern int nx_huge_pages;
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 32/37] KVM: Allow range-based TLB invalidation from common code
  2022-12-08 19:38 ` David Matlack
  (?)
  (?)
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Make kvm_flush_remote_tlbs_range() visible in common code and create a
default implementation that just invalidates the whole TLB. This will be
used in future commits to clean up kvm_arch_flush_remote_tlbs_memslot()
and to move the KVM/x86 TDP MMU to common code.
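
As an illustration, an architecture with range-based invalidation would
override the weak common definition with a strong one of its own,
roughly along these lines (a hedged sketch only; arch_tlb_flush_range()
is a hypothetical helper rather than an existing KVM or arch API, and
<linux/kvm_host.h> is assumed):

  /* Sketch of an arch override; arch_tlb_flush_range() is hypothetical. */
  void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
  {
  	/* Try a ranged invalidation first... */
  	if (arch_tlb_flush_range(kvm, start_gfn, pages) == 0)
  		return;

  	/* ...and fall back to a full flush if that isn't possible. */
  	kvm_flush_remote_tlbs(kvm);
  }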

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu_internal.h | 1 -
 include/linux/kvm_host.h        | 1 +
 virt/kvm/kvm_main.c             | 9 +++++++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index e44fe7ad3cfb..df815cb84bd2 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -65,7 +65,6 @@ void kvm_mmu_gfn_allow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn,
 				    int min_level);
-void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);
 
 extern int nx_huge_pages;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ec3a6de6d54e..d9a7f559d2c5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1365,6 +1365,7 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool usermode_vcpu_not_eligible);
 
 void kvm_flush_remote_tlbs(struct kvm *kvm);
+void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
 
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
 int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0f1d48ed7d57..662ca280c0cf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -379,6 +379,15 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
 EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
 #endif
 
+/*
+ * Architectures that support range-based TLB invalidation can override this
+ * function.
+ */
+void __weak kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
+{
+	kvm_flush_remote_tlbs(kvm);
+}
+
 static void kvm_flush_shadow_all(struct kvm *kvm)
 {
 	kvm_arch_flush_shadow_all(kvm);
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 33/37] KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move kvm_arch_flush_remote_tlbs_memslot() to common code and drop
"arch_" from the name. kvm_arch_flush_remote_tlbs_memslot() is just a
range-based TLB invalidation where the range is defined by the memslot.
Now that kvm_flush_remote_tlbs_range() can be called from common code we
can just use that and drop a bunch of duplicate code from the arch
directories.

Note that this adds a lockdep assertion that slots_lock is held when
calling kvm_flush_remote_tlbs_memslot(), which was previously only
asserted on x86.
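
A minimal sketch of the expected calling convention (example_flush()
and its parameters are illustrative only; the real callers are the
dirty-log and memslot-update paths, and <linux/kvm_host.h> is assumed):

  static void example_flush(struct kvm *kvm,
  			  const struct kvm_memory_slot *memslot, bool flush)
  {
  	mutex_lock(&kvm->slots_lock);
  	if (flush)
  		kvm_flush_remote_tlbs_memslot(kvm, memslot); /* lockdep-checked */
  	mutex_unlock(&kvm->slots_lock);
  }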

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/arm64/kvm/arm.c     |  6 ------
 arch/mips/kvm/mips.c     | 10 ++--------
 arch/riscv/kvm/mmu.c     |  6 ------
 arch/x86/kvm/mmu/mmu.c   | 16 +---------------
 arch/x86/kvm/x86.c       |  2 +-
 include/linux/kvm_host.h |  7 +++----
 virt/kvm/kvm_main.c      | 17 +++++++++++++++--
 7 files changed, 22 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 0e0d4c4f79a2..4f1549c1d2d2 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1430,12 +1430,6 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 
 }
 
-void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
-					const struct kvm_memory_slot *memslot)
-{
-	kvm_flush_remote_tlbs(kvm);
-}
-
 static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
 					struct kvm_arm_device_addr *dev_addr)
 {
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index a25e0b73ee70..ecd8a051fd6b 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -209,7 +209,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	/* Flush slot from GPA */
 	kvm_mips_flush_gpa_pt(kvm, slot->base_gfn,
 			      slot->base_gfn + slot->npages - 1);
-	kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
+	kvm_flush_remote_tlbs_memslot(kvm, slot);
 	spin_unlock(&kvm->mmu_lock);
 }
 
@@ -245,7 +245,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		needs_flush = kvm_mips_mkclean_gpa_pt(kvm, new->base_gfn,
 					new->base_gfn + new->npages - 1);
 		if (needs_flush)
-			kvm_arch_flush_remote_tlbs_memslot(kvm, new);
+			kvm_flush_remote_tlbs_memslot(kvm, new);
 		spin_unlock(&kvm->mmu_lock);
 	}
 }
@@ -997,12 +997,6 @@ int kvm_arch_flush_remote_tlb(struct kvm *kvm)
 	return 1;
 }
 
-void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
-					const struct kvm_memory_slot *memslot)
-{
-	kvm_flush_remote_tlbs(kvm);
-}
-
 long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 {
 	long r;
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index a8281a65cb3d..98bf3719a396 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -406,12 +406,6 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
 }
 
-void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
-					const struct kvm_memory_slot *memslot)
-{
-	kvm_flush_remote_tlbs(kvm);
-}
-
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free)
 {
 }
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 19963ed83484..f2602ee1771f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6524,7 +6524,7 @@ static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
 	 */
 	if (slot_handle_level(kvm, slot, kvm_mmu_zap_collapsible_spte,
 			      PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL - 1, true))
-		kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
+		kvm_flush_remote_tlbs_memslot(kvm, slot);
 }
 
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
@@ -6543,20 +6543,6 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 	}
 }
 
-void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
-					const struct kvm_memory_slot *memslot)
-{
-	/*
-	 * All current use cases for flushing the TLBs for a specific memslot
-	 * related to dirty logging, and many do the TLB flush out of mmu_lock.
-	 * The interaction between the various operations on memslot must be
-	 * serialized by slots_locks to ensure the TLB flush from one operation
-	 * is observed by any other operation on the same memslot.
-	 */
-	lockdep_assert_held(&kvm->slots_lock);
-	kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
-}
-
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 517c8ed33542..95ff95da55d5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12574,7 +12574,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 		 * See is_writable_pte() for more details (the case involving
 		 * access-tracked SPTEs is particularly relevant).
 		 */
-		kvm_arch_flush_remote_tlbs_memslot(kvm, new);
+		kvm_flush_remote_tlbs_memslot(kvm, new);
 	}
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d9a7f559d2c5..46ed0ef4fb79 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1366,6 +1366,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool usermode_vcpu_not_eligible);
 
 void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
+void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
+				   const struct kvm_memory_slot *memslot);
 
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
 int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
@@ -1394,10 +1396,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 					unsigned long mask);
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);
 
-#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
-void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
-					const struct kvm_memory_slot *memslot);
-#else /* !CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
+#ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
 int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
 		      int *is_dirty, struct kvm_memory_slot **memslot);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 662ca280c0cf..39c2efd15504 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -388,6 +388,19 @@ void __weak kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pa
 	kvm_flush_remote_tlbs(kvm);
 }
 
+void kvm_flush_remote_tlbs_memslot(struct kvm *kvm, const struct kvm_memory_slot *memslot)
+{
+	/*
+	 * All current use cases for flushing the TLBs for a specific memslot
+	 * related to dirty logging, and many do the TLB flush out of mmu_lock.
+	 * The interaction between the various operations on memslot must be
+	 * serialized by slots_locks to ensure the TLB flush from one operation
+	 * is observed by any other operation on the same memslot.
+	 */
+	lockdep_assert_held(&kvm->slots_lock);
+	kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
+}
+
 static void kvm_flush_shadow_all(struct kvm *kvm)
 {
 	kvm_arch_flush_shadow_all(kvm);
@@ -2197,7 +2210,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 	}
 
 	if (flush)
-		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
+		kvm_flush_remote_tlbs_memslot(kvm, memslot);
 
 	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
 		return -EFAULT;
@@ -2314,7 +2327,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	KVM_MMU_UNLOCK(kvm);
 
 	if (flush)
-		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
+		kvm_flush_remote_tlbs_memslot(kvm, memslot);
 
 	return 0;
 }
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 34/37] KVM: MMU: Move the TDP iterator to common code
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Move arch/x86/kvm/mmu/tdp_iter.{c,h} into common code so that the TDP
iterator can be used by other architectures in the future.

No functional change intended.
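
As a rough illustration of what the move enables (not part of the
patch), a non-x86 user could walk a paging structure with the
relocated iterator roughly as below. This sketch assumes
tdp_iter_start() keeps its current (iter, root, min_level, start_gfn)
signature, the iter.valid/iter.gfn fields, and the architecture-neutral
PG_LEVEL macros introduced earlier in the series:

  #include <kvm/tdp_iter.h>

  /*
   * Walk [start, end) and count the PTEs visited. Real users must hold
   * mmu_lock and RCU as described in tdp_iter.h; this is a sketch only.
   */
  static unsigned long example_count_ptes(struct kvm_mmu_page *root,
  					  gfn_t start, gfn_t end)
  {
  	struct tdp_iter iter;
  	unsigned long count = 0;

  	for (tdp_iter_start(&iter, root, PG_LEVEL_4K, start);
  	     iter.valid && iter.gfn < end;
  	     tdp_iter_next(&iter))
  		count++;

  	return count;
  }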

Signed-off-by: David Matlack <dmatlack@google.com>
---
 MAINTAINERS                                  | 2 +-
 arch/x86/kvm/Makefile                        | 2 +-
 arch/x86/kvm/mmu/tdp_mmu.c                   | 2 +-
 arch/x86/kvm/mmu/tdp_pgtable.c               | 2 +-
 {arch/x86/kvm/mmu => include/kvm}/tdp_iter.h | 9 +++------
 virt/kvm/Makefile.kvm                        | 2 ++
 {arch/x86 => virt}/kvm/mmu/tdp_iter.c        | 4 +---
 7 files changed, 10 insertions(+), 13 deletions(-)
 rename {arch/x86/kvm/mmu => include/kvm}/tdp_iter.h (96%)
 rename {arch/x86 => virt}/kvm/mmu/tdp_iter.c (98%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7e586d7ba78c..3c33eca85480 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11206,7 +11206,7 @@ F:	include/uapi/asm-generic/kvm*
 F:	include/uapi/linux/kvm*
 F:	tools/kvm/
 F:	tools/testing/selftests/kvm/
-F:	virt/kvm/*
+F:	virt/kvm/
 
 KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)
 M:	Marc Zyngier <maz@kernel.org>
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index c294ae51caba..cb9ae306892a 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -18,7 +18,7 @@ ifdef CONFIG_HYPERV
 kvm-y			+= kvm_onhyperv.o
 endif
 
-kvm-$(CONFIG_X86_64) += mmu/tdp_pgtable.o mmu/tdp_iter.o mmu/tdp_mmu.o
+kvm-$(CONFIG_X86_64) += mmu/tdp_pgtable.o mmu/tdp_mmu.o
 kvm-$(CONFIG_KVM_XEN)	+= xen.o
 kvm-$(CONFIG_KVM_SMM)	+= smm.o
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1f1f511cd1a0..c035c051161c 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -2,10 +2,10 @@
 
 #include "mmu.h"
 #include "mmu_internal.h"
-#include "tdp_iter.h"
 #include "tdp_mmu.h"
 #include "spte.h"
 
+#include <kvm/tdp_iter.h>
 #include <kvm/tdp_pgtable.h>
 #include <kvm/mmutrace.h>
 
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
index cc7b10f703e1..fb40abdb9234 100644
--- a/arch/x86/kvm/mmu/tdp_pgtable.c
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -2,10 +2,10 @@
 
 #include <linux/kvm_types.h>
 #include <kvm/tdp_pgtable.h>
+#include <kvm/tdp_iter.h>
 
 #include "mmu.h"
 #include "spte.h"
-#include "tdp_iter.h"
 
 /* Removed SPTEs must not be misconstrued as shadow present PTEs. */
 static_assert(!(REMOVED_TDP_PTE & SPTE_MMU_PRESENT_MASK));
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/include/kvm/tdp_iter.h
similarity index 96%
rename from arch/x86/kvm/mmu/tdp_iter.h
rename to include/kvm/tdp_iter.h
index 6e3c38532d1d..0a154fcf2664 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/include/kvm/tdp_iter.h
@@ -1,14 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0
 
-#ifndef __KVM_X86_MMU_TDP_ITER_H
-#define __KVM_X86_MMU_TDP_ITER_H
+#ifndef __KVM_TDP_ITER_H
+#define __KVM_TDP_ITER_H
 
 #include <linux/kvm_host.h>
 #include <kvm/tdp_pgtable.h>
 
-#include "mmu.h"
-#include "spte.h"
-
 /*
  * TDP MMU SPTEs are RCU protected to allow paging structures (non-leaf SPTEs)
  * to be zapped while holding mmu_lock for read, and to allow TLB flushes to be
@@ -117,4 +114,4 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
 void tdp_iter_next(struct tdp_iter *iter);
 void tdp_iter_restart(struct tdp_iter *iter);
 
-#endif /* __KVM_X86_MMU_TDP_ITER_H */
+#endif /* __KVM_TDP_ITER_H */
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 2c27d5d0c367..58b595ac9b8d 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -12,3 +12,5 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
 kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
 kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
 kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
+
+kvm-$(CONFIG_HAVE_TDP_MMU) += $(KVM)/mmu/tdp_iter.o
diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/virt/kvm/mmu/tdp_iter.c
similarity index 98%
rename from arch/x86/kvm/mmu/tdp_iter.c
rename to virt/kvm/mmu/tdp_iter.c
index d5f024b7f6e4..674d93f91979 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/virt/kvm/mmu/tdp_iter.c
@@ -1,8 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 
-#include "mmu_internal.h"
-#include "tdp_iter.h"
-#include "spte.h"
+#include <kvm/tdp_iter.h>
 
 /*
  * Recalculates the pointer to the SPTE for the current GFN and level and
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 35/37] KVM: x86/mmu: Move tdp_mmu_max_gfn_exclusive() to tdp_pgtable.c
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Move tdp_mmu_max_gfn_exclusive() to tdp_pgtable.c since it currently
relies on the x86-specific kvm_mmu_max_gfn() function. This can be
improved in the future by implementing a common API for calculating the
max GFN.

No functional change intended.
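
To sketch the possible future direction mentioned above (nothing like
this exists in the series), the common code could eventually bound its
walks through an arch hook instead of the x86-only helper; both names
below are hypothetical:

  #include <linux/kvm_host.h>

  /* Hypothetical arch hook that a common "max GFN" API could standardize. */
  gfn_t kvm_arch_tdp_max_gfn_exclusive(struct kvm *kvm);

  /* Hypothetical common-code user: is @gfn within the TDP-walkable range? */
  static bool example_tdp_gfn_in_range(struct kvm *kvm, gfn_t gfn)
  {
  	return gfn < kvm_arch_tdp_max_gfn_exclusive(kvm);
  }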

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/include/asm/kvm/tdp_pgtable.h |  3 +++
 arch/x86/kvm/mmu/tdp_mmu.c             | 11 -----------
 arch/x86/kvm/mmu/tdp_pgtable.c         | 11 +++++++++++
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm/tdp_pgtable.h b/arch/x86/include/asm/kvm/tdp_pgtable.h
index ff2691ced38b..c1047fcf1a91 100644
--- a/arch/x86/include/asm/kvm/tdp_pgtable.h
+++ b/arch/x86/include/asm/kvm/tdp_pgtable.h
@@ -67,4 +67,7 @@ u64 tdp_mmu_make_changed_pte_notifier_pte(struct tdp_iter *iter,
 					  struct kvm_gfn_range *range);
 u64 tdp_mmu_make_huge_page_split_pte(struct kvm *kvm, u64 huge_spte,
 				     struct kvm_mmu_page *sp, int index);
+
+gfn_t tdp_mmu_max_gfn_exclusive(void);
+
 #endif /* !__ASM_KVM_TDP_PGTABLE_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index c035c051161c..c950d688afea 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -828,17 +828,6 @@ static inline bool __must_check tdp_mmu_iter_cond_resched(struct kvm *kvm,
 	return iter->yielded;
 }
 
-static inline gfn_t tdp_mmu_max_gfn_exclusive(void)
-{
-	/*
-	 * Bound TDP MMU walks at host.MAXPHYADDR.  KVM disallows memslots with
-	 * a gpa range that would exceed the max gfn, and KVM does not create
-	 * MMIO SPTEs for "impossible" gfns, instead sending such accesses down
-	 * the slow emulation path every time.
-	 */
-	return kvm_mmu_max_gfn() + 1;
-}
-
 static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 			       bool shared, int zap_level)
 {
diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
index fb40abdb9234..4e747956d6ee 100644
--- a/arch/x86/kvm/mmu/tdp_pgtable.c
+++ b/arch/x86/kvm/mmu/tdp_pgtable.c
@@ -170,3 +170,14 @@ int tdp_mmu_max_mapping_level(struct kvm *kvm,
 {
 	return kvm_mmu_max_mapping_level(kvm, slot, iter->gfn, PG_LEVEL_NUM);
 }
+
+gfn_t tdp_mmu_max_gfn_exclusive(void)
+{
+	/*
+	 * Bound TDP MMU walks at host.MAXPHYADDR.  KVM disallows memslots with
+	 * a gpa range that would exceed the max gfn, and KVM does not create
+	 * MMIO SPTEs for "impossible" gfns, instead sending such accesses down
+	 * the slow emulation path every time.
+	 */
+	return kvm_mmu_max_gfn() + 1;
+}
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 36/37] KVM: x86/mmu: Move is_tdp_mmu_page() to mmu_internal.h
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Move is_tdp_mmu_page(), which is x86-specific, into mmu_internal.h. This
prepares for moving tdp_mmu.h into common code.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu_internal.h | 9 +++++++++
 arch/x86/kvm/mmu/tdp_mmu.h      | 9 ---------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index df815cb84bd2..51aef9624521 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -147,4 +147,13 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 
+#ifdef CONFIG_X86_64
+static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp)
+{
+	return !sp->arch.shadow_mmu_page;
+}
+#else
+static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
+#endif
+
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 897608be7f75..607c1417abd1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -71,13 +71,4 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
 					u64 *spte);
 
-#ifdef CONFIG_X86_64
-static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp)
-{
-	return !sp->arch.shadow_mmu_page;
-}
-#else
-static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
-#endif
-
 #endif /* __KVM_X86_MMU_TDP_MMU_H */
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 37/37] KVM: MMU: Move the TDP MMU to common code
  2022-12-08 19:38 ` David Matlack
@ 2022-12-08 19:38   ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, David Matlack, Suren Baghdasaryan,
	Vlastimil Babka, linux-arm-kernel, linux-mips, kvm-riscv,
	Andrew Morton

Move tdp_mmu.[ch] from arch/x86 into the common code directories. This
will allow other architectures to use the TDP MMU in the future.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/Makefile                       | 2 +-
 arch/x86/kvm/mmu/mmu.c                      | 3 ++-
 {arch/x86/kvm/mmu => include/kvm}/tdp_mmu.h | 6 +++++-
 virt/kvm/Makefile.kvm                       | 1 +
 {arch/x86 => virt}/kvm/mmu/tdp_mmu.c        | 8 +++-----
 5 files changed, 12 insertions(+), 8 deletions(-)
 rename {arch/x86/kvm/mmu => include/kvm}/tdp_mmu.h (94%)
 rename {arch/x86 => virt}/kvm/mmu/tdp_mmu.c (99%)

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index cb9ae306892a..06b61fdea539 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -18,7 +18,7 @@ ifdef CONFIG_HYPERV
 kvm-y			+= kvm_onhyperv.o
 endif
 
-kvm-$(CONFIG_X86_64) += mmu/tdp_pgtable.o mmu/tdp_mmu.o
+kvm-$(CONFIG_X86_64)	+= mmu/tdp_pgtable.o
 kvm-$(CONFIG_KVM_XEN)	+= xen.o
 kvm-$(CONFIG_KVM_SMM)	+= smm.o
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f2602ee1771f..8653776bca6f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -19,7 +19,6 @@
 #include "ioapic.h"
 #include "mmu.h"
 #include "mmu_internal.h"
-#include "tdp_mmu.h"
 #include "x86.h"
 #include "kvm_cache_regs.h"
 #include "smm.h"
@@ -27,6 +26,8 @@
 #include "cpuid.h"
 #include "spte.h"
 
+#include <kvm/tdp_mmu.h>
+
 #include <linux/kvm_host.h>
 #include <linux/types.h>
 #include <linux/string.h>
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/include/kvm/tdp_mmu.h
similarity index 94%
rename from arch/x86/kvm/mmu/tdp_mmu.h
rename to include/kvm/tdp_mmu.h
index 607c1417abd1..538c848149c9 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/include/kvm/tdp_mmu.h
@@ -5,7 +5,11 @@
 
 #include <linux/kvm_host.h>
 
-#include "spte.h"
+#include <kvm/mmu_types.h>
+#include <kvm/mmu.h>
+#include <kvm/tdp_iter.h>
+#include <kvm/tdp_pgtable.h>
+#include <kvm/mmutrace.h>
 
 int kvm_mmu_init_tdp_mmu(struct kvm *kvm);
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 58b595ac9b8d..942681308140 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -14,3 +14,4 @@ kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
 kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
 
 kvm-$(CONFIG_HAVE_TDP_MMU) += $(KVM)/mmu/tdp_iter.o
+kvm-$(CONFIG_HAVE_TDP_MMU) += $(KVM)/mmu/tdp_mmu.o
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/virt/kvm/mmu/tdp_mmu.c
similarity index 99%
rename from arch/x86/kvm/mmu/tdp_mmu.c
rename to virt/kvm/mmu/tdp_mmu.c
index c950d688afea..5ca8892ebef5 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/virt/kvm/mmu/tdp_mmu.c
@@ -1,11 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 
-#include "mmu.h"
-#include "mmu_internal.h"
-#include "tdp_mmu.h"
-#include "spte.h"
-
+#include <kvm/mmu_types.h>
+#include <kvm/mmu.h>
 #include <kvm/tdp_iter.h>
+#include <kvm/tdp_mmu.h>
 #include <kvm/tdp_pgtable.h>
 #include <kvm/mmutrace.h>
 
-- 
2.39.0.rc1.256.g54fd8350bd-goog


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 37/37] KVM: MMU: Move the TDP MMU to common code
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move tdp_mmu.[ch] from arch/x86 and into the common code directories.
This will allow other architectures to use the TDP MMU in the future.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/Makefile                       | 2 +-
 arch/x86/kvm/mmu/mmu.c                      | 3 ++-
 {arch/x86/kvm/mmu => include/kvm}/tdp_mmu.h | 6 +++++-
 virt/kvm/Makefile.kvm                       | 1 +
 {arch/x86 => virt}/kvm/mmu/tdp_mmu.c        | 8 +++-----
 5 files changed, 12 insertions(+), 8 deletions(-)
 rename {arch/x86/kvm/mmu => include/kvm}/tdp_mmu.h (94%)
 rename {arch/x86 => virt}/kvm/mmu/tdp_mmu.c (99%)

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index cb9ae306892a..06b61fdea539 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -18,7 +18,7 @@ ifdef CONFIG_HYPERV
 kvm-y			+= kvm_onhyperv.o
 endif
 
-kvm-$(CONFIG_X86_64) += mmu/tdp_pgtable.o mmu/tdp_mmu.o
+kvm-$(CONFIG_X86_64)	+= mmu/tdp_pgtable.o
 kvm-$(CONFIG_KVM_XEN)	+= xen.o
 kvm-$(CONFIG_KVM_SMM)	+= smm.o
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f2602ee1771f..8653776bca6f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -19,7 +19,6 @@
 #include "ioapic.h"
 #include "mmu.h"
 #include "mmu_internal.h"
-#include "tdp_mmu.h"
 #include "x86.h"
 #include "kvm_cache_regs.h"
 #include "smm.h"
@@ -27,6 +26,8 @@
 #include "cpuid.h"
 #include "spte.h"
 
+#include <kvm/tdp_mmu.h>
+
 #include <linux/kvm_host.h>
 #include <linux/types.h>
 #include <linux/string.h>
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/include/kvm/tdp_mmu.h
similarity index 94%
rename from arch/x86/kvm/mmu/tdp_mmu.h
rename to include/kvm/tdp_mmu.h
index 607c1417abd1..538c848149c9 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/include/kvm/tdp_mmu.h
@@ -5,7 +5,11 @@
 
 #include <linux/kvm_host.h>
 
-#include "spte.h"
+#include <kvm/mmu_types.h>
+#include <kvm/mmu.h>
+#include <kvm/tdp_iter.h>
+#include <kvm/tdp_pgtable.h>
+#include <kvm/mmutrace.h>
 
 int kvm_mmu_init_tdp_mmu(struct kvm *kvm);
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 58b595ac9b8d..942681308140 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -14,3 +14,4 @@ kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
 kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
 
 kvm-$(CONFIG_HAVE_TDP_MMU) += $(KVM)/mmu/tdp_iter.o
+kvm-$(CONFIG_HAVE_TDP_MMU) += $(KVM)/mmu/tdp_mmu.o
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/virt/kvm/mmu/tdp_mmu.c
similarity index 99%
rename from arch/x86/kvm/mmu/tdp_mmu.c
rename to virt/kvm/mmu/tdp_mmu.c
index c950d688afea..5ca8892ebef5 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/virt/kvm/mmu/tdp_mmu.c
@@ -1,11 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 
-#include "mmu.h"
-#include "mmu_internal.h"
-#include "tdp_mmu.h"
-#include "spte.h"
-
+#include <kvm/mmu_types.h>
+#include <kvm/mmu.h>
 #include <kvm/tdp_iter.h>
+#include <kvm/tdp_mmu.h>
 #include <kvm/tdp_pgtable.h>
 #include <kvm/mmutrace.h>
 
-- 
2.39.0.rc1.256.g54fd8350bd-goog


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* [RFC PATCH 37/37] KVM: MMU: Move the TDP MMU to common code
@ 2022-12-08 19:38   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-08 19:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, David Matlack,
	Anshuman Khandual, Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Move tdp_mmu.[ch] from arch/x86 and into the common code directories.
This will allow other architectures to use the TDP MMU in the future.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/Makefile                       | 2 +-
 arch/x86/kvm/mmu/mmu.c                      | 3 ++-
 {arch/x86/kvm/mmu => include/kvm}/tdp_mmu.h | 6 +++++-
 virt/kvm/Makefile.kvm                       | 1 +
 {arch/x86 => virt}/kvm/mmu/tdp_mmu.c        | 8 +++-----
 5 files changed, 12 insertions(+), 8 deletions(-)
 rename {arch/x86/kvm/mmu => include/kvm}/tdp_mmu.h (94%)
 rename {arch/x86 => virt}/kvm/mmu/tdp_mmu.c (99%)

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index cb9ae306892a..06b61fdea539 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -18,7 +18,7 @@ ifdef CONFIG_HYPERV
 kvm-y			+= kvm_onhyperv.o
 endif
 
-kvm-$(CONFIG_X86_64) += mmu/tdp_pgtable.o mmu/tdp_mmu.o
+kvm-$(CONFIG_X86_64)	+= mmu/tdp_pgtable.o
 kvm-$(CONFIG_KVM_XEN)	+= xen.o
 kvm-$(CONFIG_KVM_SMM)	+= smm.o
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f2602ee1771f..8653776bca6f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -19,7 +19,6 @@
 #include "ioapic.h"
 #include "mmu.h"
 #include "mmu_internal.h"
-#include "tdp_mmu.h"
 #include "x86.h"
 #include "kvm_cache_regs.h"
 #include "smm.h"
@@ -27,6 +26,8 @@
 #include "cpuid.h"
 #include "spte.h"
 
+#include <kvm/tdp_mmu.h>
+
 #include <linux/kvm_host.h>
 #include <linux/types.h>
 #include <linux/string.h>
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/include/kvm/tdp_mmu.h
similarity index 94%
rename from arch/x86/kvm/mmu/tdp_mmu.h
rename to include/kvm/tdp_mmu.h
index 607c1417abd1..538c848149c9 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/include/kvm/tdp_mmu.h
@@ -5,7 +5,11 @@
 
 #include <linux/kvm_host.h>
 
-#include "spte.h"
+#include <kvm/mmu_types.h>
+#include <kvm/mmu.h>
+#include <kvm/tdp_iter.h>
+#include <kvm/tdp_pgtable.h>
+#include <kvm/mmutrace.h>
 
 int kvm_mmu_init_tdp_mmu(struct kvm *kvm);
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 58b595ac9b8d..942681308140 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -14,3 +14,4 @@ kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
 kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
 
 kvm-$(CONFIG_HAVE_TDP_MMU) += $(KVM)/mmu/tdp_iter.o
+kvm-$(CONFIG_HAVE_TDP_MMU) += $(KVM)/mmu/tdp_mmu.o
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/virt/kvm/mmu/tdp_mmu.c
similarity index 99%
rename from arch/x86/kvm/mmu/tdp_mmu.c
rename to virt/kvm/mmu/tdp_mmu.c
index c950d688afea..5ca8892ebef5 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/virt/kvm/mmu/tdp_mmu.c
@@ -1,11 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 
-#include "mmu.h"
-#include "mmu_internal.h"
-#include "tdp_mmu.h"
-#include "spte.h"
-
+#include <kvm/mmu_types.h>
+#include <kvm/mmu.h>
 #include <kvm/tdp_iter.h>
+#include <kvm/tdp_mmu.h>
 #include <kvm/tdp_pgtable.h>
 #include <kvm/mmutrace.h>
 
-- 
2.39.0.rc1.256.g54fd8350bd-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  2022-12-08 19:38   ` David Matlack
@ 2022-12-09  2:37     ` Yang, Weijiang
  -1 siblings, 0 replies; 317+ messages in thread
From: Yang, Weijiang @ 2022-12-09  2:37 UTC (permalink / raw)
  To: David Matlack, Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual, Nadav Amit,
	Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv


On 12/9/2022 3:38 AM, David Matlack wrote:
> Rename kvm_mmu_page_role.smm with kvm_mmu_page_role.as_id and use it
> directly as the address space ID throughout the KVM MMU code. This
> eliminates a needless level of indirection, kvm_mmu_role_as_id(), and
> prepares for making kvm_mmu_page_role architecture-neutral.
>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h |  4 ++--
>   arch/x86/kvm/mmu/mmu.c          |  6 +++---
>   arch/x86/kvm/mmu/mmu_internal.h | 10 ----------
>   arch/x86/kvm/mmu/tdp_iter.c     |  2 +-
>   arch/x86/kvm/mmu/tdp_mmu.c      | 12 ++++++------
>   5 files changed, 12 insertions(+), 22 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index aa4eb8cfcd7e..0a819d40131a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -348,7 +348,7 @@ union kvm_mmu_page_role {
>   		 * simple shift.  While there is room, give it a whole
>   		 * byte so it is also faster to load it from memory.
>   		 */
> -		unsigned smm:8;
> +		unsigned as_id:8;
>   	};
>   };
>   
> @@ -2056,7 +2056,7 @@ enum {
>   # define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
>   # define KVM_ADDRESS_SPACE_NUM 2
>   # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
> -# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
> +# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).as_id)
>   #else
>   # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
>   #endif
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4d188f056933..f375b719f565 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
>   	union kvm_cpu_role role = {0};
>   
>   	role.base.access = ACC_ALL;
> -	role.base.smm = is_smm(vcpu);
> +	role.base.as_id = is_smm(vcpu);

I'm not familiar with other architectures; is there a similar concept to
x86's SMM mode?

If not, maybe is_smm() needs to be reshaped into a common helper that
returns the as_id.
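
Something along these lines is what I mean -- just a rough sketch, the
helper name is invented here (it is not part of this series) and it simply
mirrors the kvm_arch_vcpu_memslots_id() definition quoted above:

/*
 * Hypothetical helper, not part of this series: the architecture turns
 * its own state into an address space ID so that common MMU code never
 * has to mention SMM. Assumes is_smm() is visible here; architectures
 * with a single address space would just return 0.
 */
static inline int kvm_arch_mmu_as_id(struct kvm_vcpu *vcpu)
{
	return is_smm(vcpu) ? 1 : 0;
}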

[...]


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  2022-12-09  2:37     ` Yang, Weijiang
@ 2022-12-09 17:24       ` Oliver Upton
  -1 siblings, 0 replies; 317+ messages in thread
From: Oliver Upton @ 2022-12-09 17:24 UTC (permalink / raw)
  To: Yang, Weijiang
  Cc: David Matlack, Paolo Bonzini, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Huacai Chen,
	Aleksandar Markovic, Anup Patel, Atish Patra, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Sean Christopherson, Andrew Morton,
	Anshuman Khandual, Nadav Amit,
	Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Fri, Dec 09, 2022 at 10:37:47AM +0800, Yang, Weijiang wrote:
> 
> On 12/9/2022 3:38 AM, David Matlack wrote:
> > Rename kvm_mmu_page_role.smm with kvm_mmu_page_role.as_id and use it
> > directly as the address space ID throughout the KVM MMU code. This
> > eliminates a needless level of indirection, kvm_mmu_role_as_id(), and
> > prepares for making kvm_mmu_page_role architecture-neutral.
> > 
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
> >   arch/x86/include/asm/kvm_host.h |  4 ++--
> >   arch/x86/kvm/mmu/mmu.c          |  6 +++---
> >   arch/x86/kvm/mmu/mmu_internal.h | 10 ----------
> >   arch/x86/kvm/mmu/tdp_iter.c     |  2 +-
> >   arch/x86/kvm/mmu/tdp_mmu.c      | 12 ++++++------
> >   5 files changed, 12 insertions(+), 22 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index aa4eb8cfcd7e..0a819d40131a 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -348,7 +348,7 @@ union kvm_mmu_page_role {
> >   		 * simple shift.  While there is room, give it a whole
> >   		 * byte so it is also faster to load it from memory.
> >   		 */
> > -		unsigned smm:8;
> > +		unsigned as_id:8;
> >   	};
> >   };
> > @@ -2056,7 +2056,7 @@ enum {
> >   # define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
> >   # define KVM_ADDRESS_SPACE_NUM 2
> >   # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
> > -# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
> > +# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).as_id)
> >   #else
> >   # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
> >   #endif
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 4d188f056933..f375b719f565 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
> >   	union kvm_cpu_role role = {0};
> >   	role.base.access = ACC_ALL;
> > -	role.base.smm = is_smm(vcpu);
> > +	role.base.as_id = is_smm(vcpu);
> 
> I'm not familiar with other architectures, is there similar conception as
> x86 smm mode?

For KVM/arm64:

No, we don't do anything like x86's SMM emulation. Architecturally
speaking, though, arm64 does have a higher privilege level, EL3, which
is typically used by firmware.

I'll need to read David's series a bit more closely, but I'm inclined to
think that the page role is going to be rather arch-specific.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 23/37] KVM: MMU: Move VM-level TDP MMU state to struct kvm
  2022-12-08 19:38   ` David Matlack
@ 2022-12-09 17:31     ` Oliver Upton
  -1 siblings, 0 replies; 317+ messages in thread
From: Oliver Upton @ 2022-12-09 17:31 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

Hey David,

On Thu, Dec 08, 2022 at 11:38:43AM -0800, David Matlack wrote:
> Move VM-level TDP MMU state to struct kvm so it can be accessed by
> common code in a future commit.
> 
> No functional change intended.

Could you instead introduce a structure to hold all of the MMU state and
stick that in struct kvm? If the goal is to eventually supersede all
uses of the arm64 pgtable library, we are going to need the ability to
operate outside of a KVM VM context.
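
To make that concrete, something like the below -- purely illustrative,
the name and the fields are invented here and not taken from the series:

#include <linux/list.h>
#include <linux/spinlock.h>

/*
 * Illustrative container for the VM-level TDP MMU state. Common code
 * would take a pointer to this instead of a struct kvm, so a non-VM
 * user (e.g. a future replacement for the arm64 hyp page-table code)
 * could embed its own instance. Only the obvious fields are shown; the
 * series moves more state than this.
 */
struct kvm_tdp_mmu {
	struct list_head roots;		/* all active TDP MMU root pages */
	spinlock_t pages_lock;		/* protects the roots list */
};

/* struct kvm would then embed one:  struct kvm_tdp_mmu tdp_mmu; */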

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  2022-12-09 17:24       ` Oliver Upton
@ 2022-12-09 17:40         ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-09 17:40 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Yang, Weijiang, Paolo Bonzini, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Huacai Chen,
	Aleksandar Markovic, Anup Patel, Atish Patra, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Sean Christopherson, Andrew Morton,
	Anshuman Khandual, Nadav Amit,
	Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Fri, Dec 9, 2022 at 9:25 AM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Fri, Dec 09, 2022 at 10:37:47AM +0800, Yang, Weijiang wrote:
> >
> > On 12/9/2022 3:38 AM, David Matlack wrote:
> > > Rename kvm_mmu_page_role.smm with kvm_mmu_page_role.as_id and use it
> > > directly as the address space ID throughout the KVM MMU code. This
> > > eliminates a needless level of indirection, kvm_mmu_role_as_id(), and
> > > prepares for making kvm_mmu_page_role architecture-neutral.
> > >
> > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > ---
> > >   arch/x86/include/asm/kvm_host.h |  4 ++--
> > >   arch/x86/kvm/mmu/mmu.c          |  6 +++---
> > >   arch/x86/kvm/mmu/mmu_internal.h | 10 ----------
> > >   arch/x86/kvm/mmu/tdp_iter.c     |  2 +-
> > >   arch/x86/kvm/mmu/tdp_mmu.c      | 12 ++++++------
> > >   5 files changed, 12 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index aa4eb8cfcd7e..0a819d40131a 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -348,7 +348,7 @@ union kvm_mmu_page_role {
> > >              * simple shift.  While there is room, give it a whole
> > >              * byte so it is also faster to load it from memory.
> > >              */
> > > -           unsigned smm:8;
> > > +           unsigned as_id:8;
> > >     };
> > >   };
> > > @@ -2056,7 +2056,7 @@ enum {
> > >   # define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
> > >   # define KVM_ADDRESS_SPACE_NUM 2
> > >   # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
> > > -# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
> > > +# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).as_id)
> > >   #else
> > >   # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
> > >   #endif
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 4d188f056933..f375b719f565 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
> > >     union kvm_cpu_role role = {0};
> > >     role.base.access = ACC_ALL;
> > > -   role.base.smm = is_smm(vcpu);
> > > +   role.base.as_id = is_smm(vcpu);
> >
> > I'm not familiar with other architectures, is there similar conception as
> > x86 smm mode?

The notion of address spaces is an already-existing, architecture-neutral
concept in KVM (e.g. see the uses of KVM_ADDRESS_SPACE_NUM in
virt/kvm/kvm_main.c), although SMM is the only use case I'm aware of.
Architectures that do not use multiple address spaces will simply leave
as_id as 0.

>
> For KVM/arm64:
>
> No, we don't do anything like SMM emulation on x86. Architecturally
> speaking, though, we do have a higher level of privilege typically
> used by firmware on arm64, called EL3.
>
> I'll need to read David's series a bit more closely, but I'm inclined to
> think that the page role is going to be rather arch-specific.

Yes, most of the fields are in the arch-specific sub-role. The TDP MMU
only needs to know about the as_id, level, and invalid bits. (See the
next patch.)
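
To give a rough idea of the shape -- illustrative only, the field names
and widths below are not the actual definition from the next patch:

#include <linux/types.h>

/* Arch-specific role bits, opaque to common TDP MMU code. */
union kvm_mmu_page_role_arch {
	u32 word;	/* e.g. x86 access/guest-PTE bits would live here */
};

/* The common role carries only what the TDP MMU itself needs. */
struct kvm_mmu_page_role {
	u32 as_id:8;	/* memslot address space the page maps */
	u32 level:4;	/* page table level of this page */
	u32 invalid:1;	/* root is obsolete / being torn down */
	union kvm_mmu_page_role_arch arch;
};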

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
@ 2022-12-09 17:40         ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-09 17:40 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, Yang, Weijiang,
	Amit, Nadav, Ben Gardon, linux-riscv, kvmarm, Yu Zhao,
	Marc Zyngier, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, kvmarm, Suren Baghdasaryan, Vlastimil Babka,
	linux-arm-kernel, linux-mips, Colin Cross, kvm-riscv,
	Paolo Bonzini, Andrew Morton

On Fri, Dec 9, 2022 at 9:25 AM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Fri, Dec 09, 2022 at 10:37:47AM +0800, Yang, Weijiang wrote:
> >
> > On 12/9/2022 3:38 AM, David Matlack wrote:
> > > Rename kvm_mmu_page_role.smm with kvm_mmu_page_role.as_id and use it
> > > directly as the address space ID throughout the KVM MMU code. This
> > > eliminates a needless level of indirection, kvm_mmu_role_as_id(), and
> > > prepares for making kvm_mmu_page_role architecture-neutral.
> > >
> > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > ---
> > >   arch/x86/include/asm/kvm_host.h |  4 ++--
> > >   arch/x86/kvm/mmu/mmu.c          |  6 +++---
> > >   arch/x86/kvm/mmu/mmu_internal.h | 10 ----------
> > >   arch/x86/kvm/mmu/tdp_iter.c     |  2 +-
> > >   arch/x86/kvm/mmu/tdp_mmu.c      | 12 ++++++------
> > >   5 files changed, 12 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index aa4eb8cfcd7e..0a819d40131a 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -348,7 +348,7 @@ union kvm_mmu_page_role {
> > >              * simple shift.  While there is room, give it a whole
> > >              * byte so it is also faster to load it from memory.
> > >              */
> > > -           unsigned smm:8;
> > > +           unsigned as_id:8;
> > >     };
> > >   };
> > > @@ -2056,7 +2056,7 @@ enum {
> > >   # define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
> > >   # define KVM_ADDRESS_SPACE_NUM 2
> > >   # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
> > > -# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
> > > +# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).as_id)
> > >   #else
> > >   # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
> > >   #endif
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 4d188f056933..f375b719f565 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
> > >     union kvm_cpu_role role = {0};
> > >     role.base.access = ACC_ALL;
> > > -   role.base.smm = is_smm(vcpu);
> > > +   role.base.as_id = is_smm(vcpu);
> >
> > I'm not familiar with other architectures, is there similar conception as
> > x86 smm mode?

The notion of address spaces is already existing architecture-neutral
concept in KVM (e.g. see uses of KVM_ADDRESS_SPACE_NUM in
virt/kvm/kvm_main.c), although SMM is the only use-case I'm aware of.
Architectures that do not use multiple address spaces will just leave
as_id is as always 0.

>
> For KVM/arm64:
>
> No, we don't do anything like SMM emulation on x86. Architecturally
> speaking, though, we do have a higher level of privilege typically
> used by firmware on arm64, called EL3.
>
> I'll need to read David's series a bit more closely, but I'm inclined to
> think that the page role is going to be rather arch-specific.

Yes most of the fields are in the arch-specific sub-role. The TDP MMU
only needs to know about the as_id, level, and invalid bits. (See next
patch.)
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
@ 2022-12-09 17:40         ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-09 17:40 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Yang, Weijiang, Paolo Bonzini, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Huacai Chen,
	Aleksandar Markovic, Anup Patel, Atish Patra, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Christopherson,,
	Sean, Andrew Morton, Anshuman Khandual, Amit, Nadav,
	Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Fri, Dec 9, 2022 at 9:25 AM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Fri, Dec 09, 2022 at 10:37:47AM +0800, Yang, Weijiang wrote:
> >
> > On 12/9/2022 3:38 AM, David Matlack wrote:
> > > Rename kvm_mmu_page_role.smm with kvm_mmu_page_role.as_id and use it
> > > directly as the address space ID throughout the KVM MMU code. This
> > > eliminates a needless level of indirection, kvm_mmu_role_as_id(), and
> > > prepares for making kvm_mmu_page_role architecture-neutral.
> > >
> > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > ---
> > >   arch/x86/include/asm/kvm_host.h |  4 ++--
> > >   arch/x86/kvm/mmu/mmu.c          |  6 +++---
> > >   arch/x86/kvm/mmu/mmu_internal.h | 10 ----------
> > >   arch/x86/kvm/mmu/tdp_iter.c     |  2 +-
> > >   arch/x86/kvm/mmu/tdp_mmu.c      | 12 ++++++------
> > >   5 files changed, 12 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index aa4eb8cfcd7e..0a819d40131a 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -348,7 +348,7 @@ union kvm_mmu_page_role {
> > >              * simple shift.  While there is room, give it a whole
> > >              * byte so it is also faster to load it from memory.
> > >              */
> > > -           unsigned smm:8;
> > > +           unsigned as_id:8;
> > >     };
> > >   };
> > > @@ -2056,7 +2056,7 @@ enum {
> > >   # define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
> > >   # define KVM_ADDRESS_SPACE_NUM 2
> > >   # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
> > > -# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
> > > +# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).as_id)
> > >   #else
> > >   # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
> > >   #endif
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 4d188f056933..f375b719f565 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
> > >     union kvm_cpu_role role = {0};
> > >     role.base.access = ACC_ALL;
> > > -   role.base.smm = is_smm(vcpu);
> > > +   role.base.as_id = is_smm(vcpu);
> >
> > I'm not familiar with other architectures, is there similar conception as
> > x86 smm mode?

The notion of address spaces is already existing architecture-neutral
concept in KVM (e.g. see uses of KVM_ADDRESS_SPACE_NUM in
virt/kvm/kvm_main.c), although SMM is the only use-case I'm aware of.
Architectures that do not use multiple address spaces will just leave
as_id is as always 0.

>
> For KVM/arm64:
>
> No, we don't do anything like SMM emulation on x86. Architecturally
> speaking, though, we do have a higher level of privilege typically
> used by firmware on arm64, called EL3.
>
> I'll need to read David's series a bit more closely, but I'm inclined to
> think that the page role is going to be rather arch-specific.

Yes most of the fields are in the arch-specific sub-role. The TDP MMU
only needs to know about the as_id, level, and invalid bits. (See next
patch.)

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 23/37] KVM: MMU: Move VM-level TDP MMU state to struct kvm
  2022-12-09 17:31     ` Oliver Upton
@ 2022-12-09 17:57       ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-09 17:57 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Fri, Dec 9, 2022 at 9:32 AM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Hey David,
>
> On Thu, Dec 08, 2022 at 11:38:43AM -0800, David Matlack wrote:
> > Move VM-level TDP MMU state to struct kvm so it can be accessed by
> > common code in a future commit.
> >
> > No functional change intended.
>
> Could you instead introduce a structure to hold all of the MMU state and
> stick that in struct kvm? If the goal is to eventually supersede all
> uses of the arm64 pgtable library we are going to need the ability to
> operate outside of a KVM VM context.

This patch does introduce a tdp_mmu struct to hold all of the TDP MMU
state. Did you have something else in mind?
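
For reference, the shape is roughly this (field names borrowed from what
the x86 TDP MMU tracks today; treat it as a sketch rather than the exact
struct in the patch):

        #include <linux/list.h>
        #include <linux/spinlock.h>
        #include <linux/workqueue.h>

        /* VM-wide TDP MMU state, embedded in struct kvm as e.g. "tdp_mmu". */
        struct kvm_tdp_mmu {
                /* All active TDP MMU root page tables for this VM. */
                struct list_head roots;

                /* Protects the root list when mmu_lock is only held for read. */
                spinlock_t pages_lock;

                /* Deferred zapping of invalidated roots. */
                struct workqueue_struct *zap_wq;
        };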

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 23/37] KVM: MMU: Move VM-level TDP MMU state to struct kvm
  2022-12-09 17:57       ` David Matlack
@ 2022-12-09 18:30         ` Oliver Upton
  -1 siblings, 0 replies; 317+ messages in thread
From: Oliver Upton @ 2022-12-09 18:30 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Fri, Dec 09, 2022 at 09:57:15AM -0800, David Matlack wrote:
> On Fri, Dec 9, 2022 at 9:32 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > Hey David,
> >
> > On Thu, Dec 08, 2022 at 11:38:43AM -0800, David Matlack wrote:
> > > Move VM-level TDP MMU state to struct kvm so it can be accessed by
> > > common code in a future commit.
> > >
> > > No functional change intended.
> >
> > Could you instead introduce a structure to hold all of the MMU state and
> > stick that in struct kvm? If the goal is to eventually supersede all
> > uses of the arm64 pgtable library we are going to need the ability to
> > operate outside of a KVM VM context.
> 
> This patch does introduce a tdp_mmu struct to hold all of the TDP MMU
> state. Did you have something else in mind?

No, I'm just an idiot without caffeine. I read the patch then forgot
about it when reading the changelog. Sorry!

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code
  2022-12-08 19:38 ` David Matlack
@ 2022-12-09 19:07   ` Oliver Upton
  -1 siblings, 0 replies; 317+ messages in thread
From: Oliver Upton @ 2022-12-09 19:07 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Thu, Dec 08, 2022 at 11:38:20AM -0800, David Matlack wrote:
> [ mm folks: You are being cc'd since this series includes a mm patch
>   ("mm: Introduce architecture-neutral PG_LEVEL macros"), but general
>   feedback is also welcome. I imagine there are a lot of lessons KVM can
>   learn from mm about sharing page table code across architectures. ]
> 
> Hello,
> 
> This series refactors the KVM/x86 "TDP MMU" into common code. This is
> the first step toward sharing TDP (aka Stage-2) page table management
> code across architectures that support KVM. For more background on this
> effort please see my talk from KVM Forum 2022 "Exploring an
> architecture-neutral MMU":
> 
>   https://youtu.be/IBhW34fCFi0
> 
> By the end of this series, 90% of the TDP MMU code is in common directories
> (virt/kvm/mmu/ and include/kvm/). The only pieces that remaing in
> arch/x86 are code that deals with constructing/inspecting/modifying PTEs
> and arch hooks to implement NX Huge Pages (a mitigation for an
> Intel-specific vulnerability).
> 
> Before:
> 
>   180 arch/x86/kvm/mmu/tdp_iter.c
>   118 arch/x86/kvm/mmu/tdp_iter.h
>  1917 arch/x86/kvm/mmu/tdp_mmu.c
>    98 arch/x86/kvm/mmu/tdp_mmu.h
>  ----
>  2313 total
> 
> After:
> 
>   178 virt/kvm/mmu/tdp_iter.c
>  1867 virt/kvm/mmu/tdp_mmu.c
>   117 include/kvm/tdp_iter.h
>    78 include/kvm/tdp_mmu.h
>    39 include/kvm/tdp_pgtable.h
>  ----
>   184 arch/x86/kvm/mmu/tdp_pgtable.c
>    76 arch/x86/include/asm/kvm/tdp_pgtable.h
>  ----
>  2539 total
> 
> This series is very much an RFC, but it does build (I tested x86_64 and
> ARM64) and pass basic testing (KVM selftests and kvm-unit-tests on
> x86_64), so it is entirely functional aside from any bugs.
> 
> The main areas I would like feedback are:
> 
>  - NX Huge Pages support in the TDP MMU requires 5 arch hooks in
>    the common code, which IMO makes the NX Huge Pages implementation
>    harder to read. The alternative is to move the NX Huge Pages
>    implementation into common code, including the fields in struct
>    kvm_mmu_page and kvm_page_fault, which would increase memory usage
>    a tiny bit (for non-x86 architectures) and pollute the common code
>    with an x86-specific security mitigation. Ideas on better ways to
>    handle this would be appreciated.
> 
>  - struct kvm_mmu_page increased by 64 bytes because the separation of
>    arch and common state eliminated the ability to use unions to
>    optimize the size of the struct. There's two things we can do to
>    reduce the size of the struct back down: (1) dynamically allocated
>    root-specific fields only for root page tables and (2) dynamically
>    allocate Shadow MMU state in kvm_mmu_page_arch only for Shadow MMU
>    pages. This should actually be a net *reduction* the size of
>    kvm_mmu_page relative today for most pages, but I have not
>    implemented it.
> 
>    Note that an alternative approach I implemented avoided this problem
>    by creating an entirely separate struct for the common TDP MMU (e.g.
>    struct tdp_mmu_page). This however had a lot of downsides that I
>    don't think make it a good solution. Notably, it complicated a ton of
>    existing code in arch/x86/kvm/mmu/mmu.c (e.g. anything that touches
>    vcpu->arch.mmu->root and kvm_recover_nx_huge_pages()) and created a
>    new runtime failure mode in to_shadow_page().
> 
>  - Naming. This series does not change the names of any existing code.
>    So all the KVM/x86 Shadow MMU-style terminology like
>    "shadow_page"/"sp"/"spte" persists. Should we keep that style in
>    common code or move toward something less shadow-paging-specific?
>    e.g. "page_table"/"pt"/"pte".

I would strongly be in favor of discarding the shadow paging residue if
x86 folks are willing to part ways with it :)

>    Also do we want to keep "TDP" or switch
>    to something more familiar across architectures (e.g. ARM and RISC-V
>    both use "Stage-2")?

As it relates to guest memory management I don't see much of an issue
with it, TBH. It is sufficiently arch-generic and gets the point across.

Beyond that I think it really depends on the scope of the common code.

To replace the arm64 table walkers we will need to use it for stage-1
tables. I'm only hand-waving at the cover letter and need to do more
reading, but is it possible to accomplish some division:

 - A set of generic table walkers that implement common operations, like
   map and unmap. Names and types at this layer wouldn't be
   virt-specific.

 - Memory management for KVM guests that uses the table walker library,
   which we can probably still call the TDP MMU.

Certainly this doesn't need to be addressed in the first series, as the x86
surgery is enough on its own. Nonetheless, it is probably worthwhile to
get the conversation started about how this code can actually be used by
the other arches.
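
Purely to illustrate the division above (every name below is made up; none
of this is an existing API in this series or in arm64's pgtable code):

        #include <linux/types.h>

        /* Layer 1: arch-provided PTE construction/parsing; no virt-specific types. */
        struct tdp_pgtable_ops {
                u64  (*make_leaf)(u64 pfn, int level, unsigned int prot);
                u64  (*make_table)(void *child);
                bool (*is_leaf)(u64 pte, int level);
        };

        /* Generic walkers built on top of the ops, usable outside a KVM VM. */
        int tdp_walk_map(u64 *root, const struct tdp_pgtable_ops *ops,
                         u64 addr, u64 size, u64 pfn, unsigned int prot);
        int tdp_walk_unmap(u64 *root, const struct tdp_pgtable_ops *ops,
                           u64 addr, u64 size);

        /*
         * Layer 2: the TDP MMU proper, which knows about struct kvm, memslots,
         * mmu_notifiers, etc., and drives the walkers above from fault handlers.
         */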

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code
  2022-12-09 19:07   ` Oliver Upton
@ 2022-12-10  1:07     ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-10  1:07 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Fri, Dec 9, 2022 at 11:07 AM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Thu, Dec 08, 2022 at 11:38:20AM -0800, David Matlack wrote:
>
> >    Also do we want to keep "TDP" or switch
> >    to something more familiar across architectures (e.g. ARM and RISC-V
> >    both use "Stage-2")?
>
> As it relates to guest memory management I don't see much of an issue
> with it, TBH. It is sufficiently arch-generic and gets the point across.
>
> Beyond that I think it really depends on the scope of the common code.
>
> To replace the arm64 table walkers we will need to use it for stage-1
> tables.

Speaking of, have ARM folks ever discussed deduplicating the KVM/ARM
stage-1 code with the Linux stage-1 table code (<linux/pgtable.h>),
which is already architecture-neutral? It seems backwards for us to
build out an architecture-neutral stage-1 walker in KVM when one
already exists.

For example, arch/arm64/kvm/mmu.c:get_user_mapping_size() looks like
it could be reimplemented using <linux/pgtable.h>, rather than using
KVM code. In fact that's what we do for walking stage-1 page tables in
KVM/x86. Take a look at
arch/x86/kvm/mmu/mmu.c:host_pfn_mapping_level(). I bet we could move
that somewhere in mm/ so that it could be shared across KVM/x86 and
KVM/ARM.
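
For the curious, the core of that walk is roughly the following (a
simplified sketch modeled on host_pfn_mapping_level(); the real x86 code
is lockless, disables IRQs, and handles more corner cases, and the
PG_LEVEL_* names are the x86 ones that the PG_LEVEL patch in this series
would make architecture-neutral):

        #include <linux/mm.h>
        #include <linux/pgtable.h>

        static int host_mapping_level(struct mm_struct *mm, unsigned long hva)
        {
                pgd_t pgd;
                p4d_t p4d;
                pud_t pud;
                pmd_t pmd;

                /*
                 * Assumes the caller keeps the tables from being freed (the
                 * real walker runs with IRQs off and relies on READ_ONCE()
                 * so it does not have to take mmap_lock).
                 */
                pgd = READ_ONCE(*pgd_offset(mm, hva));
                if (pgd_none(pgd))
                        return PG_LEVEL_4K;

                p4d = READ_ONCE(*p4d_offset(&pgd, hva));
                if (p4d_none(p4d) || !p4d_present(p4d))
                        return PG_LEVEL_4K;

                pud = READ_ONCE(*pud_offset(&p4d, hva));
                if (pud_none(pud) || !pud_present(pud))
                        return PG_LEVEL_4K;
                if (pud_leaf(pud))
                        return PG_LEVEL_1G;

                pmd = READ_ONCE(*pmd_offset(&pud, hva));
                if (pmd_none(pmd) || !pmd_present(pmd))
                        return PG_LEVEL_4K;
                if (pmd_leaf(pmd))
                        return PG_LEVEL_2M;

                return PG_LEVEL_4K;
        }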

> I'm only hand-waving at the cover letter and need to do more
> reading, but is it possible to accomplish some division:
>
>  - A set of generic table walkers that implement common operations, like
>    map and unmap. Names and types at this layer wouldn't be
>    virt-specific.
>
>  - Memory management for KVM guests that uses the table walker library,
>    which we can probably still call the TDP MMU.
>
> Certainly this doesn't need to be addressed in the first series, as the x86
> surgery is enough on its own. Nonetheless, it is probably worthwhile to
> get the conversation started about how this code can actually be used by
> the other arches.

Yup, we'll need some sort of split like that in order to integrate
with KVM/ARM, since the hyp can't access struct kvm, work_queues, etc.
in tdp_mmu.c. I don't think we'll need that split for KVM/RISC-V
though. So for the sake of incremental progress I'm not planning on
doing any of that refactoring preemptively. Plus it should be possible
to keep the TDP MMU API constant when the internal implementation
eventually gets split up, i.e. I don't foresee it creating a bunch of
churn down the road.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  2022-12-09 17:40         ` David Matlack
@ 2022-12-12 17:39           ` Sean Christopherson
  -1 siblings, 0 replies; 317+ messages in thread
From: Sean Christopherson @ 2022-12-12 17:39 UTC (permalink / raw)
  To: David Matlack
  Cc: Oliver Upton, Yang, Weijiang, Paolo Bonzini, Marc Zyngier,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Huacai Chen,
	Aleksandar Markovic, Anup Patel, Atish Patra, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Andrew Morton, Anshuman Khandual,
	Amit, Nadav, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Fri, Dec 09, 2022, David Matlack wrote:
> On Fri, Dec 9, 2022 at 9:25 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > On Fri, Dec 09, 2022 at 10:37:47AM +0800, Yang, Weijiang wrote:
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index 4d188f056933..f375b719f565 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
> > > >     union kvm_cpu_role role = {0};
> > > >     role.base.access = ACC_ALL;
> > > > -   role.base.smm = is_smm(vcpu);
> > > > +   role.base.as_id = is_smm(vcpu);
> > >
> > > I'm not familiar with other architectures, is there similar conception as
> > > x86 smm mode?
> 
> The notion of address spaces is already existing architecture-neutral
> concept in KVM (e.g. see uses of KVM_ADDRESS_SPACE_NUM in
> virt/kvm/kvm_main.c), although SMM is the only use-case I'm aware of.

Yes, SMM is currently the only use-case.

> Architectures that do not use multiple address spaces will just leave
> as_id is as always 0.

My preference would be to leave .smm in x86's page role.  IMO, defining multiple
address spaces to support SMM emulation was a mistake that should be contained to
SMM, i.e. should never be used for any other feature.  And with CONFIG_KVM_SMM,
even x86 can opt out.

For all potential use cases I'm aware of, SMM included, separate address spaces
are overkill.  The SMM use case is to define a region of guest memory that is
accessible if and only if the vCPU is operating in SMM.  Emulating something like
TrustZone or EL3 would be quite similar.  Ditto for Intel's TXT Private Space
(though I can't imagine KVM ever emulating TXT :-) ).

Using separate address spaces means that userspace needs to define the overlapping
GPA areas multiple times, which is inefficient for both memory and CPU usage.
E.g. for SMM, userspace needs to redefine all of "regular" memory for SMM in
addition to memory that is SMM-only.  And more bizarrely, nothing prevents userspace
from defining completely different memslot layouts for each address space, which
may not add complexity in terms of code, but does make it more difficult to
reason about KVM behavior at the boundaries between modes.

Unless I'm missing something, e.g. a need to map GPAs differently for SMM vs.
non-SMM, SMM could have been implemented with a simple flag in a memslot to mark
the memslot as SMM-only.  Or likely even better, as an overlay to track attributes,
e.g. similar to how private vs. shared memory will be handled for protected VMs.
That would be slightly less efficient for memslot searches for use cases where all
memory is mutually exclusive, but simpler and more efficient overall.
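
For illustration only, the memslot-flag variant could look something like
the sketch below; KVM_MEM_SMM_ONLY and kvm_slot_allowed() are made up
here, not existing uAPI or helpers:

  #define KVM_MEM_SMM_ONLY	(1UL << 2)	/* hypothetical flag, not real ABI */

  /* Hypothetical helper: is this slot visible to the vCPU right now? */
  static bool kvm_slot_allowed(struct kvm_vcpu *vcpu,
                               const struct kvm_memory_slot *slot)
  {
          /* SMM-only memory is accessible if and only if the vCPU is in SMM. */
          if (slot->flags & KVM_MEM_SMM_ONLY)
                  return is_smm(vcpu);

          return true;
  }

i.e. a single memslot layout plus a per-slot visibility check, instead of
duplicating the entire "regular" layout in a second address space.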

And separate address spaces become truly nasty if the CPU can access multiple
protected regions, e.g. if the CPU can access type X and type Y at the same time,
then there would need to be memslots for "regular", X, Y, and X+Y.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 02/37] KVM: MMU: Move struct kvm_mmu_page_role into common code
  2022-12-08 19:38   ` David Matlack
@ 2022-12-12 17:48     ` Ben Gardon
  -1 siblings, 0 replies; 317+ messages in thread
From: Ben Gardon @ 2022-12-12 17:48 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
>
> Move struct kvm_mmu_page_role into common code, and move all
> x86-specific fields into a separate sub-struct within the role,
> kvm_mmu_page_role_arch.
>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  MAINTAINERS                          |   4 +-
>  arch/x86/include/asm/kvm/mmu_types.h |  56 ++++++++++
>  arch/x86/include/asm/kvm_host.h      |  68 +-----------
>  arch/x86/kvm/mmu/mmu.c               | 156 +++++++++++++--------------
>  arch/x86/kvm/mmu/mmu_internal.h      |   4 +-
>  arch/x86/kvm/mmu/mmutrace.h          |  12 +--
>  arch/x86/kvm/mmu/paging_tmpl.h       |  20 ++--
>  arch/x86/kvm/mmu/spte.c              |   4 +-
>  arch/x86/kvm/mmu/spte.h              |   2 +-
>  arch/x86/kvm/x86.c                   |   8 +-
>  include/kvm/mmu_types.h              |  37 +++++++
>  11 files changed, 202 insertions(+), 169 deletions(-)
>  create mode 100644 arch/x86/include/asm/kvm/mmu_types.h
>  create mode 100644 include/kvm/mmu_types.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 89672a59c0c3..7e586d7ba78c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11198,7 +11198,8 @@ W:      http://www.linux-kvm.org
>  T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
>  F:     Documentation/virt/kvm/
>  F:     include/asm-generic/kvm*
> -F:     include/kvm/iodev.h
> +F:     include/kvm/
> +X:     include/kvm/arm_*
>  F:     include/linux/kvm*
>  F:     include/trace/events/kvm.h
>  F:     include/uapi/asm-generic/kvm*
> @@ -11285,6 +11286,7 @@ L:      kvm@vger.kernel.org
>  S:     Supported
>  T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
>  F:     arch/x86/include/asm/kvm*
> +F:     arch/x86/include/asm/kvm/
>  F:     arch/x86/include/asm/svm.h
>  F:     arch/x86/include/asm/vmx*.h
>  F:     arch/x86/include/uapi/asm/kvm*
> diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
> new file mode 100644
> index 000000000000..35f893ebab5a
> --- /dev/null
> +++ b/arch/x86/include/asm/kvm/mmu_types.h
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_KVM_MMU_TYPES_H
> +#define __ASM_KVM_MMU_TYPES_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * This is a subset of the overall kvm_cpu_role to minimize the size of
> + * kvm_memory_slot.arch.gfn_track, i.e. allows allocating 2 bytes per gfn
> + * instead of 4 bytes per gfn.
> + *
> + * Upper-level shadow pages having gptes are tracked for write-protection via
> + * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
> + * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
> + * gfn_track will overflow and explosions will ensue.
> + *
> + * A unique shadow page (SP) for a gfn is created if and only if an existing SP
> + * cannot be reused.  The ability to reuse a SP is tracked by its role, which
> + * incorporates various mode bits and properties of the SP.  Roughly speaking,
> + * the number of unique SPs that can theoretically be created is 2^n, where n
> + * is the number of bits that are used to compute the role.
> + *
> + * Note, not all combinations of modes and flags below are possible:
> + *
> + *   - invalid shadow pages are not accounted, so the bits are effectively 18
> + *
> + *   - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
> + *     execonly and ad_disabled are only used for nested EPT which has
> + *     has_4_byte_gpte=0.  Therefore, 2 bits are always unused.
> + *
> + *   - the 4 bits of level are effectively limited to the values 2/3/4/5,
> + *     as 4k SPs are not tracked (allowed to go unsync).  In addition non-PAE
> + *     paging has exactly one upper level, making level completely redundant
> + *     when has_4_byte_gpte=1.
> + *
> + *   - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
> + *     cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
> + *
> + * Therefore, the maximum number of possible upper-level shadow pages for a
> + * single gfn is a bit less than 2^13.
> + */
> +struct kvm_mmu_page_role_arch {
> +       u16 has_4_byte_gpte:1;
> +       u16 quadrant:2;
> +       u16 direct:1;
> +       u16 access:3;
> +       u16 efer_nx:1;
> +       u16 cr0_wp:1;
> +       u16 smep_andnot_wp:1;
> +       u16 smap_andnot_wp:1;
> +       u16 ad_disabled:1;
> +       u16 guest_mode:1;
> +       u16 passthrough:1;
> +};
> +
> +#endif /* !__ASM_KVM_MMU_TYPES_H */
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0a819d40131a..ebcd7a0dabef 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -37,6 +37,8 @@
>  #include <asm/kvm_vcpu_regs.h>
>  #include <asm/hyperv-tlfs.h>
>
> +#include <kvm/mmu_types.h>
> +
>  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>
>  #define KVM_MAX_VCPUS 1024
> @@ -286,72 +288,6 @@ enum x86_intercept_stage;
>
>  struct kvm_kernel_irq_routing_entry;
>
> -/*
> - * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
> - * also includes TDP pages) to determine whether or not a page can be used in
> - * the given MMU context.  This is a subset of the overall kvm_cpu_role to
> - * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
> - * 2 bytes per gfn instead of 4 bytes per gfn.
> - *
> - * Upper-level shadow pages having gptes are tracked for write-protection via
> - * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
> - * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
> - * gfn_track will overflow and explosions will ensue.
> - *
> - * A unique shadow page (SP) for a gfn is created if and only if an existing SP
> - * cannot be reused.  The ability to reuse a SP is tracked by its role, which
> - * incorporates various mode bits and properties of the SP.  Roughly speaking,
> - * the number of unique SPs that can theoretically be created is 2^n, where n
> - * is the number of bits that are used to compute the role.
> - *
> - * But, even though there are 19 bits in the mask below, not all combinations
> - * of modes and flags are possible:
> - *
> - *   - invalid shadow pages are not accounted, so the bits are effectively 18
> - *
> - *   - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
> - *     execonly and ad_disabled are only used for nested EPT which has
> - *     has_4_byte_gpte=0.  Therefore, 2 bits are always unused.
> - *
> - *   - the 4 bits of level are effectively limited to the values 2/3/4/5,
> - *     as 4k SPs are not tracked (allowed to go unsync).  In addition non-PAE
> - *     paging has exactly one upper level, making level completely redundant
> - *     when has_4_byte_gpte=1.
> - *
> - *   - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
> - *     cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
> - *
> - * Therefore, the maximum number of possible upper-level shadow pages for a
> - * single gfn is a bit less than 2^13.
> - */
> -union kvm_mmu_page_role {
> -       u32 word;
> -       struct {
> -               unsigned level:4;
> -               unsigned has_4_byte_gpte:1;
> -               unsigned quadrant:2;
> -               unsigned direct:1;
> -               unsigned access:3;
> -               unsigned invalid:1;
> -               unsigned efer_nx:1;
> -               unsigned cr0_wp:1;
> -               unsigned smep_andnot_wp:1;
> -               unsigned smap_andnot_wp:1;
> -               unsigned ad_disabled:1;
> -               unsigned guest_mode:1;
> -               unsigned passthrough:1;
> -               unsigned :5;
> -
> -               /*
> -                * This is left at the top of the word so that
> -                * kvm_memslots_for_spte_role can extract it with a
> -                * simple shift.  While there is room, give it a whole
> -                * byte so it is also faster to load it from memory.
> -                */
> -               unsigned as_id:8;
> -       };
> -};
> -
>  /*
>   * kvm_mmu_extended_role complements kvm_mmu_page_role, tracking properties
>   * relevant to the current MMU configuration.   When loading CR0, CR4, or EFER,
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index f375b719f565..355548603960 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -210,13 +210,13 @@ static inline bool __maybe_unused is_##reg##_##name(struct kvm_mmu *mmu)  \
>  {                                                              \
>         return !!(mmu->cpu_role. base_or_ext . reg##_##name);   \
>  }
> -BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
> +BUILD_MMU_ROLE_ACCESSOR(base.arch, cr0, wp);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pse);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smep);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smap);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pke);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, la57);
> -BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
> +BUILD_MMU_ROLE_ACCESSOR(base.arch, efer, nx);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  efer, lma);
>
>  static inline bool is_cr0_pg(struct kvm_mmu *mmu)
> @@ -226,7 +226,7 @@ static inline bool is_cr0_pg(struct kvm_mmu *mmu)
>
>  static inline bool is_cr4_pae(struct kvm_mmu *mmu)
>  {
> -        return !mmu->cpu_role.base.has_4_byte_gpte;
> +       return !mmu->cpu_role.base.arch.has_4_byte_gpte;
>  }
>
>  static struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
> @@ -618,7 +618,7 @@ static bool mmu_spte_age(u64 *sptep)
>
>  static inline bool is_tdp_mmu_active(struct kvm_vcpu *vcpu)
>  {
> -       return tdp_mmu_enabled && vcpu->arch.mmu->root_role.direct;
> +       return tdp_mmu_enabled && vcpu->arch.mmu->root_role.arch.direct;
>  }
>
>  static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
> @@ -695,10 +695,10 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp);
>
>  static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
>  {
> -       if (sp->role.passthrough)
> +       if (sp->role.arch.passthrough)
>                 return sp->gfn;
>
> -       if (!sp->role.direct)
> +       if (!sp->role.arch.direct)
>                 return sp->shadowed_translation[index] >> PAGE_SHIFT;
>
>         return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS));
> @@ -727,7 +727,7 @@ static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
>          *
>          * In both cases, sp->role.access contains the correct access bits.
>          */
> -       return sp->role.access;
> +       return sp->role.arch.access;
>  }
>
>  static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
> @@ -739,14 +739,14 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
>         }
>
>         WARN_ONCE(access != kvm_mmu_page_get_access(sp, index),
> -                 "access mismatch under %s page %llx (expected %u, got %u)\n",
> -                 sp->role.passthrough ? "passthrough" : "direct",
> -                 sp->gfn, kvm_mmu_page_get_access(sp, index), access);
> +                 "access mismatch under %s page %llx (expected %u, got %u)\n",
> +                 sp->role.arch.passthrough ? "passthrough" : "direct",
> +                 sp->gfn, kvm_mmu_page_get_access(sp, index), access);
>
>         WARN_ONCE(gfn != kvm_mmu_page_get_gfn(sp, index),
> -                 "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
> -                 sp->role.passthrough ? "passthrough" : "direct",
> -                 sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
> +                 "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
> +                 sp->role.arch.passthrough ? "passthrough" : "direct",
> +                 sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
>  }
>
>  static void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
> @@ -1723,7 +1723,7 @@ static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
>         hlist_del(&sp->hash_link);
>         list_del(&sp->link);
>         free_page((unsigned long)sp->spt);
> -       if (!sp->role.direct)
> +       if (!sp->role.arch.direct)
>                 free_page((unsigned long)sp->shadowed_translation);
>         kmem_cache_free(mmu_page_header_cache, sp);
>  }
> @@ -1884,10 +1884,10 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>
>  static bool sp_has_gptes(struct kvm_mmu_page *sp)
>  {
> -       if (sp->role.direct)
> +       if (sp->role.arch.direct)
>                 return false;
>
> -       if (sp->role.passthrough)
> +       if (sp->role.arch.passthrough)
>                 return false;
>
>         return true;
> @@ -2065,7 +2065,7 @@ static void clear_sp_write_flooding_count(u64 *spte)
>   * The vCPU is required when finding indirect shadow pages; the shadow
>   * page may already exist and syncing it needs the vCPU pointer in
>   * order to read guest page tables.  Direct shadow pages are never
> - * unsync, thus @vcpu can be NULL if @role.direct is true.
> + * unsync, thus @vcpu can be NULL if @role.arch.direct is true.
>   */
>  static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                                                      struct kvm_vcpu *vcpu,
> @@ -2101,7 +2101,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                 }
>
>                 /* unsync and write-flooding only apply to indirect SPs. */
> -               if (sp->role.direct)
> +               if (sp->role.arch.direct)
>                         goto out;
>
>                 if (sp->unsync) {
> @@ -2162,7 +2162,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
>
>         sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
>         sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
> -       if (!role.direct)
> +       if (!role.arch.direct)
>                 sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
>
>         set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
> @@ -2187,7 +2187,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
>         return sp;
>  }
>
> -/* Note, @vcpu may be NULL if @role.direct is true; see kvm_mmu_find_shadow_page. */
> +/* Note, @vcpu may be NULL if @role.arch.direct is true; see kvm_mmu_find_shadow_page. */
>  static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
>                                                       struct kvm_vcpu *vcpu,
>                                                       struct shadow_page_caches *caches,
> @@ -2231,9 +2231,9 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
>
>         role = parent_sp->role;
>         role.level--;
> -       role.access = access;
> -       role.direct = direct;
> -       role.passthrough = 0;
> +       role.arch.access = access;
> +       role.arch.direct = direct;
> +       role.arch.passthrough = 0;
>
>         /*
>          * If the guest has 4-byte PTEs then that means it's using 32-bit,
> @@ -2261,9 +2261,9 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
>          * covers bit 21 (see above), thus the quadrant is calculated from the
>          * _least_ significant bit of the PDE index.
>          */
> -       if (role.has_4_byte_gpte) {
> +       if (role.arch.has_4_byte_gpte) {
>                 WARN_ON_ONCE(role.level != PG_LEVEL_4K);
> -               role.quadrant = spte_index(sptep) & 1;
> +               role.arch.quadrant = spte_index(sptep) & 1;
>         }
>
>         return role;
> @@ -2292,7 +2292,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
>
>         if (iterator->level >= PT64_ROOT_4LEVEL &&
>             vcpu->arch.mmu->cpu_role.base.level < PT64_ROOT_4LEVEL &&
> -           !vcpu->arch.mmu->root_role.direct)
> +           !vcpu->arch.mmu->root_role.arch.direct)
>                 iterator->level = PT32E_ROOT_LEVEL;
>
>         if (iterator->level == PT32E_ROOT_LEVEL) {
> @@ -2391,7 +2391,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>                  * a new sp with the correct access.
>                  */
>                 child = spte_to_child_sp(*sptep);
> -               if (child->role.access == direct_access)
> +               if (child->role.arch.access == direct_access)
>                         return;
>
>                 drop_parent_pte(child, sptep);
> @@ -2420,7 +2420,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
>                          * avoids retaining a large number of stale nested SPs.
>                          */
>                         if (tdp_enabled && invalid_list &&
> -                           child->role.guest_mode && !child->parent_ptes.val)
> +                           child->role.arch.guest_mode && !child->parent_ptes.val)
>                                 return kvm_mmu_prepare_zap_page(kvm, child,
>                                                                 invalid_list);
>                 }
> @@ -2689,7 +2689,7 @@ static int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
>         gpa_t gpa;
>         int r;
>
> -       if (vcpu->arch.mmu->root_role.direct)
> +       if (vcpu->arch.mmu->root_role.arch.direct)
>                 return 0;
>
>         gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
> @@ -2900,7 +2900,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
>  {
>         struct page *pages[PTE_PREFETCH_NUM];
>         struct kvm_memory_slot *slot;
> -       unsigned int access = sp->role.access;
> +       unsigned int access = sp->role.arch.access;
>         int i, ret;
>         gfn_t gfn;
>
> @@ -2928,7 +2928,7 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
>         u64 *spte, *start = NULL;
>         int i;
>
> -       WARN_ON(!sp->role.direct);
> +       WARN_ON(!sp->role.arch.direct);
>
>         i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
>         spte = sp->spt + i;
> @@ -3549,7 +3549,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
>          * This should not be called while L2 is active, L2 can't invalidate
>          * _only_ its own roots, e.g. INVVPID unconditionally exits.
>          */
> -       WARN_ON_ONCE(mmu->root_role.guest_mode);
> +       WARN_ON_ONCE(mmu->root_role.arch.guest_mode);
>
>         for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
>                 root_hpa = mmu->prev_roots[i].hpa;
> @@ -3557,7 +3557,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
>                         continue;
>
>                 if (!to_shadow_page(root_hpa) ||
> -                       to_shadow_page(root_hpa)->role.guest_mode)
> +                       to_shadow_page(root_hpa)->role.arch.guest_mode)
>                         roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
>         }
>
> @@ -3585,10 +3585,10 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
>         struct kvm_mmu_page *sp;
>
>         role.level = level;
> -       role.quadrant = quadrant;
> +       role.arch.quadrant = quadrant;
>
> -       WARN_ON_ONCE(quadrant && !role.has_4_byte_gpte);
> -       WARN_ON_ONCE(role.direct && role.has_4_byte_gpte);
> +       WARN_ON_ONCE(quadrant && !role.arch.has_4_byte_gpte);
> +       WARN_ON_ONCE(role.arch.direct && role.arch.has_4_byte_gpte);
>
>         sp = kvm_mmu_get_shadow_page(vcpu, gfn, role);
>         ++sp->root_count;
> @@ -3834,7 +3834,7 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
>          * equivalent level in the guest's NPT to shadow.  Allocate the tables
>          * on demand, as running a 32-bit L1 VMM on 64-bit KVM is very rare.
>          */
> -       if (mmu->root_role.direct ||
> +       if (mmu->root_role.arch.direct ||
>             mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL ||
>             mmu->root_role.level < PT64_ROOT_4LEVEL)
>                 return 0;
> @@ -3932,7 +3932,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
>         int i;
>         struct kvm_mmu_page *sp;
>
> -       if (vcpu->arch.mmu->root_role.direct)
> +       if (vcpu->arch.mmu->root_role.arch.direct)
>                 return;
>
>         if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
> @@ -4161,7 +4161,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>
>         arch.token = alloc_apf_token(vcpu);
>         arch.gfn = gfn;
> -       arch.direct_map = vcpu->arch.mmu->root_role.direct;
> +       arch.direct_map = vcpu->arch.mmu->root_role.arch.direct;
>         arch.cr3 = vcpu->arch.mmu->get_guest_pgd(vcpu);
>
>         return kvm_setup_async_pf(vcpu, cr2_or_gpa,
> @@ -4172,7 +4172,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>  {
>         int r;
>
> -       if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) ||
> +       if ((vcpu->arch.mmu->root_role.arch.direct != work->arch.direct_map) ||
>               work->wakeup_all)
>                 return;
>
> @@ -4180,7 +4180,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>         if (unlikely(r))
>                 return;
>
> -       if (!vcpu->arch.mmu->root_role.direct &&
> +       if (!vcpu->arch.mmu->root_role.arch.direct &&
>               work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu))
>                 return;
>
> @@ -4456,7 +4456,7 @@ static void nonpaging_init_context(struct kvm_mmu *context)
>  static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
>                                   union kvm_mmu_page_role role)
>  {
> -       return (role.direct || pgd == root->pgd) &&
> +       return (role.arch.direct || pgd == root->pgd) &&
>                VALID_PAGE(root->hpa) &&
>                role.word == to_shadow_page(root->hpa)->role.word;
>  }
> @@ -4576,7 +4576,7 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
>          * If this is a direct root page, it doesn't have a write flooding
>          * count. Otherwise, clear the write flooding count.
>          */
> -       if (!new_role.direct)
> +       if (!new_role.arch.direct)
>                 __clear_sp_write_flooding_count(
>                                 to_shadow_page(vcpu->arch.mmu->root.hpa));
>  }
> @@ -4803,7 +4803,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
>         shadow_zero_check = &context->shadow_zero_check;
>         __reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
>                                 context->root_role.level,
> -                               context->root_role.efer_nx,
> +                               context->root_role.arch.efer_nx,
>                                 guest_can_use_gbpages(vcpu), is_pse, is_amd);
>
>         if (!shadow_me_mask)
> @@ -5055,21 +5055,21 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
>  {
>         union kvm_cpu_role role = {0};
>
> -       role.base.access = ACC_ALL;
>         role.base.as_id = is_smm(vcpu);
> -       role.base.guest_mode = is_guest_mode(vcpu);
> +       role.base.arch.access = ACC_ALL;
> +       role.base.arch.guest_mode = is_guest_mode(vcpu);
>         role.ext.valid = 1;
>
>         if (!____is_cr0_pg(regs)) {
> -               role.base.direct = 1;
> +               role.base.arch.direct = 1;
>                 return role;
>         }
>
> -       role.base.efer_nx = ____is_efer_nx(regs);
> -       role.base.cr0_wp = ____is_cr0_wp(regs);
> -       role.base.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
> -       role.base.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
> -       role.base.has_4_byte_gpte = !____is_cr4_pae(regs);
> +       role.base.arch.efer_nx = ____is_efer_nx(regs);
> +       role.base.arch.cr0_wp = ____is_cr0_wp(regs);
> +       role.base.arch.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
> +       role.base.arch.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
> +       role.base.arch.has_4_byte_gpte = !____is_cr4_pae(regs);
>
>         if (____is_efer_lma(regs))
>                 role.base.level = ____is_cr4_la57(regs) ? PT64_ROOT_5LEVEL
> @@ -5109,15 +5109,15 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
>  {
>         union kvm_mmu_page_role role = {0};
>
> -       role.access = ACC_ALL;
> -       role.cr0_wp = true;
> -       role.efer_nx = true;
>         role.as_id = cpu_role.base.as_id;
> -       role.guest_mode = cpu_role.base.guest_mode;
> -       role.ad_disabled = !kvm_ad_enabled();
>         role.level = kvm_mmu_get_tdp_level(vcpu);
> -       role.direct = true;
> -       role.has_4_byte_gpte = false;
> +       role.arch.access = ACC_ALL;
> +       role.arch.cr0_wp = true;
> +       role.arch.efer_nx = true;
> +       role.arch.guest_mode = cpu_role.base.arch.guest_mode;
> +       role.arch.ad_disabled = !kvm_ad_enabled();
> +       role.arch.direct = true;
> +       role.arch.has_4_byte_gpte = false;
>
>         return role;
>  }
> @@ -5194,7 +5194,7 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu,
>          * NX can be used by any non-nested shadow MMU to avoid having to reset
>          * MMU contexts.
>          */
> -       root_role.efer_nx = true;
> +       root_role.arch.efer_nx = true;
>
>         shadow_mmu_init_context(vcpu, context, cpu_role, root_role);
>  }
> @@ -5212,13 +5212,13 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
>         union kvm_mmu_page_role root_role;
>
>         /* NPT requires CR0.PG=1. */
> -       WARN_ON_ONCE(cpu_role.base.direct);
> +       WARN_ON_ONCE(cpu_role.base.arch.direct);
>
>         root_role = cpu_role.base;
>         root_role.level = kvm_mmu_get_tdp_level(vcpu);
>         if (root_role.level == PT64_ROOT_5LEVEL &&
>             cpu_role.base.level == PT64_ROOT_4LEVEL)
> -               root_role.passthrough = 1;
> +               root_role.arch.passthrough = 1;
>
>         shadow_mmu_init_context(vcpu, context, cpu_role, root_role);
>         kvm_mmu_new_pgd(vcpu, nested_cr3);
> @@ -5237,11 +5237,11 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
>          */
>         WARN_ON_ONCE(is_smm(vcpu));
>         role.base.level = level;
> -       role.base.has_4_byte_gpte = false;
> -       role.base.direct = false;
> -       role.base.ad_disabled = !accessed_dirty;
> -       role.base.guest_mode = true;
> -       role.base.access = ACC_ALL;
> +       role.base.arch.has_4_byte_gpte = false;
> +       role.base.arch.direct = false;
> +       role.base.arch.ad_disabled = !accessed_dirty;
> +       role.base.arch.guest_mode = true;
> +       role.base.arch.access = ACC_ALL;
>
>         role.ext.word = 0;
>         role.ext.execonly = execonly;
> @@ -5385,13 +5385,13 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
>  {
>         int r;
>
> -       r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.direct);
> +       r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.arch.direct);
>         if (r)
>                 goto out;
>         r = mmu_alloc_special_roots(vcpu);
>         if (r)
>                 goto out;
> -       if (vcpu->arch.mmu->root_role.direct)
> +       if (vcpu->arch.mmu->root_role.arch.direct)
>                 r = mmu_alloc_direct_roots(vcpu);
>         else
>                 r = mmu_alloc_shadow_roots(vcpu);
> @@ -5526,7 +5526,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
>                  gpa, bytes, sp->role.word);
>
>         offset = offset_in_page(gpa);
> -       pte_size = sp->role.has_4_byte_gpte ? 4 : 8;
> +       pte_size = sp->role.arch.has_4_byte_gpte ? 4 : 8;
>
>         /*
>          * Sometimes, the OS only writes the last one bytes to update status
> @@ -5550,7 +5550,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
>         page_offset = offset_in_page(gpa);
>         level = sp->role.level;
>         *nspte = 1;
> -       if (sp->role.has_4_byte_gpte) {
> +       if (sp->role.arch.has_4_byte_gpte) {
>                 page_offset <<= 1;      /* 32->64 */
>                 /*
>                  * A 32-bit pde maps 4MB while the shadow pdes map
> @@ -5564,7 +5564,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
>                 }
>                 quadrant = page_offset >> PAGE_SHIFT;
>                 page_offset &= ~PAGE_MASK;
> -               if (quadrant != sp->role.quadrant)
> +               if (quadrant != sp->role.arch.quadrant)
>                         return NULL;
>         }
>
> @@ -5628,7 +5628,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>                        void *insn, int insn_len)
>  {
>         int r, emulation_type = EMULTYPE_PF;
> -       bool direct = vcpu->arch.mmu->root_role.direct;
> +       bool direct = vcpu->arch.mmu->root_role.arch.direct;
>
>         if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
>                 return RET_PF_RETRY;
> @@ -5659,7 +5659,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>          * paging in both guests. If true, we simply unprotect the page
>          * and resume the guest.
>          */
> -       if (vcpu->arch.mmu->root_role.direct &&
> +       if (vcpu->arch.mmu->root_role.arch.direct &&
>             (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
>                 kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa));
>                 return 1;
> @@ -6321,7 +6321,7 @@ static void shadow_mmu_split_huge_page(struct kvm *kvm,
>
>                 spte = make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
>                 mmu_spte_set(sptep, spte);
> -               __rmap_add(kvm, cache, slot, sptep, gfn, sp->role.access);
> +               __rmap_add(kvm, cache, slot, sptep, gfn, sp->role.arch.access);
>         }
>
>         __link_shadow_page(kvm, cache, huge_sptep, sp, flush);
> @@ -6380,7 +6380,7 @@ static bool shadow_mmu_try_split_huge_pages(struct kvm *kvm,
>                 sp = sptep_to_sp(huge_sptep);
>
>                 /* TDP MMU is enabled, so rmap only contains nested MMU SPs. */
> -               if (WARN_ON_ONCE(!sp->role.guest_mode))
> +               if (WARN_ON_ONCE(!sp->role.arch.guest_mode))
>                         continue;
>
>                 /* The rmaps should never contain non-leaf SPTEs. */
> @@ -6502,7 +6502,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
>                  * the guest, and the guest page table is using 4K page size
>                  * mapping if the indirect sp has level = 1.
>                  */
> -               if (sp->role.direct &&
> +               if (sp->role.arch.direct &&
>                     sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
>                                                                PG_LEVEL_NUM)) {
>                         kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
> @@ -6942,7 +6942,7 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
>                                       struct kvm_mmu_page,
>                                       possible_nx_huge_page_link);
>                 WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
> -               WARN_ON_ONCE(!sp->role.direct);
> +               WARN_ON_ONCE(!sp->role.arch.direct);
>
>                 /*
>                  * Unaccount and do not attempt to recover any NX Huge Pages
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 5427f65117b4..c19a80fdeb8d 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -143,7 +143,7 @@ static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
>          * being enabled is mandatory as the bits used to denote WP-only SPTEs
>          * are reserved for PAE paging (32-bit KVM).
>          */
> -       return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
> +       return kvm_x86_ops.cpu_dirty_log_size && sp->role.arch.guest_mode;
>  }
>
>  int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
> @@ -270,7 +270,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>         };
>         int r;
>
> -       if (vcpu->arch.mmu->root_role.direct) {
> +       if (vcpu->arch.mmu->root_role.arch.direct) {
>                 fault.gfn = fault.addr >> PAGE_SHIFT;
>                 fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
>         }
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index ae86820cef69..6a4a43b90780 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -35,13 +35,13 @@
>                          " %snxe %sad root %u %s%c",                    \
>                          __entry->mmu_valid_gen,                        \
>                          __entry->gfn, role.level,                      \
> -                        role.has_4_byte_gpte ? 4 : 8,                  \
> -                        role.quadrant,                                 \
> -                        role.direct ? " direct" : "",                  \
> -                        access_str[role.access],                       \
> +                        role.arch.has_4_byte_gpte ? 4 : 8,                     \
> +                        role.arch.quadrant,                                    \
> +                        role.arch.direct ? " direct" : "",                     \
> +                        access_str[role.arch.access],                  \
>                          role.invalid ? " invalid" : "",                \
> -                        role.efer_nx ? "" : "!",                       \
> -                        role.ad_disabled ? "!" : "",                   \
> +                        role.arch.efer_nx ? "" : "!",                  \
> +                        role.arch.ad_disabled ? "!" : "",                      \
>                          __entry->root_count,                           \
>                          __entry->unsync ? "unsync" : "sync", 0);       \
>         saved_ptr;                                                      \
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index e5662dbd519c..e15ec1c473da 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -55,7 +55,7 @@
>         #define PT_LEVEL_BITS 9
>         #define PT_GUEST_DIRTY_SHIFT 9
>         #define PT_GUEST_ACCESSED_SHIFT 8
> -       #define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.ad_disabled)
> +       #define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.arch.ad_disabled)
>         #define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL
>  #else
>         #error Invalid PTTYPE value
> @@ -532,7 +532,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>         pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
>
>         gfn = gpte_to_gfn(gpte);
> -       pte_access = sp->role.access & FNAME(gpte_access)(gpte);
> +       pte_access = sp->role.arch.access & FNAME(gpte_access)(gpte);
>         FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
>
>         slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn,
> @@ -592,7 +592,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
>         if (unlikely(vcpu->kvm->mmu_invalidate_in_progress))
>                 return;
>
> -       if (sp->role.direct)
> +       if (sp->role.arch.direct)
>                 return __direct_pte_prefetch(vcpu, sp, sptep);
>
>         i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
> @@ -884,7 +884,7 @@ static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
>         WARN_ON(sp->role.level != PG_LEVEL_4K);
>
>         if (PTTYPE == 32)
> -               offset = sp->role.quadrant << SPTE_LEVEL_BITS;
> +               offset = sp->role.arch.quadrant << SPTE_LEVEL_BITS;
>
>         return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
>  }
> @@ -1003,9 +1003,11 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>          */
>         const union kvm_mmu_page_role sync_role_ign = {
>                 .level = 0xf,
> -               .access = 0x7,
> -               .quadrant = 0x3,
> -               .passthrough = 0x1,
> +               .arch = {
> +                       .access = 0x7,
> +                       .quadrant = 0x3,
> +                       .passthrough = 0x1,
> +               },
>         };
>
>         /*
> @@ -1014,7 +1016,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>          * differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
>          * reserved bits checks will be wrong, etc...
>          */
> -       if (WARN_ON_ONCE(sp->role.direct ||
> +       if (WARN_ON_ONCE(sp->role.arch.direct ||
>                          (sp->role.word ^ root_role.word) & ~sync_role_ign.word))
>                 return -1;
>
> @@ -1043,7 +1045,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>                 }
>
>                 gfn = gpte_to_gfn(gpte);
> -               pte_access = sp->role.access;
> +               pte_access = sp->role.arch.access;
>                 pte_access &= FNAME(gpte_access)(gpte);
>                 FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
>
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index c0fd7e049b4e..fe4b626cb431 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -146,7 +146,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>
>         WARN_ON_ONCE(!pte_access && !shadow_present_mask);
>
> -       if (sp->role.ad_disabled)
> +       if (sp->role.arch.ad_disabled)
>                 spte |= SPTE_TDP_AD_DISABLED_MASK;
>         else if (kvm_mmu_page_ad_need_write_protect(sp))
>                 spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
> @@ -301,7 +301,7 @@ u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte, union kvm_mmu_page
>                  * the page executable as the NX hugepage mitigation no longer
>                  * applies.
>                  */
> -               if ((role.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
> +               if ((role.arch.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
>                         child_spte = make_spte_executable(child_spte);
>         }
>
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index 1f03701b943a..ad84c549fe96 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -260,7 +260,7 @@ static inline bool kvm_ad_enabled(void)
>
>  static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
>  {
> -       return sp->role.ad_disabled;
> +       return sp->role.arch.ad_disabled;
>  }
>
>  static inline bool spte_ad_enabled(u64 spte)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b2da8c8f30a..2bfe060768fc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8442,7 +8442,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>             WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF)))
>                 return false;
>
> -       if (!vcpu->arch.mmu->root_role.direct) {
> +       if (!vcpu->arch.mmu->root_role.arch.direct) {
>                 /*
>                  * Write permission should be allowed since only
>                  * write access need to be emulated.
> @@ -8475,7 +8475,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>         kvm_release_pfn_clean(pfn);
>
>         /* The instructions are well-emulated on direct mmu. */
> -       if (vcpu->arch.mmu->root_role.direct) {
> +       if (vcpu->arch.mmu->root_role.arch.direct) {
>                 unsigned int indirect_shadow_pages;
>
>                 write_lock(&vcpu->kvm->mmu_lock);
> @@ -8543,7 +8543,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
>         vcpu->arch.last_retry_eip = ctxt->eip;
>         vcpu->arch.last_retry_addr = cr2_or_gpa;
>
> -       if (!vcpu->arch.mmu->root_role.direct)
> +       if (!vcpu->arch.mmu->root_role.arch.direct)
>                 gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);
>
>         kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> @@ -8846,7 +8846,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>                 ctxt->exception.address = cr2_or_gpa;
>
>                 /* With shadow page tables, cr2 contains a GVA or nGPA. */
> -               if (vcpu->arch.mmu->root_role.direct) {
> +               if (vcpu->arch.mmu->root_role.arch.direct) {
>                         ctxt->gpa_available = true;
>                         ctxt->gpa_val = cr2_or_gpa;
>                 }
> diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
> new file mode 100644
> index 000000000000..3f35a924e031
> --- /dev/null
> +++ b/include/kvm/mmu_types.h
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __KVM_MMU_TYPES_H
> +#define __KVM_MMU_TYPES_H
> +
> +#include <linux/bug.h>
> +#include <linux/types.h>
> +#include <linux/stddef.h>
> +
> +#include <asm/kvm/mmu_types.h>
> +
> +/*
> + * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
> + * also includes TDP pages) to determine whether or not a page can be used in
> + * the given MMU context.
> + */
> +union kvm_mmu_page_role {
> +       u32 word;
> +       struct {
> +               struct {
> +                       /* The address space ID mapped by the page. */
> +                       u16 as_id:8;

We should either make this just 1 bit, or preserve the original comment
explaining that it gets a whole 8 bits to make it faster to load from
memory. Otherwise folks might think that as_id can actually use all 8
bits. kvm_memory_slot already has this as a full u16, so we're already
unable to express its full range here anyway.
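
E.g. something along these lines, with the old justification folded back
in (the comment wording below is just a sketch):

  union kvm_mmu_page_role {
          u32 word;
          struct {
                  struct {
                          /*
                           * The address space ID mapped by the page. Only
                           * 1 bit is currently used (SMM vs. non-SMM), but
                           * keep it a whole byte so it is faster to load
                           * from memory and so nobody assumes the full 8
                           * bits are usable.
                           */
                          u16 as_id:8;

                          /* The level of the page in the page table hierarchy. */
                          u16 level:4;

                          /* Whether the page is invalid, i.e. pending destruction. */
                          u16 invalid:1;
                  };

                  /* Architecture-specific properties. */
                  struct kvm_mmu_page_role_arch arch;
          };
  };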

> +
> +                       /* The level of the page in the page table hierarchy. */
> +                       u16 level:4;
> +
> +                       /* Whether the page is invalid, i.e. pending destruction. */
> +                       u16 invalid:1;
> +               };
> +
> +               /* Architecture-specific properties. */
> +               struct kvm_mmu_page_role_arch arch;
> +       };
> +};
> +
> +static_assert(sizeof(union kvm_mmu_page_role) == sizeof_field(union kvm_mmu_page_role, word));
> +
> +#endif /* !__KVM_MMU_TYPES_H */
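
One layout property worth noting, derived from the declarations above rather
than stated anywhere in the patch: the three common bits are u16 bitfields
and kvm_mmu_page_role_arch is itself a 16-bit bitfield struct on x86, so the
anonymous struct packs into two u16 halves and the static_assert against
'word' holds. A couple of extra asserts (illustrative, x86-only) that make
the split explicit:

	static_assert(sizeof(struct kvm_mmu_page_role_arch) == sizeof(u16));
	static_assert(offsetof(union kvm_mmu_page_role, arch) == sizeof(u16));
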
> --
> 2.39.0.rc1.256.g54fd8350bd-goog
>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 02/37] KVM: MMU: Move struct kvm_mmu_page_role into common code
@ 2022-12-12 17:48     ` Ben Gardon
  0 siblings, 0 replies; 317+ messages in thread
From: Ben Gardon @ 2022-12-12 17:48 UTC (permalink / raw)
  To: David Matlack
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, Nadav Amit,
	Colin Cross, linux-riscv, kvmarm, Yu Zhao, Marc Zyngier,
	Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, kvmarm, Suren Baghdasaryan, Vlastimil Babka,
	linux-arm-kernel, linux-mips, kvm-riscv, Paolo Bonzini,
	Andrew Morton

On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
>
> Move struct kvm_mmu_page_role into common code, and move all
> x86-specific fields into a separate sub-struct within the role,
> kvm_mmu_page_role_arch.
>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  MAINTAINERS                          |   4 +-
>  arch/x86/include/asm/kvm/mmu_types.h |  56 ++++++++++
>  arch/x86/include/asm/kvm_host.h      |  68 +-----------
>  arch/x86/kvm/mmu/mmu.c               | 156 +++++++++++++--------------
>  arch/x86/kvm/mmu/mmu_internal.h      |   4 +-
>  arch/x86/kvm/mmu/mmutrace.h          |  12 +--
>  arch/x86/kvm/mmu/paging_tmpl.h       |  20 ++--
>  arch/x86/kvm/mmu/spte.c              |   4 +-
>  arch/x86/kvm/mmu/spte.h              |   2 +-
>  arch/x86/kvm/x86.c                   |   8 +-
>  include/kvm/mmu_types.h              |  37 +++++++
>  11 files changed, 202 insertions(+), 169 deletions(-)
>  create mode 100644 arch/x86/include/asm/kvm/mmu_types.h
>  create mode 100644 include/kvm/mmu_types.h
>
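
Before getting into the hunks, the net effect on call sites may be easier to
see with a side-by-side fragment (illustrative only, not taken verbatim from
the patch; do_something() is just a stand-in): properties that stay common
keep their current spelling, while x86-only properties pick up one extra
".arch" level.

	/* before */
	if (sp->role.direct && sp->role.level > PG_LEVEL_4K)
		do_something(sp);

	/* after: level/invalid/as_id stay common, direct moves under arch */
	if (sp->role.arch.direct && sp->role.level > PG_LEVEL_4K)
		do_something(sp);
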
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 89672a59c0c3..7e586d7ba78c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11198,7 +11198,8 @@ W:      http://www.linux-kvm.org
>  T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
>  F:     Documentation/virt/kvm/
>  F:     include/asm-generic/kvm*
> -F:     include/kvm/iodev.h
> +F:     include/kvm/
> +X:     include/kvm/arm_*
>  F:     include/linux/kvm*
>  F:     include/trace/events/kvm.h
>  F:     include/uapi/asm-generic/kvm*
> @@ -11285,6 +11286,7 @@ L:      kvm@vger.kernel.org
>  S:     Supported
>  T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
>  F:     arch/x86/include/asm/kvm*
> +F:     arch/x86/include/asm/kvm/
>  F:     arch/x86/include/asm/svm.h
>  F:     arch/x86/include/asm/vmx*.h
>  F:     arch/x86/include/uapi/asm/kvm*
> diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
> new file mode 100644
> index 000000000000..35f893ebab5a
> --- /dev/null
> +++ b/arch/x86/include/asm/kvm/mmu_types.h
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_KVM_MMU_TYPES_H
> +#define __ASM_KVM_MMU_TYPES_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * This is a subset of the overall kvm_cpu_role to minimize the size of
> + * kvm_memory_slot.arch.gfn_track, i.e. allows allocating 2 bytes per gfn
> + * instead of 4 bytes per gfn.
> + *
> + * Upper-level shadow pages having gptes are tracked for write-protection via
> + * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
> + * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
> + * gfn_track will overflow and explosions will ensure.
> + *
> + * A unique shadow page (SP) for a gfn is created if and only if an existing SP
> + * cannot be reused.  The ability to reuse a SP is tracked by its role, which
> + * incorporates various mode bits and properties of the SP.  Roughly speaking,
> + * the number of unique SPs that can theoretically be created is 2^n, where n
> + * is the number of bits that are used to compute the role.
> + *
> + * Note, not all combinations of modes and flags below are possible:
> + *
> + *   - invalid shadow pages are not accounted, so the bits are effectively 18
> + *
> + *   - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
> + *     execonly and ad_disabled are only used for nested EPT which has
> + *     has_4_byte_gpte=0.  Therefore, 2 bits are always unused.
> + *
> + *   - the 4 bits of level are effectively limited to the values 2/3/4/5,
> + *     as 4k SPs are not tracked (allowed to go unsync).  In addition non-PAE
> + *     paging has exactly one upper level, making level completely redundant
> + *     when has_4_byte_gpte=1.
> + *
> + *   - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
> + *     cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
> + *
> + * Therefore, the maximum number of possible upper-level shadow pages for a
> + * single gfn is a bit less than 2^13.
> + */
> +struct kvm_mmu_page_role_arch {
> +       u16 has_4_byte_gpte:1;
> +       u16 quadrant:2;
> +       u16 direct:1;
> +       u16 access:3;
> +       u16 efer_nx:1;
> +       u16 cr0_wp:1;
> +       u16 smep_andnot_wp:1;
> +       u16 smap_andnot_wp:1;
> +       u16 ad_disabled:1;
> +       u16 guest_mode:1;
> +       u16 passthrough:1;
> +};
> +
> +#endif /* !__ASM_KVM_MMU_TYPES_H */
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0a819d40131a..ebcd7a0dabef 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -37,6 +37,8 @@
>  #include <asm/kvm_vcpu_regs.h>
>  #include <asm/hyperv-tlfs.h>
>
> +#include <kvm/mmu_types.h>
> +
>  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>
>  #define KVM_MAX_VCPUS 1024
> @@ -286,72 +288,6 @@ enum x86_intercept_stage;
>
>  struct kvm_kernel_irq_routing_entry;
>
> -/*
> - * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
> - * also includes TDP pages) to determine whether or not a page can be used in
> - * the given MMU context.  This is a subset of the overall kvm_cpu_role to
> - * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
> - * 2 bytes per gfn instead of 4 bytes per gfn.
> - *
> - * Upper-level shadow pages having gptes are tracked for write-protection via
> - * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
> - * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
> - * gfn_track will overflow and explosions will ensure.
> - *
> - * A unique shadow page (SP) for a gfn is created if and only if an existing SP
> - * cannot be reused.  The ability to reuse a SP is tracked by its role, which
> - * incorporates various mode bits and properties of the SP.  Roughly speaking,
> - * the number of unique SPs that can theoretically be created is 2^n, where n
> - * is the number of bits that are used to compute the role.
> - *
> - * But, even though there are 19 bits in the mask below, not all combinations
> - * of modes and flags are possible:
> - *
> - *   - invalid shadow pages are not accounted, so the bits are effectively 18
> - *
> - *   - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
> - *     execonly and ad_disabled are only used for nested EPT which has
> - *     has_4_byte_gpte=0.  Therefore, 2 bits are always unused.
> - *
> - *   - the 4 bits of level are effectively limited to the values 2/3/4/5,
> - *     as 4k SPs are not tracked (allowed to go unsync).  In addition non-PAE
> - *     paging has exactly one upper level, making level completely redundant
> - *     when has_4_byte_gpte=1.
> - *
> - *   - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
> - *     cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
> - *
> - * Therefore, the maximum number of possible upper-level shadow pages for a
> - * single gfn is a bit less than 2^13.
> - */
> -union kvm_mmu_page_role {
> -       u32 word;
> -       struct {
> -               unsigned level:4;
> -               unsigned has_4_byte_gpte:1;
> -               unsigned quadrant:2;
> -               unsigned direct:1;
> -               unsigned access:3;
> -               unsigned invalid:1;
> -               unsigned efer_nx:1;
> -               unsigned cr0_wp:1;
> -               unsigned smep_andnot_wp:1;
> -               unsigned smap_andnot_wp:1;
> -               unsigned ad_disabled:1;
> -               unsigned guest_mode:1;
> -               unsigned passthrough:1;
> -               unsigned :5;
> -
> -               /*
> -                * This is left at the top of the word so that
> -                * kvm_memslots_for_spte_role can extract it with a
> -                * simple shift.  While there is room, give it a whole
> -                * byte so it is also faster to load it from memory.
> -                */
> -               unsigned as_id:8;
> -       };
> -};
> -
>  /*
>   * kvm_mmu_extended_role complements kvm_mmu_page_role, tracking properties
>   * relevant to the current MMU configuration.   When loading CR0, CR4, or EFER,
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index f375b719f565..355548603960 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -210,13 +210,13 @@ static inline bool __maybe_unused is_##reg##_##name(struct kvm_mmu *mmu)  \
>  {                                                              \
>         return !!(mmu->cpu_role. base_or_ext . reg##_##name);   \
>  }
> -BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
> +BUILD_MMU_ROLE_ACCESSOR(base.arch, cr0, wp);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pse);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smep);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smap);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pke);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, la57);
> -BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
> +BUILD_MMU_ROLE_ACCESSOR(base.arch, efer, nx);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  efer, lma);
>
>  static inline bool is_cr0_pg(struct kvm_mmu *mmu)
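
For reference, passing "base.arch" as the first macro argument works because
BUILD_MMU_ROLE_ACCESSOR only splices that argument between "cpu_role." and
the pasted field name, so the first accessor above expands to roughly:

	static inline bool __maybe_unused is_cr0_wp(struct kvm_mmu *mmu)
	{
		return !!(mmu->cpu_role.base.arch.cr0_wp);
	}
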
> @@ -226,7 +226,7 @@ static inline bool is_cr0_pg(struct kvm_mmu *mmu)
>
>  static inline bool is_cr4_pae(struct kvm_mmu *mmu)
>  {
> -        return !mmu->cpu_role.base.has_4_byte_gpte;
> +       return !mmu->cpu_role.base.arch.has_4_byte_gpte;
>  }
>
>  static struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
> @@ -618,7 +618,7 @@ static bool mmu_spte_age(u64 *sptep)
>
>  static inline bool is_tdp_mmu_active(struct kvm_vcpu *vcpu)
>  {
> -       return tdp_mmu_enabled && vcpu->arch.mmu->root_role.direct;
> +       return tdp_mmu_enabled && vcpu->arch.mmu->root_role.arch.direct;
>  }
>
>  static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
> @@ -695,10 +695,10 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp);
>
>  static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
>  {
> -       if (sp->role.passthrough)
> +       if (sp->role.arch.passthrough)
>                 return sp->gfn;
>
> -       if (!sp->role.direct)
> +       if (!sp->role.arch.direct)
>                 return sp->shadowed_translation[index] >> PAGE_SHIFT;
>
>         return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS));
> @@ -727,7 +727,7 @@ static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
>          *
>          * In both cases, sp->role.access contains the correct access bits.
>          */
> -       return sp->role.access;
> +       return sp->role.arch.access;
>  }
>
>  static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
> @@ -739,14 +739,14 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
>         }
>
>         WARN_ONCE(access != kvm_mmu_page_get_access(sp, index),
> -                 "access mismatch under %s page %llx (expected %u, got %u)\n",
> -                 sp->role.passthrough ? "passthrough" : "direct",
> -                 sp->gfn, kvm_mmu_page_get_access(sp, index), access);
> +                 "access mismatch under %s page %llx (expected %u, got %u)\n",
> +                 sp->role.arch.passthrough ? "passthrough" : "direct",
> +                 sp->gfn, kvm_mmu_page_get_access(sp, index), access);
>
>         WARN_ONCE(gfn != kvm_mmu_page_get_gfn(sp, index),
> -                 "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
> -                 sp->role.passthrough ? "passthrough" : "direct",
> -                 sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
> +                 "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
> +                 sp->role.arch.passthrough ? "passthrough" : "direct",
> +                 sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
>  }
>
>  static void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
> @@ -1723,7 +1723,7 @@ static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
>         hlist_del(&sp->hash_link);
>         list_del(&sp->link);
>         free_page((unsigned long)sp->spt);
> -       if (!sp->role.direct)
> +       if (!sp->role.arch.direct)
>                 free_page((unsigned long)sp->shadowed_translation);
>         kmem_cache_free(mmu_page_header_cache, sp);
>  }
> @@ -1884,10 +1884,10 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>
>  static bool sp_has_gptes(struct kvm_mmu_page *sp)
>  {
> -       if (sp->role.direct)
> +       if (sp->role.arch.direct)
>                 return false;
>
> -       if (sp->role.passthrough)
> +       if (sp->role.arch.passthrough)
>                 return false;
>
>         return true;
> @@ -2065,7 +2065,7 @@ static void clear_sp_write_flooding_count(u64 *spte)
>   * The vCPU is required when finding indirect shadow pages; the shadow
>   * page may already exist and syncing it needs the vCPU pointer in
>   * order to read guest page tables.  Direct shadow pages are never
> - * unsync, thus @vcpu can be NULL if @role.direct is true.
> + * unsync, thus @vcpu can be NULL if @role.arch.direct is true.
>   */
>  static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                                                      struct kvm_vcpu *vcpu,
> @@ -2101,7 +2101,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                 }
>
>                 /* unsync and write-flooding only apply to indirect SPs. */
> -               if (sp->role.direct)
> +               if (sp->role.arch.direct)
>                         goto out;
>
>                 if (sp->unsync) {
> @@ -2162,7 +2162,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
>
>         sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
>         sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
> -       if (!role.direct)
> +       if (!role.arch.direct)
>                 sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
>
>         set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
> @@ -2187,7 +2187,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
>         return sp;
>  }
>
> -/* Note, @vcpu may be NULL if @role.direct is true; see kvm_mmu_find_shadow_page. */
> +/* Note, @vcpu may be NULL if @role.arch.direct is true; see kvm_mmu_find_shadow_page. */
>  static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
>                                                       struct kvm_vcpu *vcpu,
>                                                       struct shadow_page_caches *caches,
> @@ -2231,9 +2231,9 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
>
>         role = parent_sp->role;
>         role.level--;
> -       role.access = access;
> -       role.direct = direct;
> -       role.passthrough = 0;
> +       role.arch.access = access;
> +       role.arch.direct = direct;
> +       role.arch.passthrough = 0;
>
>         /*
>          * If the guest has 4-byte PTEs then that means it's using 32-bit,
> @@ -2261,9 +2261,9 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
>          * covers bit 21 (see above), thus the quadrant is calculated from the
>          * _least_ significant bit of the PDE index.
>          */
> -       if (role.has_4_byte_gpte) {
> +       if (role.arch.has_4_byte_gpte) {
>                 WARN_ON_ONCE(role.level != PG_LEVEL_4K);
> -               role.quadrant = spte_index(sptep) & 1;
> +               role.arch.quadrant = spte_index(sptep) & 1;
>         }
>
>         return role;
> @@ -2292,7 +2292,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
>
>         if (iterator->level >= PT64_ROOT_4LEVEL &&
>             vcpu->arch.mmu->cpu_role.base.level < PT64_ROOT_4LEVEL &&
> -           !vcpu->arch.mmu->root_role.direct)
> +           !vcpu->arch.mmu->root_role.arch.direct)
>                 iterator->level = PT32E_ROOT_LEVEL;
>
>         if (iterator->level == PT32E_ROOT_LEVEL) {
> @@ -2391,7 +2391,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>                  * a new sp with the correct access.
>                  */
>                 child = spte_to_child_sp(*sptep);
> -               if (child->role.access == direct_access)
> +               if (child->role.arch.access == direct_access)
>                         return;
>
>                 drop_parent_pte(child, sptep);
> @@ -2420,7 +2420,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
>                          * avoids retaining a large number of stale nested SPs.
>                          */
>                         if (tdp_enabled && invalid_list &&
> -                           child->role.guest_mode && !child->parent_ptes.val)
> +                           child->role.arch.guest_mode && !child->parent_ptes.val)
>                                 return kvm_mmu_prepare_zap_page(kvm, child,
>                                                                 invalid_list);
>                 }
> @@ -2689,7 +2689,7 @@ static int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
>         gpa_t gpa;
>         int r;
>
> -       if (vcpu->arch.mmu->root_role.direct)
> +       if (vcpu->arch.mmu->root_role.arch.direct)
>                 return 0;
>
>         gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
> @@ -2900,7 +2900,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
>  {
>         struct page *pages[PTE_PREFETCH_NUM];
>         struct kvm_memory_slot *slot;
> -       unsigned int access = sp->role.access;
> +       unsigned int access = sp->role.arch.access;
>         int i, ret;
>         gfn_t gfn;
>
> @@ -2928,7 +2928,7 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
>         u64 *spte, *start = NULL;
>         int i;
>
> -       WARN_ON(!sp->role.direct);
> +       WARN_ON(!sp->role.arch.direct);
>
>         i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
>         spte = sp->spt + i;
> @@ -3549,7 +3549,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
>          * This should not be called while L2 is active, L2 can't invalidate
>          * _only_ its own roots, e.g. INVVPID unconditionally exits.
>          */
> -       WARN_ON_ONCE(mmu->root_role.guest_mode);
> +       WARN_ON_ONCE(mmu->root_role.arch.guest_mode);
>
>         for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
>                 root_hpa = mmu->prev_roots[i].hpa;
> @@ -3557,7 +3557,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
>                         continue;
>
>                 if (!to_shadow_page(root_hpa) ||
> -                       to_shadow_page(root_hpa)->role.guest_mode)
> +                       to_shadow_page(root_hpa)->role.arch.guest_mode)
>                         roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
>         }
>
> @@ -3585,10 +3585,10 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
>         struct kvm_mmu_page *sp;
>
>         role.level = level;
> -       role.quadrant = quadrant;
> +       role.arch.quadrant = quadrant;
>
> -       WARN_ON_ONCE(quadrant && !role.has_4_byte_gpte);
> -       WARN_ON_ONCE(role.direct && role.has_4_byte_gpte);
> +       WARN_ON_ONCE(quadrant && !role.arch.has_4_byte_gpte);
> +       WARN_ON_ONCE(role.arch.direct && role.arch.has_4_byte_gpte);
>
>         sp = kvm_mmu_get_shadow_page(vcpu, gfn, role);
>         ++sp->root_count;
> @@ -3834,7 +3834,7 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
>          * equivalent level in the guest's NPT to shadow.  Allocate the tables
>          * on demand, as running a 32-bit L1 VMM on 64-bit KVM is very rare.
>          */
> -       if (mmu->root_role.direct ||
> +       if (mmu->root_role.arch.direct ||
>             mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL ||
>             mmu->root_role.level < PT64_ROOT_4LEVEL)
>                 return 0;
> @@ -3932,7 +3932,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
>         int i;
>         struct kvm_mmu_page *sp;
>
> -       if (vcpu->arch.mmu->root_role.direct)
> +       if (vcpu->arch.mmu->root_role.arch.direct)
>                 return;
>
>         if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
> @@ -4161,7 +4161,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>
>         arch.token = alloc_apf_token(vcpu);
>         arch.gfn = gfn;
> -       arch.direct_map = vcpu->arch.mmu->root_role.direct;
> +       arch.direct_map = vcpu->arch.mmu->root_role.arch.direct;
>         arch.cr3 = vcpu->arch.mmu->get_guest_pgd(vcpu);
>
>         return kvm_setup_async_pf(vcpu, cr2_or_gpa,
> @@ -4172,7 +4172,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>  {
>         int r;
>
> -       if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) ||
> +       if ((vcpu->arch.mmu->root_role.arch.direct != work->arch.direct_map) ||
>               work->wakeup_all)
>                 return;
>
> @@ -4180,7 +4180,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>         if (unlikely(r))
>                 return;
>
> -       if (!vcpu->arch.mmu->root_role.direct &&
> +       if (!vcpu->arch.mmu->root_role.arch.direct &&
>               work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu))
>                 return;
>
> @@ -4456,7 +4456,7 @@ static void nonpaging_init_context(struct kvm_mmu *context)
>  static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
>                                   union kvm_mmu_page_role role)
>  {
> -       return (role.direct || pgd == root->pgd) &&
> +       return (role.arch.direct || pgd == root->pgd) &&
>                VALID_PAGE(root->hpa) &&
>                role.word == to_shadow_page(root->hpa)->role.word;
>  }
> @@ -4576,7 +4576,7 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
>          * If this is a direct root page, it doesn't have a write flooding
>          * count. Otherwise, clear the write flooding count.
>          */
> -       if (!new_role.direct)
> +       if (!new_role.arch.direct)
>                 __clear_sp_write_flooding_count(
>                                 to_shadow_page(vcpu->arch.mmu->root.hpa));
>  }
> @@ -4803,7 +4803,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
>         shadow_zero_check = &context->shadow_zero_check;
>         __reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
>                                 context->root_role.level,
> -                               context->root_role.efer_nx,
> +                               context->root_role.arch.efer_nx,
>                                 guest_can_use_gbpages(vcpu), is_pse, is_amd);
>
>         if (!shadow_me_mask)
> @@ -5055,21 +5055,21 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
>  {
>         union kvm_cpu_role role = {0};
>
> -       role.base.access = ACC_ALL;
>         role.base.as_id = is_smm(vcpu);
> -       role.base.guest_mode = is_guest_mode(vcpu);
> +       role.base.arch.access = ACC_ALL;
> +       role.base.arch.guest_mode = is_guest_mode(vcpu);
>         role.ext.valid = 1;
>
>         if (!____is_cr0_pg(regs)) {
> -               role.base.direct = 1;
> +               role.base.arch.direct = 1;
>                 return role;
>         }
>
> -       role.base.efer_nx = ____is_efer_nx(regs);
> -       role.base.cr0_wp = ____is_cr0_wp(regs);
> -       role.base.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
> -       role.base.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
> -       role.base.has_4_byte_gpte = !____is_cr4_pae(regs);
> +       role.base.arch.efer_nx = ____is_efer_nx(regs);
> +       role.base.arch.cr0_wp = ____is_cr0_wp(regs);
> +       role.base.arch.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
> +       role.base.arch.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
> +       role.base.arch.has_4_byte_gpte = !____is_cr4_pae(regs);
>
>         if (____is_efer_lma(regs))
>                 role.base.level = ____is_cr4_la57(regs) ? PT64_ROOT_5LEVEL
> @@ -5109,15 +5109,15 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
>  {
>         union kvm_mmu_page_role role = {0};
>
> -       role.access = ACC_ALL;
> -       role.cr0_wp = true;
> -       role.efer_nx = true;
>         role.as_id = cpu_role.base.as_id;
> -       role.guest_mode = cpu_role.base.guest_mode;
> -       role.ad_disabled = !kvm_ad_enabled();
>         role.level = kvm_mmu_get_tdp_level(vcpu);
> -       role.direct = true;
> -       role.has_4_byte_gpte = false;
> +       role.arch.access = ACC_ALL;
> +       role.arch.cr0_wp = true;
> +       role.arch.efer_nx = true;
> +       role.arch.guest_mode = cpu_role.base.arch.guest_mode;
> +       role.arch.ad_disabled = !kvm_ad_enabled();
> +       role.arch.direct = true;
> +       role.arch.has_4_byte_gpte = false;
>
>         return role;
>  }
> @@ -5194,7 +5194,7 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu,
>          * NX can be used by any non-nested shadow MMU to avoid having to reset
>          * MMU contexts.
>          */
> -       root_role.efer_nx = true;
> +       root_role.arch.efer_nx = true;
>
>         shadow_mmu_init_context(vcpu, context, cpu_role, root_role);
>  }
> @@ -5212,13 +5212,13 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
>         union kvm_mmu_page_role root_role;
>
>         /* NPT requires CR0.PG=1. */
> -       WARN_ON_ONCE(cpu_role.base.direct);
> +       WARN_ON_ONCE(cpu_role.base.arch.direct);
>
>         root_role = cpu_role.base;
>         root_role.level = kvm_mmu_get_tdp_level(vcpu);
>         if (root_role.level == PT64_ROOT_5LEVEL &&
>             cpu_role.base.level == PT64_ROOT_4LEVEL)
> -               root_role.passthrough = 1;
> +               root_role.arch.passthrough = 1;
>
>         shadow_mmu_init_context(vcpu, context, cpu_role, root_role);
>         kvm_mmu_new_pgd(vcpu, nested_cr3);
> @@ -5237,11 +5237,11 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
>          */
>         WARN_ON_ONCE(is_smm(vcpu));
>         role.base.level = level;
> -       role.base.has_4_byte_gpte = false;
> -       role.base.direct = false;
> -       role.base.ad_disabled = !accessed_dirty;
> -       role.base.guest_mode = true;
> -       role.base.access = ACC_ALL;
> +       role.base.arch.has_4_byte_gpte = false;
> +       role.base.arch.direct = false;
> +       role.base.arch.ad_disabled = !accessed_dirty;
> +       role.base.arch.guest_mode = true;
> +       role.base.arch.access = ACC_ALL;
>
>         role.ext.word = 0;
>         role.ext.execonly = execonly;
> @@ -5385,13 +5385,13 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
>  {
>         int r;
>
> -       r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.direct);
> +       r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.arch.direct);
>         if (r)
>                 goto out;
>         r = mmu_alloc_special_roots(vcpu);
>         if (r)
>                 goto out;
> -       if (vcpu->arch.mmu->root_role.direct)
> +       if (vcpu->arch.mmu->root_role.arch.direct)
>                 r = mmu_alloc_direct_roots(vcpu);
>         else
>                 r = mmu_alloc_shadow_roots(vcpu);
> @@ -5526,7 +5526,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
>                  gpa, bytes, sp->role.word);
>
>         offset = offset_in_page(gpa);
> -       pte_size = sp->role.has_4_byte_gpte ? 4 : 8;
> +       pte_size = sp->role.arch.has_4_byte_gpte ? 4 : 8;
>
>         /*
>          * Sometimes, the OS only writes the last one bytes to update status
> @@ -5550,7 +5550,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
>         page_offset = offset_in_page(gpa);
>         level = sp->role.level;
>         *nspte = 1;
> -       if (sp->role.has_4_byte_gpte) {
> +       if (sp->role.arch.has_4_byte_gpte) {
>                 page_offset <<= 1;      /* 32->64 */
>                 /*
>                  * A 32-bit pde maps 4MB while the shadow pdes map
> @@ -5564,7 +5564,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
>                 }
>                 quadrant = page_offset >> PAGE_SHIFT;
>                 page_offset &= ~PAGE_MASK;
> -               if (quadrant != sp->role.quadrant)
> +               if (quadrant != sp->role.arch.quadrant)
>                         return NULL;
>         }
>
> @@ -5628,7 +5628,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>                        void *insn, int insn_len)
>  {
>         int r, emulation_type = EMULTYPE_PF;
> -       bool direct = vcpu->arch.mmu->root_role.direct;
> +       bool direct = vcpu->arch.mmu->root_role.arch.direct;
>
>         if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
>                 return RET_PF_RETRY;
> @@ -5659,7 +5659,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>          * paging in both guests. If true, we simply unprotect the page
>          * and resume the guest.
>          */
> -       if (vcpu->arch.mmu->root_role.direct &&
> +       if (vcpu->arch.mmu->root_role.arch.direct &&
>             (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
>                 kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa));
>                 return 1;
> @@ -6321,7 +6321,7 @@ static void shadow_mmu_split_huge_page(struct kvm *kvm,
>
>                 spte = make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
>                 mmu_spte_set(sptep, spte);
> -               __rmap_add(kvm, cache, slot, sptep, gfn, sp->role.access);
> +               __rmap_add(kvm, cache, slot, sptep, gfn, sp->role.arch.access);
>         }
>
>         __link_shadow_page(kvm, cache, huge_sptep, sp, flush);
> @@ -6380,7 +6380,7 @@ static bool shadow_mmu_try_split_huge_pages(struct kvm *kvm,
>                 sp = sptep_to_sp(huge_sptep);
>
>                 /* TDP MMU is enabled, so rmap only contains nested MMU SPs. */
> -               if (WARN_ON_ONCE(!sp->role.guest_mode))
> +               if (WARN_ON_ONCE(!sp->role.arch.guest_mode))
>                         continue;
>
>                 /* The rmaps should never contain non-leaf SPTEs. */
> @@ -6502,7 +6502,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
>                  * the guest, and the guest page table is using 4K page size
>                  * mapping if the indirect sp has level = 1.
>                  */
> -               if (sp->role.direct &&
> +               if (sp->role.arch.direct &&
>                     sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
>                                                                PG_LEVEL_NUM)) {
>                         kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
> @@ -6942,7 +6942,7 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
>                                       struct kvm_mmu_page,
>                                       possible_nx_huge_page_link);
>                 WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
> -               WARN_ON_ONCE(!sp->role.direct);
> +               WARN_ON_ONCE(!sp->role.arch.direct);
>
>                 /*
>                  * Unaccount and do not attempt to recover any NX Huge Pages
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 5427f65117b4..c19a80fdeb8d 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -143,7 +143,7 @@ static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
>          * being enabled is mandatory as the bits used to denote WP-only SPTEs
>          * are reserved for PAE paging (32-bit KVM).
>          */
> -       return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
> +       return kvm_x86_ops.cpu_dirty_log_size && sp->role.arch.guest_mode;
>  }
>
>  int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
> @@ -270,7 +270,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>         };
>         int r;
>
> -       if (vcpu->arch.mmu->root_role.direct) {
> +       if (vcpu->arch.mmu->root_role.arch.direct) {
>                 fault.gfn = fault.addr >> PAGE_SHIFT;
>                 fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
>         }
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index ae86820cef69..6a4a43b90780 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -35,13 +35,13 @@
>                          " %snxe %sad root %u %s%c",                    \
>                          __entry->mmu_valid_gen,                        \
>                          __entry->gfn, role.level,                      \
> -                        role.has_4_byte_gpte ? 4 : 8,                  \
> -                        role.quadrant,                                 \
> -                        role.direct ? " direct" : "",                  \
> -                        access_str[role.access],                       \
> +                        role.arch.has_4_byte_gpte ? 4 : 8,                     \
> +                        role.arch.quadrant,                                    \
> +                        role.arch.direct ? " direct" : "",                     \
> +                        access_str[role.arch.access],                  \
>                          role.invalid ? " invalid" : "",                \
> -                        role.efer_nx ? "" : "!",                       \
> -                        role.ad_disabled ? "!" : "",                   \
> +                        role.arch.efer_nx ? "" : "!",                  \
> +                        role.arch.ad_disabled ? "!" : "",                      \
>                          __entry->root_count,                           \
>                          __entry->unsync ? "unsync" : "sync", 0);       \
>         saved_ptr;                                                      \
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index e5662dbd519c..e15ec1c473da 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -55,7 +55,7 @@
>         #define PT_LEVEL_BITS 9
>         #define PT_GUEST_DIRTY_SHIFT 9
>         #define PT_GUEST_ACCESSED_SHIFT 8
> -       #define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.ad_disabled)
> +       #define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.arch.ad_disabled)
>         #define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL
>  #else
>         #error Invalid PTTYPE value
> @@ -532,7 +532,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>         pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
>
>         gfn = gpte_to_gfn(gpte);
> -       pte_access = sp->role.access & FNAME(gpte_access)(gpte);
> +       pte_access = sp->role.arch.access & FNAME(gpte_access)(gpte);
>         FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
>
>         slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn,
> @@ -592,7 +592,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
>         if (unlikely(vcpu->kvm->mmu_invalidate_in_progress))
>                 return;
>
> -       if (sp->role.direct)
> +       if (sp->role.arch.direct)
>                 return __direct_pte_prefetch(vcpu, sp, sptep);
>
>         i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
> @@ -884,7 +884,7 @@ static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
>         WARN_ON(sp->role.level != PG_LEVEL_4K);
>
>         if (PTTYPE == 32)
> -               offset = sp->role.quadrant << SPTE_LEVEL_BITS;
> +               offset = sp->role.arch.quadrant << SPTE_LEVEL_BITS;
>
>         return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
>  }
> @@ -1003,9 +1003,11 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>          */
>         const union kvm_mmu_page_role sync_role_ign = {
>                 .level = 0xf,
> -               .access = 0x7,
> -               .quadrant = 0x3,
> -               .passthrough = 0x1,
> +               .arch = {
> +                       .access = 0x7,
> +                       .quadrant = 0x3,
> +                       .passthrough = 0x1,
> +               },
>         };
>
>         /*
> @@ -1014,7 +1016,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>          * differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
>          * reserved bits checks will be wrong, etc...
>          */
> -       if (WARN_ON_ONCE(sp->role.direct ||
> +       if (WARN_ON_ONCE(sp->role.arch.direct ||
>                          (sp->role.word ^ root_role.word) & ~sync_role_ign.word))
>                 return -1;
>
> @@ -1043,7 +1045,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>                 }
>
>                 gfn = gpte_to_gfn(gpte);
> -               pte_access = sp->role.access;
> +               pte_access = sp->role.arch.access;
>                 pte_access &= FNAME(gpte_access)(gpte);
>                 FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
>
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index c0fd7e049b4e..fe4b626cb431 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -146,7 +146,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>
>         WARN_ON_ONCE(!pte_access && !shadow_present_mask);
>
> -       if (sp->role.ad_disabled)
> +       if (sp->role.arch.ad_disabled)
>                 spte |= SPTE_TDP_AD_DISABLED_MASK;
>         else if (kvm_mmu_page_ad_need_write_protect(sp))
>                 spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
> @@ -301,7 +301,7 @@ u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte, union kvm_mmu_page
>                  * the page executable as the NX hugepage mitigation no longer
>                  * applies.
>                  */
> -               if ((role.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
> +               if ((role.arch.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
>                         child_spte = make_spte_executable(child_spte);
>         }
>
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index 1f03701b943a..ad84c549fe96 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -260,7 +260,7 @@ static inline bool kvm_ad_enabled(void)
>
>  static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
>  {
> -       return sp->role.ad_disabled;
> +       return sp->role.arch.ad_disabled;
>  }
>
>  static inline bool spte_ad_enabled(u64 spte)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b2da8c8f30a..2bfe060768fc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8442,7 +8442,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>             WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF)))
>                 return false;
>
> -       if (!vcpu->arch.mmu->root_role.direct) {
> +       if (!vcpu->arch.mmu->root_role.arch.direct) {
>                 /*
>                  * Write permission should be allowed since only
>                  * write access need to be emulated.
> @@ -8475,7 +8475,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>         kvm_release_pfn_clean(pfn);
>
>         /* The instructions are well-emulated on direct mmu. */
> -       if (vcpu->arch.mmu->root_role.direct) {
> +       if (vcpu->arch.mmu->root_role.arch.direct) {
>                 unsigned int indirect_shadow_pages;
>
>                 write_lock(&vcpu->kvm->mmu_lock);
> @@ -8543,7 +8543,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
>         vcpu->arch.last_retry_eip = ctxt->eip;
>         vcpu->arch.last_retry_addr = cr2_or_gpa;
>
> -       if (!vcpu->arch.mmu->root_role.direct)
> +       if (!vcpu->arch.mmu->root_role.arch.direct)
>                 gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);
>
>         kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> @@ -8846,7 +8846,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>                 ctxt->exception.address = cr2_or_gpa;
>
>                 /* With shadow page tables, cr2 contains a GVA or nGPA. */
> -               if (vcpu->arch.mmu->root_role.direct) {
> +               if (vcpu->arch.mmu->root_role.arch.direct) {
>                         ctxt->gpa_available = true;
>                         ctxt->gpa_val = cr2_or_gpa;
>                 }
> diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
> new file mode 100644
> index 000000000000..3f35a924e031
> --- /dev/null
> +++ b/include/kvm/mmu_types.h
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __KVM_MMU_TYPES_H
> +#define __KVM_MMU_TYPES_H
> +
> +#include <linux/bug.h>
> +#include <linux/types.h>
> +#include <linux/stddef.h>
> +
> +#include <asm/kvm/mmu_types.h>
> +
> +/*
> + * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
> + * also includes TDP pages) to determine whether or not a page can be used in
> + * the given MMU context.
> + */
> +union kvm_mmu_page_role {
> +       u32 word;
> +       struct {
> +               struct {
> +                       /* The address space ID mapped by the page. */
> +                       u16 as_id:8;

We should either make this just 1 bit, or preserve the old comment
explaining why it's given a whole byte (i.e. faster to load from
memory). Otherwise folks might think that as_id can actually use all
8 bits.
kvm_memory_slot already stores the address space ID as a full u16, so
an 8-bit field here can't express the full range anyway.
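
E.g. a minimal sketch of the second option, keeping the rationale from
the old x86-only comment next to the field (wording is just a
suggestion, not necessarily what the final comment should say):

        /*
         * The address space ID mapped by the page.  Only 1 bit is
         * needed today, but give it a whole byte so it is faster to
         * load from memory, as in the old x86-only definition.
         */
        u16 as_id:8;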

> +
> +                       /* The level of the page in the page table hierarchy. */
> +                       u16 level:4;
> +
> +                       /* Whether the page is invalid, i.e. pending destruction. */
> +                       u16 invalid:1;
> +               };
> +
> +               /* Architecture-specific properties. */
> +               struct kvm_mmu_page_role_arch arch;
> +       };
> +};
> +
> +static_assert(sizeof(union kvm_mmu_page_role) == sizeof_field(union kvm_mmu_page_role, word));
> +
> +#endif /* !__KVM_MMU_TYPES_H */
> --
> 2.39.0.rc1.256.g54fd8350bd-goog
>

* Re: [RFC PATCH 02/37] KVM: MMU: Move struct kvm_mmu_page_role into common code
@ 2022-12-12 17:48     ` Ben Gardon
  0 siblings, 0 replies; 317+ messages in thread
From: Ben Gardon @ 2022-12-12 17:48 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
>
> Move struct kvm_mmu_page_role into common code, and move all
> x86-specific fields into a separate sub-struct within the role,
> kvm_mmu_page_role_arch.
>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  MAINTAINERS                          |   4 +-
>  arch/x86/include/asm/kvm/mmu_types.h |  56 ++++++++++
>  arch/x86/include/asm/kvm_host.h      |  68 +-----------
>  arch/x86/kvm/mmu/mmu.c               | 156 +++++++++++++--------------
>  arch/x86/kvm/mmu/mmu_internal.h      |   4 +-
>  arch/x86/kvm/mmu/mmutrace.h          |  12 +--
>  arch/x86/kvm/mmu/paging_tmpl.h       |  20 ++--
>  arch/x86/kvm/mmu/spte.c              |   4 +-
>  arch/x86/kvm/mmu/spte.h              |   2 +-
>  arch/x86/kvm/x86.c                   |   8 +-
>  include/kvm/mmu_types.h              |  37 +++++++
>  11 files changed, 202 insertions(+), 169 deletions(-)
>  create mode 100644 arch/x86/include/asm/kvm/mmu_types.h
>  create mode 100644 include/kvm/mmu_types.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 89672a59c0c3..7e586d7ba78c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11198,7 +11198,8 @@ W:      http://www.linux-kvm.org
>  T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
>  F:     Documentation/virt/kvm/
>  F:     include/asm-generic/kvm*
> -F:     include/kvm/iodev.h
> +F:     include/kvm/
> +X:     include/kvm/arm_*
>  F:     include/linux/kvm*
>  F:     include/trace/events/kvm.h
>  F:     include/uapi/asm-generic/kvm*
> @@ -11285,6 +11286,7 @@ L:      kvm@vger.kernel.org
>  S:     Supported
>  T:     git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
>  F:     arch/x86/include/asm/kvm*
> +F:     arch/x86/include/asm/kvm/
>  F:     arch/x86/include/asm/svm.h
>  F:     arch/x86/include/asm/vmx*.h
>  F:     arch/x86/include/uapi/asm/kvm*
> diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
> new file mode 100644
> index 000000000000..35f893ebab5a
> --- /dev/null
> +++ b/arch/x86/include/asm/kvm/mmu_types.h
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_KVM_MMU_TYPES_H
> +#define __ASM_KVM_MMU_TYPES_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * This is a subset of the overall kvm_cpu_role to minimize the size of
> + * kvm_memory_slot.arch.gfn_track, i.e. allows allocating 2 bytes per gfn
> + * instead of 4 bytes per gfn.
> + *
> + * Upper-level shadow pages having gptes are tracked for write-protection via
> + * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
> + * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
> + * gfn_track will overflow and explosions will ensue.
> + *
> + * A unique shadow page (SP) for a gfn is created if and only if an existing SP
> + * cannot be reused.  The ability to reuse a SP is tracked by its role, which
> + * incorporates various mode bits and properties of the SP.  Roughly speaking,
> + * the number of unique SPs that can theoretically be created is 2^n, where n
> + * is the number of bits that are used to compute the role.
> + *
> + * Note, not all combinations of modes and flags below are possible:
> + *
> + *   - invalid shadow pages are not accounted, so the bits are effectively 18
> + *
> + *   - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
> + *     execonly and ad_disabled are only used for nested EPT which has
> + *     has_4_byte_gpte=0.  Therefore, 2 bits are always unused.
> + *
> + *   - the 4 bits of level are effectively limited to the values 2/3/4/5,
> + *     as 4k SPs are not tracked (allowed to go unsync).  In addition non-PAE
> + *     paging has exactly one upper level, making level completely redundant
> + *     when has_4_byte_gpte=1.
> + *
> + *   - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
> + *     cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
> + *
> + * Therefore, the maximum number of possible upper-level shadow pages for a
> + * single gfn is a bit less than 2^13.
> + */
> +struct kvm_mmu_page_role_arch {
> +       u16 has_4_byte_gpte:1;
> +       u16 quadrant:2;
> +       u16 direct:1;
> +       u16 access:3;
> +       u16 efer_nx:1;
> +       u16 cr0_wp:1;
> +       u16 smep_andnot_wp:1;
> +       u16 smap_andnot_wp:1;
> +       u16 ad_disabled:1;
> +       u16 guest_mode:1;
> +       u16 passthrough:1;
> +};
> +
> +#endif /* !__ASM_KVM_MMU_TYPES_H */
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0a819d40131a..ebcd7a0dabef 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -37,6 +37,8 @@
>  #include <asm/kvm_vcpu_regs.h>
>  #include <asm/hyperv-tlfs.h>
>
> +#include <kvm/mmu_types.h>
> +
>  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>
>  #define KVM_MAX_VCPUS 1024
> @@ -286,72 +288,6 @@ enum x86_intercept_stage;
>
>  struct kvm_kernel_irq_routing_entry;
>
> -/*
> - * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
> - * also includes TDP pages) to determine whether or not a page can be used in
> - * the given MMU context.  This is a subset of the overall kvm_cpu_role to
> - * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
> - * 2 bytes per gfn instead of 4 bytes per gfn.
> - *
> - * Upper-level shadow pages having gptes are tracked for write-protection via
> - * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
> - * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
> - * gfn_track will overflow and explosions will ensue.
> - *
> - * A unique shadow page (SP) for a gfn is created if and only if an existing SP
> - * cannot be reused.  The ability to reuse a SP is tracked by its role, which
> - * incorporates various mode bits and properties of the SP.  Roughly speaking,
> - * the number of unique SPs that can theoretically be created is 2^n, where n
> - * is the number of bits that are used to compute the role.
> - *
> - * But, even though there are 19 bits in the mask below, not all combinations
> - * of modes and flags are possible:
> - *
> - *   - invalid shadow pages are not accounted, so the bits are effectively 18
> - *
> - *   - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
> - *     execonly and ad_disabled are only used for nested EPT which has
> - *     has_4_byte_gpte=0.  Therefore, 2 bits are always unused.
> - *
> - *   - the 4 bits of level are effectively limited to the values 2/3/4/5,
> - *     as 4k SPs are not tracked (allowed to go unsync).  In addition non-PAE
> - *     paging has exactly one upper level, making level completely redundant
> - *     when has_4_byte_gpte=1.
> - *
> - *   - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
> - *     cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
> - *
> - * Therefore, the maximum number of possible upper-level shadow pages for a
> - * single gfn is a bit less than 2^13.
> - */
> -union kvm_mmu_page_role {
> -       u32 word;
> -       struct {
> -               unsigned level:4;
> -               unsigned has_4_byte_gpte:1;
> -               unsigned quadrant:2;
> -               unsigned direct:1;
> -               unsigned access:3;
> -               unsigned invalid:1;
> -               unsigned efer_nx:1;
> -               unsigned cr0_wp:1;
> -               unsigned smep_andnot_wp:1;
> -               unsigned smap_andnot_wp:1;
> -               unsigned ad_disabled:1;
> -               unsigned guest_mode:1;
> -               unsigned passthrough:1;
> -               unsigned :5;
> -
> -               /*
> -                * This is left at the top of the word so that
> -                * kvm_memslots_for_spte_role can extract it with a
> -                * simple shift.  While there is room, give it a whole
> -                * byte so it is also faster to load it from memory.
> -                */
> -               unsigned as_id:8;
> -       };
> -};
> -
>  /*
>   * kvm_mmu_extended_role complements kvm_mmu_page_role, tracking properties
>   * relevant to the current MMU configuration.   When loading CR0, CR4, or EFER,
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index f375b719f565..355548603960 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -210,13 +210,13 @@ static inline bool __maybe_unused is_##reg##_##name(struct kvm_mmu *mmu)  \
>  {                                                              \
>         return !!(mmu->cpu_role. base_or_ext . reg##_##name);   \
>  }
> -BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
> +BUILD_MMU_ROLE_ACCESSOR(base.arch, cr0, wp);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pse);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smep);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, smap);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, pke);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  cr4, la57);
> -BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
> +BUILD_MMU_ROLE_ACCESSOR(base.arch, efer, nx);
>  BUILD_MMU_ROLE_ACCESSOR(ext,  efer, lma);
>
>  static inline bool is_cr0_pg(struct kvm_mmu *mmu)
> @@ -226,7 +226,7 @@ static inline bool is_cr0_pg(struct kvm_mmu *mmu)
>
>  static inline bool is_cr4_pae(struct kvm_mmu *mmu)
>  {
> -        return !mmu->cpu_role.base.has_4_byte_gpte;
> +       return !mmu->cpu_role.base.arch.has_4_byte_gpte;
>  }
>
>  static struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
> @@ -618,7 +618,7 @@ static bool mmu_spte_age(u64 *sptep)
>
>  static inline bool is_tdp_mmu_active(struct kvm_vcpu *vcpu)
>  {
> -       return tdp_mmu_enabled && vcpu->arch.mmu->root_role.direct;
> +       return tdp_mmu_enabled && vcpu->arch.mmu->root_role.arch.direct;
>  }
>
>  static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
> @@ -695,10 +695,10 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp);
>
>  static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
>  {
> -       if (sp->role.passthrough)
> +       if (sp->role.arch.passthrough)
>                 return sp->gfn;
>
> -       if (!sp->role.direct)
> +       if (!sp->role.arch.direct)
>                 return sp->shadowed_translation[index] >> PAGE_SHIFT;
>
>         return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS));
> @@ -727,7 +727,7 @@ static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
>          *
>          * In both cases, sp->role.access contains the correct access bits.
>          */
> -       return sp->role.access;
> +       return sp->role.arch.access;
>  }
>
>  static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
> @@ -739,14 +739,14 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
>         }
>
>         WARN_ONCE(access != kvm_mmu_page_get_access(sp, index),
> -                 "access mismatch under %s page %llx (expected %u, got %u)\n",
> -                 sp->role.passthrough ? "passthrough" : "direct",
> -                 sp->gfn, kvm_mmu_page_get_access(sp, index), access);
> +                 "access mismatch under %s page %llx (expected %u, got %u)\n",
> +                 sp->role.arch.passthrough ? "passthrough" : "direct",
> +                 sp->gfn, kvm_mmu_page_get_access(sp, index), access);
>
>         WARN_ONCE(gfn != kvm_mmu_page_get_gfn(sp, index),
> -                 "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
> -                 sp->role.passthrough ? "passthrough" : "direct",
> -                 sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
> +                 "gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
> +                 sp->role.arch.passthrough ? "passthrough" : "direct",
> +                 sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
>  }
>
>  static void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
> @@ -1723,7 +1723,7 @@ static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
>         hlist_del(&sp->hash_link);
>         list_del(&sp->link);
>         free_page((unsigned long)sp->spt);
> -       if (!sp->role.direct)
> +       if (!sp->role.arch.direct)
>                 free_page((unsigned long)sp->shadowed_translation);
>         kmem_cache_free(mmu_page_header_cache, sp);
>  }
> @@ -1884,10 +1884,10 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>
>  static bool sp_has_gptes(struct kvm_mmu_page *sp)
>  {
> -       if (sp->role.direct)
> +       if (sp->role.arch.direct)
>                 return false;
>
> -       if (sp->role.passthrough)
> +       if (sp->role.arch.passthrough)
>                 return false;
>
>         return true;
> @@ -2065,7 +2065,7 @@ static void clear_sp_write_flooding_count(u64 *spte)
>   * The vCPU is required when finding indirect shadow pages; the shadow
>   * page may already exist and syncing it needs the vCPU pointer in
>   * order to read guest page tables.  Direct shadow pages are never
> - * unsync, thus @vcpu can be NULL if @role.direct is true.
> + * unsync, thus @vcpu can be NULL if @role.arch.direct is true.
>   */
>  static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                                                      struct kvm_vcpu *vcpu,
> @@ -2101,7 +2101,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                 }
>
>                 /* unsync and write-flooding only apply to indirect SPs. */
> -               if (sp->role.direct)
> +               if (sp->role.arch.direct)
>                         goto out;
>
>                 if (sp->unsync) {
> @@ -2162,7 +2162,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
>
>         sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
>         sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
> -       if (!role.direct)
> +       if (!role.arch.direct)
>                 sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
>
>         set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
> @@ -2187,7 +2187,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
>         return sp;
>  }
>
> -/* Note, @vcpu may be NULL if @role.direct is true; see kvm_mmu_find_shadow_page. */
> +/* Note, @vcpu may be NULL if @role.arch.direct is true; see kvm_mmu_find_shadow_page. */
>  static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
>                                                       struct kvm_vcpu *vcpu,
>                                                       struct shadow_page_caches *caches,
> @@ -2231,9 +2231,9 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
>
>         role = parent_sp->role;
>         role.level--;
> -       role.access = access;
> -       role.direct = direct;
> -       role.passthrough = 0;
> +       role.arch.access = access;
> +       role.arch.direct = direct;
> +       role.arch.passthrough = 0;
>
>         /*
>          * If the guest has 4-byte PTEs then that means it's using 32-bit,
> @@ -2261,9 +2261,9 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct,
>          * covers bit 21 (see above), thus the quadrant is calculated from the
>          * _least_ significant bit of the PDE index.
>          */
> -       if (role.has_4_byte_gpte) {
> +       if (role.arch.has_4_byte_gpte) {
>                 WARN_ON_ONCE(role.level != PG_LEVEL_4K);
> -               role.quadrant = spte_index(sptep) & 1;
> +               role.arch.quadrant = spte_index(sptep) & 1;
>         }
>
>         return role;
> @@ -2292,7 +2292,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
>
>         if (iterator->level >= PT64_ROOT_4LEVEL &&
>             vcpu->arch.mmu->cpu_role.base.level < PT64_ROOT_4LEVEL &&
> -           !vcpu->arch.mmu->root_role.direct)
> +           !vcpu->arch.mmu->root_role.arch.direct)
>                 iterator->level = PT32E_ROOT_LEVEL;
>
>         if (iterator->level == PT32E_ROOT_LEVEL) {
> @@ -2391,7 +2391,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>                  * a new sp with the correct access.
>                  */
>                 child = spte_to_child_sp(*sptep);
> -               if (child->role.access == direct_access)
> +               if (child->role.arch.access == direct_access)
>                         return;
>
>                 drop_parent_pte(child, sptep);
> @@ -2420,7 +2420,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
>                          * avoids retaining a large number of stale nested SPs.
>                          */
>                         if (tdp_enabled && invalid_list &&
> -                           child->role.guest_mode && !child->parent_ptes.val)
> +                           child->role.arch.guest_mode && !child->parent_ptes.val)
>                                 return kvm_mmu_prepare_zap_page(kvm, child,
>                                                                 invalid_list);
>                 }
> @@ -2689,7 +2689,7 @@ static int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
>         gpa_t gpa;
>         int r;
>
> -       if (vcpu->arch.mmu->root_role.direct)
> +       if (vcpu->arch.mmu->root_role.arch.direct)
>                 return 0;
>
>         gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
> @@ -2900,7 +2900,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
>  {
>         struct page *pages[PTE_PREFETCH_NUM];
>         struct kvm_memory_slot *slot;
> -       unsigned int access = sp->role.access;
> +       unsigned int access = sp->role.arch.access;
>         int i, ret;
>         gfn_t gfn;
>
> @@ -2928,7 +2928,7 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
>         u64 *spte, *start = NULL;
>         int i;
>
> -       WARN_ON(!sp->role.direct);
> +       WARN_ON(!sp->role.arch.direct);
>
>         i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
>         spte = sp->spt + i;
> @@ -3549,7 +3549,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
>          * This should not be called while L2 is active, L2 can't invalidate
>          * _only_ its own roots, e.g. INVVPID unconditionally exits.
>          */
> -       WARN_ON_ONCE(mmu->root_role.guest_mode);
> +       WARN_ON_ONCE(mmu->root_role.arch.guest_mode);
>
>         for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
>                 root_hpa = mmu->prev_roots[i].hpa;
> @@ -3557,7 +3557,7 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
>                         continue;
>
>                 if (!to_shadow_page(root_hpa) ||
> -                       to_shadow_page(root_hpa)->role.guest_mode)
> +                       to_shadow_page(root_hpa)->role.arch.guest_mode)
>                         roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
>         }
>
> @@ -3585,10 +3585,10 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
>         struct kvm_mmu_page *sp;
>
>         role.level = level;
> -       role.quadrant = quadrant;
> +       role.arch.quadrant = quadrant;
>
> -       WARN_ON_ONCE(quadrant && !role.has_4_byte_gpte);
> -       WARN_ON_ONCE(role.direct && role.has_4_byte_gpte);
> +       WARN_ON_ONCE(quadrant && !role.arch.has_4_byte_gpte);
> +       WARN_ON_ONCE(role.arch.direct && role.arch.has_4_byte_gpte);
>
>         sp = kvm_mmu_get_shadow_page(vcpu, gfn, role);
>         ++sp->root_count;
> @@ -3834,7 +3834,7 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
>          * equivalent level in the guest's NPT to shadow.  Allocate the tables
>          * on demand, as running a 32-bit L1 VMM on 64-bit KVM is very rare.
>          */
> -       if (mmu->root_role.direct ||
> +       if (mmu->root_role.arch.direct ||
>             mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL ||
>             mmu->root_role.level < PT64_ROOT_4LEVEL)
>                 return 0;
> @@ -3932,7 +3932,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
>         int i;
>         struct kvm_mmu_page *sp;
>
> -       if (vcpu->arch.mmu->root_role.direct)
> +       if (vcpu->arch.mmu->root_role.arch.direct)
>                 return;
>
>         if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
> @@ -4161,7 +4161,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>
>         arch.token = alloc_apf_token(vcpu);
>         arch.gfn = gfn;
> -       arch.direct_map = vcpu->arch.mmu->root_role.direct;
> +       arch.direct_map = vcpu->arch.mmu->root_role.arch.direct;
>         arch.cr3 = vcpu->arch.mmu->get_guest_pgd(vcpu);
>
>         return kvm_setup_async_pf(vcpu, cr2_or_gpa,
> @@ -4172,7 +4172,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>  {
>         int r;
>
> -       if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) ||
> +       if ((vcpu->arch.mmu->root_role.arch.direct != work->arch.direct_map) ||
>               work->wakeup_all)
>                 return;
>
> @@ -4180,7 +4180,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>         if (unlikely(r))
>                 return;
>
> -       if (!vcpu->arch.mmu->root_role.direct &&
> +       if (!vcpu->arch.mmu->root_role.arch.direct &&
>               work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu))
>                 return;
>
> @@ -4456,7 +4456,7 @@ static void nonpaging_init_context(struct kvm_mmu *context)
>  static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
>                                   union kvm_mmu_page_role role)
>  {
> -       return (role.direct || pgd == root->pgd) &&
> +       return (role.arch.direct || pgd == root->pgd) &&
>                VALID_PAGE(root->hpa) &&
>                role.word == to_shadow_page(root->hpa)->role.word;
>  }
> @@ -4576,7 +4576,7 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
>          * If this is a direct root page, it doesn't have a write flooding
>          * count. Otherwise, clear the write flooding count.
>          */
> -       if (!new_role.direct)
> +       if (!new_role.arch.direct)
>                 __clear_sp_write_flooding_count(
>                                 to_shadow_page(vcpu->arch.mmu->root.hpa));
>  }
> @@ -4803,7 +4803,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
>         shadow_zero_check = &context->shadow_zero_check;
>         __reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
>                                 context->root_role.level,
> -                               context->root_role.efer_nx,
> +                               context->root_role.arch.efer_nx,
>                                 guest_can_use_gbpages(vcpu), is_pse, is_amd);
>
>         if (!shadow_me_mask)
> @@ -5055,21 +5055,21 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
>  {
>         union kvm_cpu_role role = {0};
>
> -       role.base.access = ACC_ALL;
>         role.base.as_id = is_smm(vcpu);
> -       role.base.guest_mode = is_guest_mode(vcpu);
> +       role.base.arch.access = ACC_ALL;
> +       role.base.arch.guest_mode = is_guest_mode(vcpu);
>         role.ext.valid = 1;
>
>         if (!____is_cr0_pg(regs)) {
> -               role.base.direct = 1;
> +               role.base.arch.direct = 1;
>                 return role;
>         }
>
> -       role.base.efer_nx = ____is_efer_nx(regs);
> -       role.base.cr0_wp = ____is_cr0_wp(regs);
> -       role.base.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
> -       role.base.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
> -       role.base.has_4_byte_gpte = !____is_cr4_pae(regs);
> +       role.base.arch.efer_nx = ____is_efer_nx(regs);
> +       role.base.arch.cr0_wp = ____is_cr0_wp(regs);
> +       role.base.arch.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
> +       role.base.arch.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
> +       role.base.arch.has_4_byte_gpte = !____is_cr4_pae(regs);
>
>         if (____is_efer_lma(regs))
>                 role.base.level = ____is_cr4_la57(regs) ? PT64_ROOT_5LEVEL
> @@ -5109,15 +5109,15 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
>  {
>         union kvm_mmu_page_role role = {0};
>
> -       role.access = ACC_ALL;
> -       role.cr0_wp = true;
> -       role.efer_nx = true;
>         role.as_id = cpu_role.base.as_id;
> -       role.guest_mode = cpu_role.base.guest_mode;
> -       role.ad_disabled = !kvm_ad_enabled();
>         role.level = kvm_mmu_get_tdp_level(vcpu);
> -       role.direct = true;
> -       role.has_4_byte_gpte = false;
> +       role.arch.access = ACC_ALL;
> +       role.arch.cr0_wp = true;
> +       role.arch.efer_nx = true;
> +       role.arch.guest_mode = cpu_role.base.arch.guest_mode;
> +       role.arch.ad_disabled = !kvm_ad_enabled();
> +       role.arch.direct = true;
> +       role.arch.has_4_byte_gpte = false;
>
>         return role;
>  }
> @@ -5194,7 +5194,7 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu,
>          * NX can be used by any non-nested shadow MMU to avoid having to reset
>          * MMU contexts.
>          */
> -       root_role.efer_nx = true;
> +       root_role.arch.efer_nx = true;
>
>         shadow_mmu_init_context(vcpu, context, cpu_role, root_role);
>  }
> @@ -5212,13 +5212,13 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
>         union kvm_mmu_page_role root_role;
>
>         /* NPT requires CR0.PG=1. */
> -       WARN_ON_ONCE(cpu_role.base.direct);
> +       WARN_ON_ONCE(cpu_role.base.arch.direct);
>
>         root_role = cpu_role.base;
>         root_role.level = kvm_mmu_get_tdp_level(vcpu);
>         if (root_role.level == PT64_ROOT_5LEVEL &&
>             cpu_role.base.level == PT64_ROOT_4LEVEL)
> -               root_role.passthrough = 1;
> +               root_role.arch.passthrough = 1;
>
>         shadow_mmu_init_context(vcpu, context, cpu_role, root_role);
>         kvm_mmu_new_pgd(vcpu, nested_cr3);
> @@ -5237,11 +5237,11 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
>          */
>         WARN_ON_ONCE(is_smm(vcpu));
>         role.base.level = level;
> -       role.base.has_4_byte_gpte = false;
> -       role.base.direct = false;
> -       role.base.ad_disabled = !accessed_dirty;
> -       role.base.guest_mode = true;
> -       role.base.access = ACC_ALL;
> +       role.base.arch.has_4_byte_gpte = false;
> +       role.base.arch.direct = false;
> +       role.base.arch.ad_disabled = !accessed_dirty;
> +       role.base.arch.guest_mode = true;
> +       role.base.arch.access = ACC_ALL;
>
>         role.ext.word = 0;
>         role.ext.execonly = execonly;
> @@ -5385,13 +5385,13 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
>  {
>         int r;
>
> -       r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.direct);
> +       r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.arch.direct);
>         if (r)
>                 goto out;
>         r = mmu_alloc_special_roots(vcpu);
>         if (r)
>                 goto out;
> -       if (vcpu->arch.mmu->root_role.direct)
> +       if (vcpu->arch.mmu->root_role.arch.direct)
>                 r = mmu_alloc_direct_roots(vcpu);
>         else
>                 r = mmu_alloc_shadow_roots(vcpu);
> @@ -5526,7 +5526,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
>                  gpa, bytes, sp->role.word);
>
>         offset = offset_in_page(gpa);
> -       pte_size = sp->role.has_4_byte_gpte ? 4 : 8;
> +       pte_size = sp->role.arch.has_4_byte_gpte ? 4 : 8;
>
>         /*
>          * Sometimes, the OS only writes the last one bytes to update status
> @@ -5550,7 +5550,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
>         page_offset = offset_in_page(gpa);
>         level = sp->role.level;
>         *nspte = 1;
> -       if (sp->role.has_4_byte_gpte) {
> +       if (sp->role.arch.has_4_byte_gpte) {
>                 page_offset <<= 1;      /* 32->64 */
>                 /*
>                  * A 32-bit pde maps 4MB while the shadow pdes map
> @@ -5564,7 +5564,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
>                 }
>                 quadrant = page_offset >> PAGE_SHIFT;
>                 page_offset &= ~PAGE_MASK;
> -               if (quadrant != sp->role.quadrant)
> +               if (quadrant != sp->role.arch.quadrant)
>                         return NULL;
>         }
>
> @@ -5628,7 +5628,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>                        void *insn, int insn_len)
>  {
>         int r, emulation_type = EMULTYPE_PF;
> -       bool direct = vcpu->arch.mmu->root_role.direct;
> +       bool direct = vcpu->arch.mmu->root_role.arch.direct;
>
>         if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
>                 return RET_PF_RETRY;
> @@ -5659,7 +5659,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>          * paging in both guests. If true, we simply unprotect the page
>          * and resume the guest.
>          */
> -       if (vcpu->arch.mmu->root_role.direct &&
> +       if (vcpu->arch.mmu->root_role.arch.direct &&
>             (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
>                 kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa));
>                 return 1;
> @@ -6321,7 +6321,7 @@ static void shadow_mmu_split_huge_page(struct kvm *kvm,
>
>                 spte = make_huge_page_split_spte(kvm, huge_spte, sp->role, index);
>                 mmu_spte_set(sptep, spte);
> -               __rmap_add(kvm, cache, slot, sptep, gfn, sp->role.access);
> +               __rmap_add(kvm, cache, slot, sptep, gfn, sp->role.arch.access);
>         }
>
>         __link_shadow_page(kvm, cache, huge_sptep, sp, flush);
> @@ -6380,7 +6380,7 @@ static bool shadow_mmu_try_split_huge_pages(struct kvm *kvm,
>                 sp = sptep_to_sp(huge_sptep);
>
>                 /* TDP MMU is enabled, so rmap only contains nested MMU SPs. */
> -               if (WARN_ON_ONCE(!sp->role.guest_mode))
> +               if (WARN_ON_ONCE(!sp->role.arch.guest_mode))
>                         continue;
>
>                 /* The rmaps should never contain non-leaf SPTEs. */
> @@ -6502,7 +6502,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
>                  * the guest, and the guest page table is using 4K page size
>                  * mapping if the indirect sp has level = 1.
>                  */
> -               if (sp->role.direct &&
> +               if (sp->role.arch.direct &&
>                     sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
>                                                                PG_LEVEL_NUM)) {
>                         kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
> @@ -6942,7 +6942,7 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
>                                       struct kvm_mmu_page,
>                                       possible_nx_huge_page_link);
>                 WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
> -               WARN_ON_ONCE(!sp->role.direct);
> +               WARN_ON_ONCE(!sp->role.arch.direct);
>
>                 /*
>                  * Unaccount and do not attempt to recover any NX Huge Pages
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 5427f65117b4..c19a80fdeb8d 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -143,7 +143,7 @@ static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
>          * being enabled is mandatory as the bits used to denote WP-only SPTEs
>          * are reserved for PAE paging (32-bit KVM).
>          */
> -       return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
> +       return kvm_x86_ops.cpu_dirty_log_size && sp->role.arch.guest_mode;
>  }
>
>  int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
> @@ -270,7 +270,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>         };
>         int r;
>
> -       if (vcpu->arch.mmu->root_role.direct) {
> +       if (vcpu->arch.mmu->root_role.arch.direct) {
>                 fault.gfn = fault.addr >> PAGE_SHIFT;
>                 fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
>         }
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index ae86820cef69..6a4a43b90780 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -35,13 +35,13 @@
>                          " %snxe %sad root %u %s%c",                    \
>                          __entry->mmu_valid_gen,                        \
>                          __entry->gfn, role.level,                      \
> -                        role.has_4_byte_gpte ? 4 : 8,                  \
> -                        role.quadrant,                                 \
> -                        role.direct ? " direct" : "",                  \
> -                        access_str[role.access],                       \
> +                        role.arch.has_4_byte_gpte ? 4 : 8,                     \
> +                        role.arch.quadrant,                                    \
> +                        role.arch.direct ? " direct" : "",                     \
> +                        access_str[role.arch.access],                  \
>                          role.invalid ? " invalid" : "",                \
> -                        role.efer_nx ? "" : "!",                       \
> -                        role.ad_disabled ? "!" : "",                   \
> +                        role.arch.efer_nx ? "" : "!",                  \
> +                        role.arch.ad_disabled ? "!" : "",                      \
>                          __entry->root_count,                           \
>                          __entry->unsync ? "unsync" : "sync", 0);       \
>         saved_ptr;                                                      \
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index e5662dbd519c..e15ec1c473da 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -55,7 +55,7 @@
>         #define PT_LEVEL_BITS 9
>         #define PT_GUEST_DIRTY_SHIFT 9
>         #define PT_GUEST_ACCESSED_SHIFT 8
> -       #define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.ad_disabled)
> +       #define PT_HAVE_ACCESSED_DIRTY(mmu) (!(mmu)->cpu_role.base.arch.ad_disabled)
>         #define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL
>  #else
>         #error Invalid PTTYPE value
> @@ -532,7 +532,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>         pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
>
>         gfn = gpte_to_gfn(gpte);
> -       pte_access = sp->role.access & FNAME(gpte_access)(gpte);
> +       pte_access = sp->role.arch.access & FNAME(gpte_access)(gpte);
>         FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
>
>         slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn,
> @@ -592,7 +592,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
>         if (unlikely(vcpu->kvm->mmu_invalidate_in_progress))
>                 return;
>
> -       if (sp->role.direct)
> +       if (sp->role.arch.direct)
>                 return __direct_pte_prefetch(vcpu, sp, sptep);
>
>         i = spte_index(sptep) & ~(PTE_PREFETCH_NUM - 1);
> @@ -884,7 +884,7 @@ static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
>         WARN_ON(sp->role.level != PG_LEVEL_4K);
>
>         if (PTTYPE == 32)
> -               offset = sp->role.quadrant << SPTE_LEVEL_BITS;
> +               offset = sp->role.arch.quadrant << SPTE_LEVEL_BITS;
>
>         return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
>  }
> @@ -1003,9 +1003,11 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>          */
>         const union kvm_mmu_page_role sync_role_ign = {
>                 .level = 0xf,
> -               .access = 0x7,
> -               .quadrant = 0x3,
> -               .passthrough = 0x1,
> +               .arch = {
> +                       .access = 0x7,
> +                       .quadrant = 0x3,
> +                       .passthrough = 0x1,
> +               },
>         };
>
>         /*
> @@ -1014,7 +1016,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>          * differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
>          * reserved bits checks will be wrong, etc...
>          */
> -       if (WARN_ON_ONCE(sp->role.direct ||
> +       if (WARN_ON_ONCE(sp->role.arch.direct ||
>                          (sp->role.word ^ root_role.word) & ~sync_role_ign.word))
>                 return -1;
>
> @@ -1043,7 +1045,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>                 }
>
>                 gfn = gpte_to_gfn(gpte);
> -               pte_access = sp->role.access;
> +               pte_access = sp->role.arch.access;
>                 pte_access &= FNAME(gpte_access)(gpte);
>                 FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
>
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index c0fd7e049b4e..fe4b626cb431 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -146,7 +146,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>
>         WARN_ON_ONCE(!pte_access && !shadow_present_mask);
>
> -       if (sp->role.ad_disabled)
> +       if (sp->role.arch.ad_disabled)
>                 spte |= SPTE_TDP_AD_DISABLED_MASK;
>         else if (kvm_mmu_page_ad_need_write_protect(sp))
>                 spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
> @@ -301,7 +301,7 @@ u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte, union kvm_mmu_page
>                  * the page executable as the NX hugepage mitigation no longer
>                  * applies.
>                  */
> -               if ((role.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
> +               if ((role.arch.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
>                         child_spte = make_spte_executable(child_spte);
>         }
>
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index 1f03701b943a..ad84c549fe96 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -260,7 +260,7 @@ static inline bool kvm_ad_enabled(void)
>
>  static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
>  {
> -       return sp->role.ad_disabled;
> +       return sp->role.arch.ad_disabled;
>  }
>
>  static inline bool spte_ad_enabled(u64 spte)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b2da8c8f30a..2bfe060768fc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8442,7 +8442,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>             WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF)))
>                 return false;
>
> -       if (!vcpu->arch.mmu->root_role.direct) {
> +       if (!vcpu->arch.mmu->root_role.arch.direct) {
>                 /*
>                  * Write permission should be allowed since only
>                  * write access need to be emulated.
> @@ -8475,7 +8475,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>         kvm_release_pfn_clean(pfn);
>
>         /* The instructions are well-emulated on direct mmu. */
> -       if (vcpu->arch.mmu->root_role.direct) {
> +       if (vcpu->arch.mmu->root_role.arch.direct) {
>                 unsigned int indirect_shadow_pages;
>
>                 write_lock(&vcpu->kvm->mmu_lock);
> @@ -8543,7 +8543,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
>         vcpu->arch.last_retry_eip = ctxt->eip;
>         vcpu->arch.last_retry_addr = cr2_or_gpa;
>
> -       if (!vcpu->arch.mmu->root_role.direct)
> +       if (!vcpu->arch.mmu->root_role.arch.direct)
>                 gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);
>
>         kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> @@ -8846,7 +8846,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>                 ctxt->exception.address = cr2_or_gpa;
>
>                 /* With shadow page tables, cr2 contains a GVA or nGPA. */
> -               if (vcpu->arch.mmu->root_role.direct) {
> +               if (vcpu->arch.mmu->root_role.arch.direct) {
>                         ctxt->gpa_available = true;
>                         ctxt->gpa_val = cr2_or_gpa;
>                 }
> diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
> new file mode 100644
> index 000000000000..3f35a924e031
> --- /dev/null
> +++ b/include/kvm/mmu_types.h
> @@ -0,0 +1,37 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __KVM_MMU_TYPES_H
> +#define __KVM_MMU_TYPES_H
> +
> +#include <linux/bug.h>
> +#include <linux/types.h>
> +#include <linux/stddef.h>
> +
> +#include <asm/kvm/mmu_types.h>
> +
> +/*
> + * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
> + * also includes TDP pages) to determine whether or not a page can be used in
> + * the given MMU context.
> + */
> +union kvm_mmu_page_role {
> +       u32 word;
> +       struct {
> +               struct {
> +                       /* The address space ID mapped by the page. */
> +                       u16 as_id:8;

We should either make this just 1 bit or preserve the comment
explaining that it's 8 bits only to make it faster to load from memory.
Otherwise folks might think that as_id can actually use all 8 bits.
kvm_memory_slot stores the address space ID as a full u16, so we
already can't express the full range here anyway.
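
For illustration, the two options might look roughly like this (just a
sketch against the hunk above, not part of the patch):

        /*
         * Option 1: in practice only one address space bit is needed
         * today (x86's SMM vs. non-SMM), so shrink the field.
         */
        u16 as_id:1;

        /* Option 2: keep the 8-bit width, but document why. */
        /*
         * The address space ID mapped by the page. Only the low bit is
         * used today; the field is 8 bits wide purely so it can be
         * loaded cheaply as a single byte.
         */
        u16 as_id:8;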

> +
> +                       /* The level of the page in the page table hierarchy. */
> +                       u16 level:4;
> +
> +                       /* Whether the page is invalid, i.e. pending destruction. */
> +                       u16 invalid:1;
> +               };
> +
> +               /* Architecture-specific properties. */
> +               struct kvm_mmu_page_role_arch arch;
> +       };
> +};
> +
> +static_assert(sizeof(union kvm_mmu_page_role) == sizeof_field(union kvm_mmu_page_role, word));
> +
> +#endif /* !__KVM_MMU_TYPES_H */
> --
> 2.39.0.rc1.256.g54fd8350bd-goog
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 06/37] KVM: MMU: Move struct kvm_mmu_page to common code
  2022-12-08 19:38   ` David Matlack
  (?)
  (?)
@ 2022-12-12 18:07     ` Ben Gardon
  -1 siblings, 0 replies; 317+ messages in thread
From: Ben Gardon @ 2022-12-12 18:07 UTC (permalink / raw)
  To: David Matlack
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, Nadav Amit,
	Colin Cross, linux-riscv, kvmarm, Yu Zhao, Marc Zyngier,
	Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, kvmarm, Suren Baghdasaryan, Vlastimil Babka,
	linux-arm-kernel, linux-mips, kvm-riscv, Paolo Bonzini,
	Andrew Morton

On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
>
> Move struct kvm_mmu_page to common code and all x86-specific fields into
> kvm_mmu_page_arch.
>
> This commit increases the size of struct kvm_mmu_page by 64 bytes on
> x86_64 (184 bytes -> 248 bytes). The size of this struct can be reduced
> in future commits by moving TDP MMU root fields into a separate struct
> and by dynamically allocating fields only used by the Shadow MMU.
>
> No functional change intended.
>
> Signed-off-by: David Matlack <dmatlack@google.com>

I haven't reviewed every line of the mechanical refactor that adds
"arch." all over the place, but at a high level this looks like a good
split of struct kvm_mmu_page.

> ---
>  arch/x86/include/asm/kvm/mmu_types.h |  62 ++++++++++++++
>  arch/x86/include/asm/kvm_host.h      |   4 -
>  arch/x86/kvm/mmu/mmu.c               | 122 ++++++++++++++-------------
>  arch/x86/kvm/mmu/mmu_internal.h      |  83 ------------------
>  arch/x86/kvm/mmu/mmutrace.h          |   4 +-
>  arch/x86/kvm/mmu/paging_tmpl.h       |  10 +--
>  arch/x86/kvm/mmu/tdp_mmu.c           |   8 +-
>  arch/x86/kvm/mmu/tdp_mmu.h           |   2 +-
>  include/kvm/mmu_types.h              |  32 ++++++-
>  9 files changed, 167 insertions(+), 160 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
> index 35f893ebab5a..affcb520b482 100644
> --- a/arch/x86/include/asm/kvm/mmu_types.h
> +++ b/arch/x86/include/asm/kvm/mmu_types.h
> @@ -2,6 +2,8 @@
>  #ifndef __ASM_KVM_MMU_TYPES_H
>  #define __ASM_KVM_MMU_TYPES_H
>
> +#include <linux/bitmap.h>
> +#include <linux/list.h>
>  #include <linux/types.h>
>
>  /*
> @@ -53,4 +55,64 @@ struct kvm_mmu_page_role_arch {
>         u16 passthrough:1;
>  };
>
> +struct kvm_rmap_head {
> +       unsigned long val;
> +};
> +
> +struct kvm_mmu_page_arch {
> +       struct hlist_node hash_link;
> +
> +       bool shadow_mmu_page;
> +       bool unsync;
> +       u8 mmu_valid_gen;
> +
> +        /*
> +         * The shadow page can't be replaced by an equivalent huge page
> +         * because it is being used to map an executable page in the guest
> +         * and the NX huge page mitigation is enabled.
> +         */
> +       bool nx_huge_page_disallowed;
> +
> +       /*
> +        * Stores the result of the guest translation being shadowed by each
> +        * SPTE.  KVM shadows two types of guest translations: nGPA -> GPA
> +        * (shadow EPT/NPT) and GVA -> GPA (traditional shadow paging). In both
> +        * cases the result of the translation is a GPA and a set of access
> +        * constraints.
> +        *
> +        * The GFN is stored in the upper bits (PAGE_SHIFT) and the shadowed
> +        * access permissions are stored in the lower bits. Note, for
> +        * convenience and uniformity across guests, the access permissions are
> +        * stored in KVM format (e.g.  ACC_EXEC_MASK) not the raw guest format.
> +        */
> +       u64 *shadowed_translation;
> +
> +       unsigned int unsync_children;
> +
> +       /* Rmap pointers to all parent sptes. */
> +       struct kvm_rmap_head parent_ptes;
> +
> +       DECLARE_BITMAP(unsync_child_bitmap, 512);
> +
> +       /*
> +        * Tracks shadow pages that, if zapped, would allow KVM to create an NX
> +        * huge page.  A shadow page will have nx_huge_page_disallowed set but
> +        * not be on the list if a huge page is disallowed for other reasons,
> +        * e.g. because KVM is shadowing a PTE at the same gfn, the memslot
> +        * isn't properly aligned, etc...
> +        */
> +       struct list_head possible_nx_huge_page_link;
> +
> +#ifdef CONFIG_X86_32
> +       /*
> +        * Used out of the mmu-lock to avoid reading spte values while an
> +        * update is in progress; see the comments in __get_spte_lockless().
> +        */
> +       int clear_spte_count;
> +#endif
> +
> +       /* Number of writes since the last time traversal visited this page.  */
> +       atomic_t write_flooding_count;
> +};
> +
>  #endif /* !__ASM_KVM_MMU_TYPES_H */
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ebcd7a0dabef..f5743a652e10 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -329,10 +329,6 @@ union kvm_cpu_role {
>         };
>  };
>
> -struct kvm_rmap_head {
> -       unsigned long val;
> -};
> -
>  struct kvm_pio_request {
>         unsigned long linear_rip;
>         unsigned long count;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 11cef930d5ed..e47f35878ab5 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -350,7 +350,7 @@ static void count_spte_clear(u64 *sptep, u64 spte)
>
>         /* Ensure the spte is completely set before we increase the count */
>         smp_wmb();
> -       sp->clear_spte_count++;
> +       sp->arch.clear_spte_count++;
>  }
>
>  static void __set_spte(u64 *sptep, u64 spte)
> @@ -432,7 +432,7 @@ static u64 __get_spte_lockless(u64 *sptep)
>         int count;
>
>  retry:
> -       count = sp->clear_spte_count;
> +       count = sp->arch.clear_spte_count;
>         smp_rmb();
>
>         spte.spte_low = orig->spte_low;
> @@ -442,7 +442,7 @@ static u64 __get_spte_lockless(u64 *sptep)
>         smp_rmb();
>
>         if (unlikely(spte.spte_low != orig->spte_low ||
> -             count != sp->clear_spte_count))
> +             count != sp->arch.clear_spte_count))
>                 goto retry;
>
>         return spte.spte;
> @@ -699,7 +699,7 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
>                 return sp->gfn;
>
>         if (!sp->role.arch.direct)
> -               return sp->shadowed_translation[index] >> PAGE_SHIFT;
> +               return sp->arch.shadowed_translation[index] >> PAGE_SHIFT;
>
>         return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS));
>  }
> @@ -713,7 +713,7 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
>  static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
>  {
>         if (sp_has_gptes(sp))
> -               return sp->shadowed_translation[index] & ACC_ALL;
> +               return sp->arch.shadowed_translation[index] & ACC_ALL;
>
>         /*
>          * For direct MMUs (e.g. TDP or non-paging guests) or passthrough SPs,
> @@ -734,7 +734,7 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
>                                          gfn_t gfn, unsigned int access)
>  {
>         if (sp_has_gptes(sp)) {
> -               sp->shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
> +               sp->arch.shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
>                 return;
>         }
>
> @@ -825,18 +825,18 @@ void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>          * on the list if KVM is reusing an existing shadow page, i.e. if KVM
>          * links a shadow page at multiple points.
>          */
> -       if (!list_empty(&sp->possible_nx_huge_page_link))
> +       if (!list_empty(&sp->arch.possible_nx_huge_page_link))
>                 return;
>
>         ++kvm->stat.nx_lpage_splits;
> -       list_add_tail(&sp->possible_nx_huge_page_link,
> +       list_add_tail(&sp->arch.possible_nx_huge_page_link,
>                       &kvm->arch.possible_nx_huge_pages);
>  }
>
>  static void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>                                  bool nx_huge_page_possible)
>  {
> -       sp->nx_huge_page_disallowed = true;
> +       sp->arch.nx_huge_page_disallowed = true;
>
>         if (nx_huge_page_possible)
>                 track_possible_nx_huge_page(kvm, sp);
> @@ -861,16 +861,16 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
>
>  void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> -       if (list_empty(&sp->possible_nx_huge_page_link))
> +       if (list_empty(&sp->arch.possible_nx_huge_page_link))
>                 return;
>
>         --kvm->stat.nx_lpage_splits;
> -       list_del_init(&sp->possible_nx_huge_page_link);
> +       list_del_init(&sp->arch.possible_nx_huge_page_link);
>  }
>
>  static void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> -       sp->nx_huge_page_disallowed = false;
> +       sp->arch.nx_huge_page_disallowed = false;
>
>         untrack_possible_nx_huge_page(kvm, sp);
>  }
> @@ -1720,11 +1720,11 @@ static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
>  {
>         MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
> -       hlist_del(&sp->hash_link);
> +       hlist_del(&sp->arch.hash_link);
>         list_del(&sp->link);
>         free_page((unsigned long)sp->spt);
>         if (!sp->role.arch.direct)
> -               free_page((unsigned long)sp->shadowed_translation);
> +               free_page((unsigned long)sp->arch.shadowed_translation);
>         kmem_cache_free(mmu_page_header_cache, sp);
>  }
>
> @@ -1739,13 +1739,13 @@ static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache,
>         if (!parent_pte)
>                 return;
>
> -       pte_list_add(cache, parent_pte, &sp->parent_ptes);
> +       pte_list_add(cache, parent_pte, &sp->arch.parent_ptes);
>  }
>
>  static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
>                                        u64 *parent_pte)
>  {
> -       pte_list_remove(parent_pte, &sp->parent_ptes);
> +       pte_list_remove(parent_pte, &sp->arch.parent_ptes);
>  }
>
>  static void drop_parent_pte(struct kvm_mmu_page *sp,
> @@ -1761,7 +1761,7 @@ static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
>         u64 *sptep;
>         struct rmap_iterator iter;
>
> -       for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> +       for_each_rmap_spte(&sp->arch.parent_ptes, &iter, sptep) {
>                 mark_unsync(sptep);
>         }
>  }
> @@ -1771,9 +1771,9 @@ static void mark_unsync(u64 *spte)
>         struct kvm_mmu_page *sp;
>
>         sp = sptep_to_sp(spte);
> -       if (__test_and_set_bit(spte_index(spte), sp->unsync_child_bitmap))
> +       if (__test_and_set_bit(spte_index(spte), sp->arch.unsync_child_bitmap))
>                 return;
> -       if (sp->unsync_children++)
> +       if (sp->arch.unsync_children++)
>                 return;
>         kvm_mmu_mark_parents_unsync(sp);
>  }
> @@ -1799,7 +1799,7 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
>  {
>         int i;
>
> -       if (sp->unsync)
> +       if (sp->arch.unsync)
>                 for (i=0; i < pvec->nr; i++)
>                         if (pvec->page[i].sp == sp)
>                                 return 0;
> @@ -1812,9 +1812,9 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
>
>  static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
>  {
> -       --sp->unsync_children;
> -       WARN_ON((int)sp->unsync_children < 0);
> -       __clear_bit(idx, sp->unsync_child_bitmap);
> +       --sp->arch.unsync_children;
> +       WARN_ON((int)sp->arch.unsync_children < 0);
> +       __clear_bit(idx, sp->arch.unsync_child_bitmap);
>  }
>
>  static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
> @@ -1822,7 +1822,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
>  {
>         int i, ret, nr_unsync_leaf = 0;
>
> -       for_each_set_bit(i, sp->unsync_child_bitmap, 512) {
> +       for_each_set_bit(i, sp->arch.unsync_child_bitmap, 512) {
>                 struct kvm_mmu_page *child;
>                 u64 ent = sp->spt[i];
>
> @@ -1833,7 +1833,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
>
>                 child = spte_to_child_sp(ent);
>
> -               if (child->unsync_children) {
> +               if (child->arch.unsync_children) {
>                         if (mmu_pages_add(pvec, child, i))
>                                 return -ENOSPC;
>
> @@ -1845,7 +1845,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
>                                 nr_unsync_leaf += ret;
>                         } else
>                                 return ret;
> -               } else if (child->unsync) {
> +               } else if (child->arch.unsync) {
>                         nr_unsync_leaf++;
>                         if (mmu_pages_add(pvec, child, i))
>                                 return -ENOSPC;
> @@ -1862,7 +1862,7 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
>                            struct kvm_mmu_pages *pvec)
>  {
>         pvec->nr = 0;
> -       if (!sp->unsync_children)
> +       if (!sp->arch.unsync_children)
>                 return 0;
>
>         mmu_pages_add(pvec, sp, INVALID_INDEX);
> @@ -1871,9 +1871,9 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
>
>  static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> -       WARN_ON(!sp->unsync);
> +       WARN_ON(!sp->arch.unsync);
>         trace_kvm_mmu_sync_page(sp);
> -       sp->unsync = 0;
> +       sp->arch.unsync = 0;
>         --kvm->stat.mmu_unsync;
>  }
>
> @@ -1894,7 +1894,7 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
>  }
>
>  #define for_each_valid_sp(_kvm, _sp, _list)                            \
> -       hlist_for_each_entry(_sp, _list, hash_link)                     \
> +       hlist_for_each_entry(_sp, _list, arch.hash_link)                        \
>                 if (is_obsolete_sp((_kvm), (_sp))) {                    \
>                 } else
>
> @@ -1934,7 +1934,7 @@ static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>
>         /* TDP MMU pages do not use the MMU generation. */
>         return !is_tdp_mmu_page(sp) &&
> -              unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> +              unlikely(sp->arch.mmu_valid_gen != kvm->arch.mmu_valid_gen);
>  }
>
>  struct mmu_page_path {
> @@ -2006,7 +2006,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
>                 WARN_ON(idx == INVALID_INDEX);
>                 clear_unsync_child_bit(sp, idx);
>                 level++;
> -       } while (!sp->unsync_children);
> +       } while (!sp->arch.unsync_children);
>  }
>
>  static int mmu_sync_children(struct kvm_vcpu *vcpu,
> @@ -2053,7 +2053,7 @@ static int mmu_sync_children(struct kvm_vcpu *vcpu,
>
>  static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
>  {
> -       atomic_set(&sp->write_flooding_count,  0);
> +       atomic_set(&sp->arch.write_flooding_count,  0);
>  }
>
>  static void clear_sp_write_flooding_count(u64 *spte)
> @@ -2094,7 +2094,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                          * Unsync pages must not be left as is, because the new
>                          * upper-level page will be write-protected.
>                          */
> -                       if (role.level > PG_LEVEL_4K && sp->unsync)
> +                       if (role.level > PG_LEVEL_4K && sp->arch.unsync)
>                                 kvm_mmu_prepare_zap_page(kvm, sp,
>                                                          &invalid_list);
>                         continue;
> @@ -2104,7 +2104,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                 if (sp->role.arch.direct)
>                         goto out;
>
> -               if (sp->unsync) {
> +               if (sp->arch.unsync) {
>                         if (KVM_BUG_ON(!vcpu, kvm))
>                                 break;
>
> @@ -2163,25 +2163,26 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
>         sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
>         sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
>         if (!role.arch.direct)
> -               sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
> +               sp->arch.shadowed_translation =
> +                       kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
>
>         set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
>
> -       INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
> +       INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
>
>         /*
>          * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
>          * depends on valid pages being added to the head of the list.  See
>          * comments in kvm_zap_obsolete_pages().
>          */
> -       sp->mmu_valid_gen = kvm->arch.mmu_valid_gen;
> +       sp->arch.mmu_valid_gen = kvm->arch.mmu_valid_gen;
>         list_add(&sp->link, &kvm->arch.active_mmu_pages);
>         kvm_account_mmu_page(kvm, sp);
>
>         sp->gfn = gfn;
>         sp->role = role;
> -       sp->shadow_mmu_page = true;
> -       hlist_add_head(&sp->hash_link, sp_list);
> +       sp->arch.shadow_mmu_page = true;
> +       hlist_add_head(&sp->arch.hash_link, sp_list);
>         if (sp_has_gptes(sp))
>                 account_shadowed(kvm, sp);
>
> @@ -2368,7 +2369,7 @@ static void __link_shadow_page(struct kvm *kvm,
>
>         mmu_page_add_parent_pte(cache, sp, sptep);
>
> -       if (sp->unsync_children || sp->unsync)
> +       if (sp->arch.unsync_children || sp->arch.unsync)
>                 mark_unsync(sptep);
>  }
>
> @@ -2421,7 +2422,8 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
>                          * avoids retaining a large number of stale nested SPs.
>                          */
>                         if (tdp_enabled && invalid_list &&
> -                           child->role.arch.guest_mode && !child->parent_ptes.val)
> +                           child->role.arch.guest_mode &&
> +                           !child->arch.parent_ptes.val)
>                                 return kvm_mmu_prepare_zap_page(kvm, child,
>                                                                 invalid_list);
>                 }
> @@ -2449,7 +2451,7 @@ static void kvm_mmu_unlink_parents(struct kvm_mmu_page *sp)
>         u64 *sptep;
>         struct rmap_iterator iter;
>
> -       while ((sptep = rmap_get_first(&sp->parent_ptes, &iter)))
> +       while ((sptep = rmap_get_first(&sp->arch.parent_ptes, &iter)))
>                 drop_parent_pte(sp, sptep);
>  }
>
> @@ -2496,7 +2498,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
>         if (!sp->role.invalid && sp_has_gptes(sp))
>                 unaccount_shadowed(kvm, sp);
>
> -       if (sp->unsync)
> +       if (sp->arch.unsync)
>                 kvm_unlink_unsync_page(kvm, sp);
>         if (!refcount_read(&sp->root_refcount)) {
>                 /* Count self */
> @@ -2527,7 +2529,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
>                 zapped_root = !is_obsolete_sp(kvm, sp);
>         }
>
> -       if (sp->nx_huge_page_disallowed)
> +       if (sp->arch.nx_huge_page_disallowed)
>                 unaccount_nx_huge_page(kvm, sp);
>
>         sp->role.invalid = 1;
> @@ -2704,7 +2706,7 @@ static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
>         trace_kvm_mmu_unsync_page(sp);
>         ++kvm->stat.mmu_unsync;
> -       sp->unsync = 1;
> +       sp->arch.unsync = 1;
>
>         kvm_mmu_mark_parents_unsync(sp);
>  }
> @@ -2739,7 +2741,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>                 if (!can_unsync)
>                         return -EPERM;
>
> -               if (sp->unsync)
> +               if (sp->arch.unsync)
>                         continue;
>
>                 if (prefetch)
> @@ -2764,7 +2766,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>                          * for write, i.e. unsync cannot transition from 0->1
>                          * while this CPU holds mmu_lock for read (or write).
>                          */
> -                       if (READ_ONCE(sp->unsync))
> +                       if (READ_ONCE(sp->arch.unsync))
>                                 continue;
>                 }
>
> @@ -2796,8 +2798,8 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>          *                      2.2 Guest issues TLB flush.
>          *                          That causes a VM Exit.
>          *
> -        *                      2.3 Walking of unsync pages sees sp->unsync is
> -        *                          false and skips the page.
> +        *                      2.3 Walking of unsync pages sees sp->arch.unsync
> +        *                          is false and skips the page.
>          *
>          *                      2.4 Guest accesses GVA X.
>          *                          Since the mapping in the SP was not updated,
> @@ -2805,7 +2807,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>          *                          gets used.
>          * 1.1 Host marks SP
>          *     as unsync
> -        *     (sp->unsync = true)
> +        *     (sp->arch.unsync = true)
>          *
>          * The write barrier below ensures that 1.1 happens before 1.2 and thus
>          * the situation in 2.4 does not arise.  It pairs with the read barrier
> @@ -3126,7 +3128,7 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>             cur_level == fault->goal_level &&
>             is_shadow_present_pte(spte) &&
>             !is_large_pte(spte) &&
> -           spte_to_child_sp(spte)->nx_huge_page_disallowed) {
> +           spte_to_child_sp(spte)->arch.nx_huge_page_disallowed) {
>                 /*
>                  * A small SPTE exists for this pfn, but FNAME(fetch),
>                  * direct_map(), or kvm_tdp_mmu_map() would like to create a
> @@ -3902,7 +3904,7 @@ static bool is_unsync_root(hpa_t root)
>
>         /*
>          * The read barrier orders the CPU's read of SPTE.W during the page table
> -        * walk before the reads of sp->unsync/sp->unsync_children here.
> +        * walk before the reads of sp->arch.{unsync,unsync_children} here.
>          *
>          * Even if another CPU was marking the SP as unsync-ed simultaneously,
>          * any guest page table changes are not guaranteed to be visible anyway
> @@ -3922,7 +3924,7 @@ static bool is_unsync_root(hpa_t root)
>         if (WARN_ON_ONCE(!sp))
>                 return false;
>
> -       if (sp->unsync || sp->unsync_children)
> +       if (sp->arch.unsync || sp->arch.unsync_children)
>                 return true;
>
>         return false;
> @@ -5510,8 +5512,8 @@ static bool detect_write_flooding(struct kvm_mmu_page *sp)
>         if (sp->role.level == PG_LEVEL_4K)
>                 return false;
>
> -       atomic_inc(&sp->write_flooding_count);
> -       return atomic_read(&sp->write_flooding_count) >= 3;
> +       atomic_inc(&sp->arch.write_flooding_count);
> +       return atomic_read(&sp->arch.write_flooding_count) >= 3;
>  }
>
>  /*
> @@ -6389,7 +6391,7 @@ static bool shadow_mmu_try_split_huge_pages(struct kvm *kvm,
>                         continue;
>
>                 /* SPs with level >PG_LEVEL_4K should never by unsync. */
> -               if (WARN_ON_ONCE(sp->unsync))
> +               if (WARN_ON_ONCE(sp->arch.unsync))
>                         continue;
>
>                 /* Don't bother splitting huge pages on invalid SPs. */
> @@ -6941,8 +6943,8 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
>                  */
>                 sp = list_first_entry(&kvm->arch.possible_nx_huge_pages,
>                                       struct kvm_mmu_page,
> -                                     possible_nx_huge_page_link);
> -               WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
> +                                     arch.possible_nx_huge_page_link);
> +               WARN_ON_ONCE(!sp->arch.nx_huge_page_disallowed);
>                 WARN_ON_ONCE(!sp->role.arch.direct);
>
>                 /*
> @@ -6977,7 +6979,7 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
>                         flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
>                 else
>                         kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
> -               WARN_ON_ONCE(sp->nx_huge_page_disallowed);
> +               WARN_ON_ONCE(sp->arch.nx_huge_page_disallowed);
>
>                 if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
>                         kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index fd4990c8b0e9..af2ae4887e35 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -44,89 +44,6 @@ extern bool dbg;
>  #define INVALID_PAE_ROOT       0
>  #define IS_VALID_PAE_ROOT(x)   (!!(x))
>
> -struct kvm_mmu_page {
> -       /*
> -        * Note, "link" through "spt" fit in a single 64 byte cache line on
> -        * 64-bit kernels, keep it that way unless there's a reason not to.
> -        */
> -       struct list_head link;
> -       struct hlist_node hash_link;
> -
> -       bool shadow_mmu_page;
> -       bool unsync;
> -       u8 mmu_valid_gen;
> -
> -        /*
> -         * The shadow page can't be replaced by an equivalent huge page
> -         * because it is being used to map an executable page in the guest
> -         * and the NX huge page mitigation is enabled.
> -         */
> -       bool nx_huge_page_disallowed;
> -
> -       /*
> -        * The following two entries are used to key the shadow page in the
> -        * hash table.
> -        */
> -       union kvm_mmu_page_role role;
> -       gfn_t gfn;
> -
> -       u64 *spt;
> -
> -       /*
> -        * Stores the result of the guest translation being shadowed by each
> -        * SPTE.  KVM shadows two types of guest translations: nGPA -> GPA
> -        * (shadow EPT/NPT) and GVA -> GPA (traditional shadow paging). In both
> -        * cases the result of the translation is a GPA and a set of access
> -        * constraints.
> -        *
> -        * The GFN is stored in the upper bits (PAGE_SHIFT) and the shadowed
> -        * access permissions are stored in the lower bits. Note, for
> -        * convenience and uniformity across guests, the access permissions are
> -        * stored in KVM format (e.g.  ACC_EXEC_MASK) not the raw guest format.
> -        */
> -       u64 *shadowed_translation;
> -
> -       /* Currently serving as active root */
> -       refcount_t root_refcount;
> -
> -       unsigned int unsync_children;
> -       union {
> -               struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
> -               tdp_ptep_t ptep;
> -       };
> -       union {
> -               DECLARE_BITMAP(unsync_child_bitmap, 512);
> -               struct {
> -                       struct work_struct tdp_mmu_async_work;
> -                       void *tdp_mmu_async_data;
> -               };
> -       };
> -
> -       /*
> -        * Tracks shadow pages that, if zapped, would allow KVM to create an NX
> -        * huge page.  A shadow page will have nx_huge_page_disallowed set but
> -        * not be on the list if a huge page is disallowed for other reasons,
> -        * e.g. because KVM is shadowing a PTE at the same gfn, the memslot
> -        * isn't properly aligned, etc...
> -        */
> -       struct list_head possible_nx_huge_page_link;
> -#ifdef CONFIG_X86_32
> -       /*
> -        * Used out of the mmu-lock to avoid reading spte values while an
> -        * update is in progress; see the comments in __get_spte_lockless().
> -        */
> -       int clear_spte_count;
> -#endif
> -
> -       /* Number of writes since the last time traversal visited this page.  */
> -       atomic_t write_flooding_count;
> -
> -#ifdef CONFIG_X86_64
> -       /* Used for freeing the page asynchronously if it is a TDP MMU page. */
> -       struct rcu_head rcu_head;
> -#endif
> -};
> -
>  extern struct kmem_cache *mmu_page_header_cache;
>
>  static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index ffd10ce3eae3..335f26dabdf3 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -16,11 +16,11 @@
>         __field(bool, unsync)
>
>  #define KVM_MMU_PAGE_ASSIGN(sp)                                \
> -       __entry->mmu_valid_gen = sp->mmu_valid_gen;     \
> +       __entry->mmu_valid_gen = sp->arch.mmu_valid_gen;        \
>         __entry->gfn = sp->gfn;                         \
>         __entry->role = sp->role.word;                  \
>         __entry->root_count = refcount_read(&sp->root_refcount);        \
> -       __entry->unsync = sp->unsync;
> +       __entry->unsync = sp->arch.unsync;
>
>  #define KVM_MMU_PAGE_PRINTK() ({                                       \
>         const char *saved_ptr = trace_seq_buffer_ptr(p);                \
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index e15ec1c473da..18bb92b70a01 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -671,7 +671,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>                          * KVM_REQ_MMU_SYNC is not necessary but it
>                          * expedites the process.
>                          */
> -                       if (sp->unsync_children &&
> +                       if (sp->arch.unsync_children &&
>                             mmu_sync_children(vcpu, sp, false))
>                                 return RET_PF_RETRY;
>                 }
> @@ -921,7 +921,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
>                         pt_element_t gpte;
>                         gpa_t pte_gpa;
>
> -                       if (!sp->unsync)
> +                       if (!sp->arch.unsync)
>                                 break;
>
>                         pte_gpa = FNAME(get_level1_sp_gpa)(sp);
> @@ -942,7 +942,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
>                         FNAME(prefetch_gpte)(vcpu, sp, sptep, gpte, false);
>                 }
>
> -               if (!sp->unsync_children)
> +               if (!sp->arch.unsync_children)
>                         break;
>         }
>         write_unlock(&vcpu->kvm->mmu_lock);
> @@ -974,8 +974,8 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  }
>
>  /*
> - * Using the information in sp->shadowed_translation (kvm_mmu_page_get_gfn()) is
> - * safe because:
> + * Using the information in sp->arch.shadowed_translation
> + * (kvm_mmu_page_get_gfn()) is safe because:
>   * - The spte has a reference to the struct page, so the pfn for a given gfn
>   *   can't change unless all sptes pointing to it are nuked first.
>   *
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 34d674080170..66231c7ed31e 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -270,7 +270,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
>  static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
>                             gfn_t gfn, union kvm_mmu_page_role role)
>  {
> -       INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
> +       INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
>
>         set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
>
> @@ -385,7 +385,7 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
>  {
>         tdp_unaccount_mmu_page(kvm, sp);
>
> -       if (!sp->nx_huge_page_disallowed)
> +       if (!sp->arch.nx_huge_page_disallowed)
>                 return;
>
>         if (shared)
> @@ -393,7 +393,7 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
>         else
>                 lockdep_assert_held_write(&kvm->mmu_lock);
>
> -       sp->nx_huge_page_disallowed = false;
> +       sp->arch.nx_huge_page_disallowed = false;
>         untrack_possible_nx_huge_page(kvm, sp);
>
>         if (shared)
> @@ -1181,7 +1181,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                 sp = tdp_mmu_alloc_sp(vcpu);
>                 tdp_mmu_init_child_sp(sp, &iter);
>
> -               sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
> +               sp->arch.nx_huge_page_disallowed = fault->huge_page_disallowed;
>
>                 if (is_shadow_present_pte(iter.old_spte))
>                         r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 19d3153051a3..e6a929089715 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -73,7 +73,7 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
>  #ifdef CONFIG_X86_64
>  static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp)
>  {
> -       return !sp->shadow_mmu_page;
> +       return !sp->arch.shadow_mmu_page;
>  }
>  #else
>  static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
> diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
> index 14099956fdac..a9da33d4baa8 100644
> --- a/include/kvm/mmu_types.h
> +++ b/include/kvm/mmu_types.h
> @@ -3,8 +3,11 @@
>  #define __KVM_MMU_TYPES_H
>
>  #include <linux/bug.h>
> -#include <linux/types.h>
> +#include <linux/kvm_types.h>
> +#include <linux/refcount.h>
>  #include <linux/stddef.h>
> +#include <linux/types.h>
> +#include <linux/workqueue.h>
>
>  #include <asm/kvm/mmu_types.h>
>
> @@ -36,4 +39,31 @@ static_assert(sizeof(union kvm_mmu_page_role) == sizeof_field(union kvm_mmu_page
>
>  typedef u64 __rcu *tdp_ptep_t;
>
> +struct kvm_mmu_page {
> +       struct list_head link;
> +
> +       union kvm_mmu_page_role role;
> +
> +       /* The start of the GFN region mapped by this shadow page. */
> +       gfn_t gfn;

I'd like to put in a vote for getting rid of the "shadow page" /
"shadow page table entry" terminology as part of this refactor.
I can't count the number of times folks trying to understand the MMU
have been confused by the overloaded terminology.

> +
> +       /* The page table page. */
> +       u64 *spt;
> +
> +       /* The PTE that points to this shadow page. */
> +       tdp_ptep_t ptep;

Totally fine to not change this, but something like parent_ptep would
be clearer. The Shadow MMU uses parent_ptes, which I find much easier
to follow.
Could also change the comment to just:
/* The PTE that points to this kvm_mmu_page. */
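
Putting the rename and the shorter comment together, that would be
roughly (a sketch, not what this patch currently does):

        /* The PTE that points to this kvm_mmu_page. */
        tdp_ptep_t parent_ptep;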

> +
> +       /* Used for freeing TDP MMU pages asynchronously. */
> +       struct rcu_head rcu_head;
> +
> +       /* The number of references to this shadow page as a root. */
> +       refcount_t root_refcount;
> +
> +       /* Used for tearing down an entire page table tree. */
> +       struct work_struct tdp_mmu_async_work;
> +       void *tdp_mmu_async_data;
> +
> +       struct kvm_mmu_page_arch arch;
> +};
> +
>  #endif /* !__KVM_MMU_TYPES_H */
> --
> 2.39.0.rc1.256.g54fd8350bd-goog
>
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 06/37] KVM: MMU: Move struct kvm_mmu_page to common code
@ 2022-12-12 18:07     ` Ben Gardon
  0 siblings, 0 replies; 317+ messages in thread
From: Ben Gardon @ 2022-12-12 18:07 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
>
> Move struct kvm_mmu_page to common code and all x86-specific fields into
> kvm_mmu_page_arch.
>
> This commit increases the size of struct kvm_mmu_page by 64 bytes on
> x86_64 (184 bytes -> 248 bytes). The size of this struct can be reduced
> in future commits by moving TDP MMU root fields into a separate struct
> and by dynamically allocating fields only used by the Shadow MMU.
>
> No functional change intended.
>
> Signed-off-by: David Matlack <dmatlack@google.com>

I haven't reviewed every line of the mechanical refactor that adds
"arch." all over the place, but at a high level this looks like a good
split of struct kvm_mmu_page.

> ---
>  arch/x86/include/asm/kvm/mmu_types.h |  62 ++++++++++++++
>  arch/x86/include/asm/kvm_host.h      |   4 -
>  arch/x86/kvm/mmu/mmu.c               | 122 ++++++++++++++-------------
>  arch/x86/kvm/mmu/mmu_internal.h      |  83 ------------------
>  arch/x86/kvm/mmu/mmutrace.h          |   4 +-
>  arch/x86/kvm/mmu/paging_tmpl.h       |  10 +--
>  arch/x86/kvm/mmu/tdp_mmu.c           |   8 +-
>  arch/x86/kvm/mmu/tdp_mmu.h           |   2 +-
>  include/kvm/mmu_types.h              |  32 ++++++-
>  9 files changed, 167 insertions(+), 160 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
> index 35f893ebab5a..affcb520b482 100644
> --- a/arch/x86/include/asm/kvm/mmu_types.h
> +++ b/arch/x86/include/asm/kvm/mmu_types.h
> @@ -2,6 +2,8 @@
>  #ifndef __ASM_KVM_MMU_TYPES_H
>  #define __ASM_KVM_MMU_TYPES_H
>
> +#include <linux/bitmap.h>
> +#include <linux/list.h>
>  #include <linux/types.h>
>
>  /*
> @@ -53,4 +55,64 @@ struct kvm_mmu_page_role_arch {
>         u16 passthrough:1;
>  };
>
> +struct kvm_rmap_head {
> +       unsigned long val;
> +};
> +
> +struct kvm_mmu_page_arch {
> +       struct hlist_node hash_link;
> +
> +       bool shadow_mmu_page;
> +       bool unsync;
> +       u8 mmu_valid_gen;
> +
> +        /*
> +         * The shadow page can't be replaced by an equivalent huge page
> +         * because it is being used to map an executable page in the guest
> +         * and the NX huge page mitigation is enabled.
> +         */
> +       bool nx_huge_page_disallowed;
> +
> +       /*
> +        * Stores the result of the guest translation being shadowed by each
> +        * SPTE.  KVM shadows two types of guest translations: nGPA -> GPA
> +        * (shadow EPT/NPT) and GVA -> GPA (traditional shadow paging). In both
> +        * cases the result of the translation is a GPA and a set of access
> +        * constraints.
> +        *
> +        * The GFN is stored in the upper bits (PAGE_SHIFT) and the shadowed
> +        * access permissions are stored in the lower bits. Note, for
> +        * convenience and uniformity across guests, the access permissions are
> +        * stored in KVM format (e.g.  ACC_EXEC_MASK) not the raw guest format.
> +        */
> +       u64 *shadowed_translation;
> +
> +       unsigned int unsync_children;
> +
> +       /* Rmap pointers to all parent sptes. */
> +       struct kvm_rmap_head parent_ptes;
> +
> +       DECLARE_BITMAP(unsync_child_bitmap, 512);
> +
> +       /*
> +        * Tracks shadow pages that, if zapped, would allow KVM to create an NX
> +        * huge page.  A shadow page will have nx_huge_page_disallowed set but
> +        * not be on the list if a huge page is disallowed for other reasons,
> +        * e.g. because KVM is shadowing a PTE at the same gfn, the memslot
> +        * isn't properly aligned, etc...
> +        */
> +       struct list_head possible_nx_huge_page_link;
> +
> +#ifdef CONFIG_X86_32
> +       /*
> +        * Used out of the mmu-lock to avoid reading spte values while an
> +        * update is in progress; see the comments in __get_spte_lockless().
> +        */
> +       int clear_spte_count;
> +#endif
> +
> +       /* Number of writes since the last time traversal visited this page.  */
> +       atomic_t write_flooding_count;
> +};
> +
>  #endif /* !__ASM_KVM_MMU_TYPES_H */
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ebcd7a0dabef..f5743a652e10 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -329,10 +329,6 @@ union kvm_cpu_role {
>         };
>  };
>
> -struct kvm_rmap_head {
> -       unsigned long val;
> -};
> -
>  struct kvm_pio_request {
>         unsigned long linear_rip;
>         unsigned long count;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 11cef930d5ed..e47f35878ab5 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -350,7 +350,7 @@ static void count_spte_clear(u64 *sptep, u64 spte)
>
>         /* Ensure the spte is completely set before we increase the count */
>         smp_wmb();
> -       sp->clear_spte_count++;
> +       sp->arch.clear_spte_count++;
>  }
>
>  static void __set_spte(u64 *sptep, u64 spte)
> @@ -432,7 +432,7 @@ static u64 __get_spte_lockless(u64 *sptep)
>         int count;
>
>  retry:
> -       count = sp->clear_spte_count;
> +       count = sp->arch.clear_spte_count;
>         smp_rmb();
>
>         spte.spte_low = orig->spte_low;
> @@ -442,7 +442,7 @@ static u64 __get_spte_lockless(u64 *sptep)
>         smp_rmb();
>
>         if (unlikely(spte.spte_low != orig->spte_low ||
> -             count != sp->clear_spte_count))
> +             count != sp->arch.clear_spte_count))
>                 goto retry;
>
>         return spte.spte;
> @@ -699,7 +699,7 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
>                 return sp->gfn;
>
>         if (!sp->role.arch.direct)
> -               return sp->shadowed_translation[index] >> PAGE_SHIFT;
> +               return sp->arch.shadowed_translation[index] >> PAGE_SHIFT;
>
>         return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS));
>  }
> @@ -713,7 +713,7 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
>  static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
>  {
>         if (sp_has_gptes(sp))
> -               return sp->shadowed_translation[index] & ACC_ALL;
> +               return sp->arch.shadowed_translation[index] & ACC_ALL;
>
>         /*
>          * For direct MMUs (e.g. TDP or non-paging guests) or passthrough SPs,
> @@ -734,7 +734,7 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
>                                          gfn_t gfn, unsigned int access)
>  {
>         if (sp_has_gptes(sp)) {
> -               sp->shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
> +               sp->arch.shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
>                 return;
>         }
>
> @@ -825,18 +825,18 @@ void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>          * on the list if KVM is reusing an existing shadow page, i.e. if KVM
>          * links a shadow page at multiple points.
>          */
> -       if (!list_empty(&sp->possible_nx_huge_page_link))
> +       if (!list_empty(&sp->arch.possible_nx_huge_page_link))
>                 return;
>
>         ++kvm->stat.nx_lpage_splits;
> -       list_add_tail(&sp->possible_nx_huge_page_link,
> +       list_add_tail(&sp->arch.possible_nx_huge_page_link,
>                       &kvm->arch.possible_nx_huge_pages);
>  }
>
>  static void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>                                  bool nx_huge_page_possible)
>  {
> -       sp->nx_huge_page_disallowed = true;
> +       sp->arch.nx_huge_page_disallowed = true;
>
>         if (nx_huge_page_possible)
>                 track_possible_nx_huge_page(kvm, sp);
> @@ -861,16 +861,16 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
>
>  void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> -       if (list_empty(&sp->possible_nx_huge_page_link))
> +       if (list_empty(&sp->arch.possible_nx_huge_page_link))
>                 return;
>
>         --kvm->stat.nx_lpage_splits;
> -       list_del_init(&sp->possible_nx_huge_page_link);
> +       list_del_init(&sp->arch.possible_nx_huge_page_link);
>  }
>
>  static void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> -       sp->nx_huge_page_disallowed = false;
> +       sp->arch.nx_huge_page_disallowed = false;
>
>         untrack_possible_nx_huge_page(kvm, sp);
>  }
> @@ -1720,11 +1720,11 @@ static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
>  {
>         MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
> -       hlist_del(&sp->hash_link);
> +       hlist_del(&sp->arch.hash_link);
>         list_del(&sp->link);
>         free_page((unsigned long)sp->spt);
>         if (!sp->role.arch.direct)
> -               free_page((unsigned long)sp->shadowed_translation);
> +               free_page((unsigned long)sp->arch.shadowed_translation);
>         kmem_cache_free(mmu_page_header_cache, sp);
>  }
>
> @@ -1739,13 +1739,13 @@ static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache,
>         if (!parent_pte)
>                 return;
>
> -       pte_list_add(cache, parent_pte, &sp->parent_ptes);
> +       pte_list_add(cache, parent_pte, &sp->arch.parent_ptes);
>  }
>
>  static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
>                                        u64 *parent_pte)
>  {
> -       pte_list_remove(parent_pte, &sp->parent_ptes);
> +       pte_list_remove(parent_pte, &sp->arch.parent_ptes);
>  }
>
>  static void drop_parent_pte(struct kvm_mmu_page *sp,
> @@ -1761,7 +1761,7 @@ static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
>         u64 *sptep;
>         struct rmap_iterator iter;
>
> -       for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> +       for_each_rmap_spte(&sp->arch.parent_ptes, &iter, sptep) {
>                 mark_unsync(sptep);
>         }
>  }
> @@ -1771,9 +1771,9 @@ static void mark_unsync(u64 *spte)
>         struct kvm_mmu_page *sp;
>
>         sp = sptep_to_sp(spte);
> -       if (__test_and_set_bit(spte_index(spte), sp->unsync_child_bitmap))
> +       if (__test_and_set_bit(spte_index(spte), sp->arch.unsync_child_bitmap))
>                 return;
> -       if (sp->unsync_children++)
> +       if (sp->arch.unsync_children++)
>                 return;
>         kvm_mmu_mark_parents_unsync(sp);
>  }
> @@ -1799,7 +1799,7 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
>  {
>         int i;
>
> -       if (sp->unsync)
> +       if (sp->arch.unsync)
>                 for (i=0; i < pvec->nr; i++)
>                         if (pvec->page[i].sp == sp)
>                                 return 0;
> @@ -1812,9 +1812,9 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
>
>  static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
>  {
> -       --sp->unsync_children;
> -       WARN_ON((int)sp->unsync_children < 0);
> -       __clear_bit(idx, sp->unsync_child_bitmap);
> +       --sp->arch.unsync_children;
> +       WARN_ON((int)sp->arch.unsync_children < 0);
> +       __clear_bit(idx, sp->arch.unsync_child_bitmap);
>  }
>
>  static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
> @@ -1822,7 +1822,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
>  {
>         int i, ret, nr_unsync_leaf = 0;
>
> -       for_each_set_bit(i, sp->unsync_child_bitmap, 512) {
> +       for_each_set_bit(i, sp->arch.unsync_child_bitmap, 512) {
>                 struct kvm_mmu_page *child;
>                 u64 ent = sp->spt[i];
>
> @@ -1833,7 +1833,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
>
>                 child = spte_to_child_sp(ent);
>
> -               if (child->unsync_children) {
> +               if (child->arch.unsync_children) {
>                         if (mmu_pages_add(pvec, child, i))
>                                 return -ENOSPC;
>
> @@ -1845,7 +1845,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
>                                 nr_unsync_leaf += ret;
>                         } else
>                                 return ret;
> -               } else if (child->unsync) {
> +               } else if (child->arch.unsync) {
>                         nr_unsync_leaf++;
>                         if (mmu_pages_add(pvec, child, i))
>                                 return -ENOSPC;
> @@ -1862,7 +1862,7 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
>                            struct kvm_mmu_pages *pvec)
>  {
>         pvec->nr = 0;
> -       if (!sp->unsync_children)
> +       if (!sp->arch.unsync_children)
>                 return 0;
>
>         mmu_pages_add(pvec, sp, INVALID_INDEX);
> @@ -1871,9 +1871,9 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
>
>  static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> -       WARN_ON(!sp->unsync);
> +       WARN_ON(!sp->arch.unsync);
>         trace_kvm_mmu_sync_page(sp);
> -       sp->unsync = 0;
> +       sp->arch.unsync = 0;
>         --kvm->stat.mmu_unsync;
>  }
>
> @@ -1894,7 +1894,7 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
>  }
>
>  #define for_each_valid_sp(_kvm, _sp, _list)                            \
> -       hlist_for_each_entry(_sp, _list, hash_link)                     \
> +       hlist_for_each_entry(_sp, _list, arch.hash_link)                        \
>                 if (is_obsolete_sp((_kvm), (_sp))) {                    \
>                 } else
>
> @@ -1934,7 +1934,7 @@ static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>
>         /* TDP MMU pages do not use the MMU generation. */
>         return !is_tdp_mmu_page(sp) &&
> -              unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> +              unlikely(sp->arch.mmu_valid_gen != kvm->arch.mmu_valid_gen);
>  }
>
>  struct mmu_page_path {
> @@ -2006,7 +2006,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
>                 WARN_ON(idx == INVALID_INDEX);
>                 clear_unsync_child_bit(sp, idx);
>                 level++;
> -       } while (!sp->unsync_children);
> +       } while (!sp->arch.unsync_children);
>  }
>
>  static int mmu_sync_children(struct kvm_vcpu *vcpu,
> @@ -2053,7 +2053,7 @@ static int mmu_sync_children(struct kvm_vcpu *vcpu,
>
>  static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
>  {
> -       atomic_set(&sp->write_flooding_count,  0);
> +       atomic_set(&sp->arch.write_flooding_count,  0);
>  }
>
>  static void clear_sp_write_flooding_count(u64 *spte)
> @@ -2094,7 +2094,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                          * Unsync pages must not be left as is, because the new
>                          * upper-level page will be write-protected.
>                          */
> -                       if (role.level > PG_LEVEL_4K && sp->unsync)
> +                       if (role.level > PG_LEVEL_4K && sp->arch.unsync)
>                                 kvm_mmu_prepare_zap_page(kvm, sp,
>                                                          &invalid_list);
>                         continue;
> @@ -2104,7 +2104,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                 if (sp->role.arch.direct)
>                         goto out;
>
> -               if (sp->unsync) {
> +               if (sp->arch.unsync) {
>                         if (KVM_BUG_ON(!vcpu, kvm))
>                                 break;
>
> @@ -2163,25 +2163,26 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
>         sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
>         sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
>         if (!role.arch.direct)
> -               sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
> +               sp->arch.shadowed_translation =
> +                       kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
>
>         set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
>
> -       INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
> +       INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
>
>         /*
>          * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
>          * depends on valid pages being added to the head of the list.  See
>          * comments in kvm_zap_obsolete_pages().
>          */
> -       sp->mmu_valid_gen = kvm->arch.mmu_valid_gen;
> +       sp->arch.mmu_valid_gen = kvm->arch.mmu_valid_gen;
>         list_add(&sp->link, &kvm->arch.active_mmu_pages);
>         kvm_account_mmu_page(kvm, sp);
>
>         sp->gfn = gfn;
>         sp->role = role;
> -       sp->shadow_mmu_page = true;
> -       hlist_add_head(&sp->hash_link, sp_list);
> +       sp->arch.shadow_mmu_page = true;
> +       hlist_add_head(&sp->arch.hash_link, sp_list);
>         if (sp_has_gptes(sp))
>                 account_shadowed(kvm, sp);
>
> @@ -2368,7 +2369,7 @@ static void __link_shadow_page(struct kvm *kvm,
>
>         mmu_page_add_parent_pte(cache, sp, sptep);
>
> -       if (sp->unsync_children || sp->unsync)
> +       if (sp->arch.unsync_children || sp->arch.unsync)
>                 mark_unsync(sptep);
>  }
>
> @@ -2421,7 +2422,8 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
>                          * avoids retaining a large number of stale nested SPs.
>                          */
>                         if (tdp_enabled && invalid_list &&
> -                           child->role.arch.guest_mode && !child->parent_ptes.val)
> +                           child->role.arch.guest_mode &&
> +                           !child->arch.parent_ptes.val)
>                                 return kvm_mmu_prepare_zap_page(kvm, child,
>                                                                 invalid_list);
>                 }
> @@ -2449,7 +2451,7 @@ static void kvm_mmu_unlink_parents(struct kvm_mmu_page *sp)
>         u64 *sptep;
>         struct rmap_iterator iter;
>
> -       while ((sptep = rmap_get_first(&sp->parent_ptes, &iter)))
> +       while ((sptep = rmap_get_first(&sp->arch.parent_ptes, &iter)))
>                 drop_parent_pte(sp, sptep);
>  }
>
> @@ -2496,7 +2498,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
>         if (!sp->role.invalid && sp_has_gptes(sp))
>                 unaccount_shadowed(kvm, sp);
>
> -       if (sp->unsync)
> +       if (sp->arch.unsync)
>                 kvm_unlink_unsync_page(kvm, sp);
>         if (!refcount_read(&sp->root_refcount)) {
>                 /* Count self */
> @@ -2527,7 +2529,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
>                 zapped_root = !is_obsolete_sp(kvm, sp);
>         }
>
> -       if (sp->nx_huge_page_disallowed)
> +       if (sp->arch.nx_huge_page_disallowed)
>                 unaccount_nx_huge_page(kvm, sp);
>
>         sp->role.invalid = 1;
> @@ -2704,7 +2706,7 @@ static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
>         trace_kvm_mmu_unsync_page(sp);
>         ++kvm->stat.mmu_unsync;
> -       sp->unsync = 1;
> +       sp->arch.unsync = 1;
>
>         kvm_mmu_mark_parents_unsync(sp);
>  }
> @@ -2739,7 +2741,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>                 if (!can_unsync)
>                         return -EPERM;
>
> -               if (sp->unsync)
> +               if (sp->arch.unsync)
>                         continue;
>
>                 if (prefetch)
> @@ -2764,7 +2766,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>                          * for write, i.e. unsync cannot transition from 0->1
>                          * while this CPU holds mmu_lock for read (or write).
>                          */
> -                       if (READ_ONCE(sp->unsync))
> +                       if (READ_ONCE(sp->arch.unsync))
>                                 continue;
>                 }
>
> @@ -2796,8 +2798,8 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>          *                      2.2 Guest issues TLB flush.
>          *                          That causes a VM Exit.
>          *
> -        *                      2.3 Walking of unsync pages sees sp->unsync is
> -        *                          false and skips the page.
> +        *                      2.3 Walking of unsync pages sees sp->arch.unsync
> +        *                          is false and skips the page.
>          *
>          *                      2.4 Guest accesses GVA X.
>          *                          Since the mapping in the SP was not updated,
> @@ -2805,7 +2807,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>          *                          gets used.
>          * 1.1 Host marks SP
>          *     as unsync
> -        *     (sp->unsync = true)
> +        *     (sp->arch.unsync = true)
>          *
>          * The write barrier below ensures that 1.1 happens before 1.2 and thus
>          * the situation in 2.4 does not arise.  It pairs with the read barrier
> @@ -3126,7 +3128,7 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>             cur_level == fault->goal_level &&
>             is_shadow_present_pte(spte) &&
>             !is_large_pte(spte) &&
> -           spte_to_child_sp(spte)->nx_huge_page_disallowed) {
> +           spte_to_child_sp(spte)->arch.nx_huge_page_disallowed) {
>                 /*
>                  * A small SPTE exists for this pfn, but FNAME(fetch),
>                  * direct_map(), or kvm_tdp_mmu_map() would like to create a
> @@ -3902,7 +3904,7 @@ static bool is_unsync_root(hpa_t root)
>
>         /*
>          * The read barrier orders the CPU's read of SPTE.W during the page table
> -        * walk before the reads of sp->unsync/sp->unsync_children here.
> +        * walk before the reads of sp->arch.{unsync,unsync_children} here.
>          *
>          * Even if another CPU was marking the SP as unsync-ed simultaneously,
>          * any guest page table changes are not guaranteed to be visible anyway
> @@ -3922,7 +3924,7 @@ static bool is_unsync_root(hpa_t root)
>         if (WARN_ON_ONCE(!sp))
>                 return false;
>
> -       if (sp->unsync || sp->unsync_children)
> +       if (sp->arch.unsync || sp->arch.unsync_children)
>                 return true;
>
>         return false;
> @@ -5510,8 +5512,8 @@ static bool detect_write_flooding(struct kvm_mmu_page *sp)
>         if (sp->role.level == PG_LEVEL_4K)
>                 return false;
>
> -       atomic_inc(&sp->write_flooding_count);
> -       return atomic_read(&sp->write_flooding_count) >= 3;
> +       atomic_inc(&sp->arch.write_flooding_count);
> +       return atomic_read(&sp->arch.write_flooding_count) >= 3;
>  }
>
>  /*
> @@ -6389,7 +6391,7 @@ static bool shadow_mmu_try_split_huge_pages(struct kvm *kvm,
>                         continue;
>
>                 /* SPs with level >PG_LEVEL_4K should never by unsync. */
> -               if (WARN_ON_ONCE(sp->unsync))
> +               if (WARN_ON_ONCE(sp->arch.unsync))
>                         continue;
>
>                 /* Don't bother splitting huge pages on invalid SPs. */
> @@ -6941,8 +6943,8 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
>                  */
>                 sp = list_first_entry(&kvm->arch.possible_nx_huge_pages,
>                                       struct kvm_mmu_page,
> -                                     possible_nx_huge_page_link);
> -               WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
> +                                     arch.possible_nx_huge_page_link);
> +               WARN_ON_ONCE(!sp->arch.nx_huge_page_disallowed);
>                 WARN_ON_ONCE(!sp->role.arch.direct);
>
>                 /*
> @@ -6977,7 +6979,7 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
>                         flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
>                 else
>                         kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
> -               WARN_ON_ONCE(sp->nx_huge_page_disallowed);
> +               WARN_ON_ONCE(sp->arch.nx_huge_page_disallowed);
>
>                 if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
>                         kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index fd4990c8b0e9..af2ae4887e35 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -44,89 +44,6 @@ extern bool dbg;
>  #define INVALID_PAE_ROOT       0
>  #define IS_VALID_PAE_ROOT(x)   (!!(x))
>
> -struct kvm_mmu_page {
> -       /*
> -        * Note, "link" through "spt" fit in a single 64 byte cache line on
> -        * 64-bit kernels, keep it that way unless there's a reason not to.
> -        */
> -       struct list_head link;
> -       struct hlist_node hash_link;
> -
> -       bool shadow_mmu_page;
> -       bool unsync;
> -       u8 mmu_valid_gen;
> -
> -        /*
> -         * The shadow page can't be replaced by an equivalent huge page
> -         * because it is being used to map an executable page in the guest
> -         * and the NX huge page mitigation is enabled.
> -         */
> -       bool nx_huge_page_disallowed;
> -
> -       /*
> -        * The following two entries are used to key the shadow page in the
> -        * hash table.
> -        */
> -       union kvm_mmu_page_role role;
> -       gfn_t gfn;
> -
> -       u64 *spt;
> -
> -       /*
> -        * Stores the result of the guest translation being shadowed by each
> -        * SPTE.  KVM shadows two types of guest translations: nGPA -> GPA
> -        * (shadow EPT/NPT) and GVA -> GPA (traditional shadow paging). In both
> -        * cases the result of the translation is a GPA and a set of access
> -        * constraints.
> -        *
> -        * The GFN is stored in the upper bits (PAGE_SHIFT) and the shadowed
> -        * access permissions are stored in the lower bits. Note, for
> -        * convenience and uniformity across guests, the access permissions are
> -        * stored in KVM format (e.g.  ACC_EXEC_MASK) not the raw guest format.
> -        */
> -       u64 *shadowed_translation;
> -
> -       /* Currently serving as active root */
> -       refcount_t root_refcount;
> -
> -       unsigned int unsync_children;
> -       union {
> -               struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
> -               tdp_ptep_t ptep;
> -       };
> -       union {
> -               DECLARE_BITMAP(unsync_child_bitmap, 512);
> -               struct {
> -                       struct work_struct tdp_mmu_async_work;
> -                       void *tdp_mmu_async_data;
> -               };
> -       };
> -
> -       /*
> -        * Tracks shadow pages that, if zapped, would allow KVM to create an NX
> -        * huge page.  A shadow page will have nx_huge_page_disallowed set but
> -        * not be on the list if a huge page is disallowed for other reasons,
> -        * e.g. because KVM is shadowing a PTE at the same gfn, the memslot
> -        * isn't properly aligned, etc...
> -        */
> -       struct list_head possible_nx_huge_page_link;
> -#ifdef CONFIG_X86_32
> -       /*
> -        * Used out of the mmu-lock to avoid reading spte values while an
> -        * update is in progress; see the comments in __get_spte_lockless().
> -        */
> -       int clear_spte_count;
> -#endif
> -
> -       /* Number of writes since the last time traversal visited this page.  */
> -       atomic_t write_flooding_count;
> -
> -#ifdef CONFIG_X86_64
> -       /* Used for freeing the page asynchronously if it is a TDP MMU page. */
> -       struct rcu_head rcu_head;
> -#endif
> -};
> -
>  extern struct kmem_cache *mmu_page_header_cache;
>
>  static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index ffd10ce3eae3..335f26dabdf3 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -16,11 +16,11 @@
>         __field(bool, unsync)
>
>  #define KVM_MMU_PAGE_ASSIGN(sp)                                \
> -       __entry->mmu_valid_gen = sp->mmu_valid_gen;     \
> +       __entry->mmu_valid_gen = sp->arch.mmu_valid_gen;        \
>         __entry->gfn = sp->gfn;                         \
>         __entry->role = sp->role.word;                  \
>         __entry->root_count = refcount_read(&sp->root_refcount);        \
> -       __entry->unsync = sp->unsync;
> +       __entry->unsync = sp->arch.unsync;
>
>  #define KVM_MMU_PAGE_PRINTK() ({                                       \
>         const char *saved_ptr = trace_seq_buffer_ptr(p);                \
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index e15ec1c473da..18bb92b70a01 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -671,7 +671,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>                          * KVM_REQ_MMU_SYNC is not necessary but it
>                          * expedites the process.
>                          */
> -                       if (sp->unsync_children &&
> +                       if (sp->arch.unsync_children &&
>                             mmu_sync_children(vcpu, sp, false))
>                                 return RET_PF_RETRY;
>                 }
> @@ -921,7 +921,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
>                         pt_element_t gpte;
>                         gpa_t pte_gpa;
>
> -                       if (!sp->unsync)
> +                       if (!sp->arch.unsync)
>                                 break;
>
>                         pte_gpa = FNAME(get_level1_sp_gpa)(sp);
> @@ -942,7 +942,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
>                         FNAME(prefetch_gpte)(vcpu, sp, sptep, gpte, false);
>                 }
>
> -               if (!sp->unsync_children)
> +               if (!sp->arch.unsync_children)
>                         break;
>         }
>         write_unlock(&vcpu->kvm->mmu_lock);
> @@ -974,8 +974,8 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  }
>
>  /*
> - * Using the information in sp->shadowed_translation (kvm_mmu_page_get_gfn()) is
> - * safe because:
> + * Using the information in sp->arch.shadowed_translation
> + * (kvm_mmu_page_get_gfn()) is safe because:
>   * - The spte has a reference to the struct page, so the pfn for a given gfn
>   *   can't change unless all sptes pointing to it are nuked first.
>   *
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 34d674080170..66231c7ed31e 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -270,7 +270,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
>  static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
>                             gfn_t gfn, union kvm_mmu_page_role role)
>  {
> -       INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
> +       INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
>
>         set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
>
> @@ -385,7 +385,7 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
>  {
>         tdp_unaccount_mmu_page(kvm, sp);
>
> -       if (!sp->nx_huge_page_disallowed)
> +       if (!sp->arch.nx_huge_page_disallowed)
>                 return;
>
>         if (shared)
> @@ -393,7 +393,7 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
>         else
>                 lockdep_assert_held_write(&kvm->mmu_lock);
>
> -       sp->nx_huge_page_disallowed = false;
> +       sp->arch.nx_huge_page_disallowed = false;
>         untrack_possible_nx_huge_page(kvm, sp);
>
>         if (shared)
> @@ -1181,7 +1181,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                 sp = tdp_mmu_alloc_sp(vcpu);
>                 tdp_mmu_init_child_sp(sp, &iter);
>
> -               sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
> +               sp->arch.nx_huge_page_disallowed = fault->huge_page_disallowed;
>
>                 if (is_shadow_present_pte(iter.old_spte))
>                         r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 19d3153051a3..e6a929089715 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -73,7 +73,7 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
>  #ifdef CONFIG_X86_64
>  static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp)
>  {
> -       return !sp->shadow_mmu_page;
> +       return !sp->arch.shadow_mmu_page;
>  }
>  #else
>  static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
> diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
> index 14099956fdac..a9da33d4baa8 100644
> --- a/include/kvm/mmu_types.h
> +++ b/include/kvm/mmu_types.h
> @@ -3,8 +3,11 @@
>  #define __KVM_MMU_TYPES_H
>
>  #include <linux/bug.h>
> -#include <linux/types.h>
> +#include <linux/kvm_types.h>
> +#include <linux/refcount.h>
>  #include <linux/stddef.h>
> +#include <linux/types.h>
> +#include <linux/workqueue.h>
>
>  #include <asm/kvm/mmu_types.h>
>
> @@ -36,4 +39,31 @@ static_assert(sizeof(union kvm_mmu_page_role) == sizeof_field(union kvm_mmu_page
>
>  typedef u64 __rcu *tdp_ptep_t;
>
> +struct kvm_mmu_page {
> +       struct list_head link;
> +
> +       union kvm_mmu_page_role role;
> +
> +       /* The start of the GFN region mapped by this shadow page. */
> +       gfn_t gfn;

I'd like to put in a vote for getting rid of the "shadow page" /
"shadow page table entry" terminology through this refactor.
I can't count the number of times folks trying to understand the MMU
have gotten confused by the overloaded terminology.

> +
> +       /* The page table page. */
> +       u64 *spt;
> +
> +       /* The PTE that points to this shadow page. */
> +       tdp_ptep_t ptep;

Totally fine to not change this, but something like parent_ptep would
be clearer. The Shadow MMU uses parent_pteps, which I think is much
clearer.
Could also change the comment to just:
/* The PTE that points to this kvm_mmu_page. */

> +
> +       /* Used for freeing TDP MMU pages asynchronously. */
> +       struct rcu_head rcu_head;
> +
> +       /* The number of references to this shadow page as a root. */
> +       refcount_t root_refcount;
> +
> +       /* Used for tearing down an entire page table tree. */
> +       struct work_struct tdp_mmu_async_work;
> +       void *tdp_mmu_async_data;
> +
> +       struct kvm_mmu_page_arch arch;
> +};
> +
>  #endif /* !__KVM_MMU_TYPES_H */
> --
> 2.39.0.rc1.256.g54fd8350bd-goog
>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 06/37] KVM: MMU: Move struct kvm_mmu_page to common code
@ 2022-12-12 18:07     ` Ben Gardon
  0 siblings, 0 replies; 317+ messages in thread
From: Ben Gardon @ 2022-12-12 18:07 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
>
> Move struct kvm_mmu_page to common code and all x86-specific fields into
> kvm_mmu_page_arch.
>
> This commit increases the size of struct kvm_mmu_page by 64 bytes on
> x86_64 (184 bytes -> 248 bytes). The size of this struct can be reduced
> in future commits by moving TDP MMU root fields into a separate struct
> and by dynamically allocating fields only used by the Shadow MMU.
>
> No functional change intended.
>
> Signed-off-by: David Matlack <dmatlack@google.com>

I haven't reviewed every line of the mechanical refactor that adds
"arch." all over the place, but at a high level this looks like a good
split of struct kvm_mmu_page.
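
For anyone skimming the thread, the shape of the split is roughly the
following (abridged sketch; field names are taken from the hunks below
and "..." elides the rest):

        /* include/kvm/mmu_types.h -- common code */
        struct kvm_mmu_page {
                struct list_head link;
                union kvm_mmu_page_role role;
                gfn_t gfn;
                u64 *spt;
                tdp_ptep_t ptep;
                refcount_t root_refcount;
                ...
                struct kvm_mmu_page_arch arch;  /* x86-only state */
        };

so everything x86-specific moves behind sp->arch, and the bulk of the
mechanical churn is rewriting accesses like sp->unsync into
sp->arch.unsync.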

> ---
>  arch/x86/include/asm/kvm/mmu_types.h |  62 ++++++++++++++
>  arch/x86/include/asm/kvm_host.h      |   4 -
>  arch/x86/kvm/mmu/mmu.c               | 122 ++++++++++++++-------------
>  arch/x86/kvm/mmu/mmu_internal.h      |  83 ------------------
>  arch/x86/kvm/mmu/mmutrace.h          |   4 +-
>  arch/x86/kvm/mmu/paging_tmpl.h       |  10 +--
>  arch/x86/kvm/mmu/tdp_mmu.c           |   8 +-
>  arch/x86/kvm/mmu/tdp_mmu.h           |   2 +-
>  include/kvm/mmu_types.h              |  32 ++++++-
>  9 files changed, 167 insertions(+), 160 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
> index 35f893ebab5a..affcb520b482 100644
> --- a/arch/x86/include/asm/kvm/mmu_types.h
> +++ b/arch/x86/include/asm/kvm/mmu_types.h
> @@ -2,6 +2,8 @@
>  #ifndef __ASM_KVM_MMU_TYPES_H
>  #define __ASM_KVM_MMU_TYPES_H
>
> +#include <linux/bitmap.h>
> +#include <linux/list.h>
>  #include <linux/types.h>
>
>  /*
> @@ -53,4 +55,64 @@ struct kvm_mmu_page_role_arch {
>         u16 passthrough:1;
>  };
>
> +struct kvm_rmap_head {
> +       unsigned long val;
> +};
> +
> +struct kvm_mmu_page_arch {
> +       struct hlist_node hash_link;
> +
> +       bool shadow_mmu_page;
> +       bool unsync;
> +       u8 mmu_valid_gen;
> +
> +        /*
> +         * The shadow page can't be replaced by an equivalent huge page
> +         * because it is being used to map an executable page in the guest
> +         * and the NX huge page mitigation is enabled.
> +         */
> +       bool nx_huge_page_disallowed;
> +
> +       /*
> +        * Stores the result of the guest translation being shadowed by each
> +        * SPTE.  KVM shadows two types of guest translations: nGPA -> GPA
> +        * (shadow EPT/NPT) and GVA -> GPA (traditional shadow paging). In both
> +        * cases the result of the translation is a GPA and a set of access
> +        * constraints.
> +        *
> +        * The GFN is stored in the upper bits (PAGE_SHIFT) and the shadowed
> +        * access permissions are stored in the lower bits. Note, for
> +        * convenience and uniformity across guests, the access permissions are
> +        * stored in KVM format (e.g.  ACC_EXEC_MASK) not the raw guest format.
> +        */
> +       u64 *shadowed_translation;
> +
> +       unsigned int unsync_children;
> +
> +       /* Rmap pointers to all parent sptes. */
> +       struct kvm_rmap_head parent_ptes;
> +
> +       DECLARE_BITMAP(unsync_child_bitmap, 512);
> +
> +       /*
> +        * Tracks shadow pages that, if zapped, would allow KVM to create an NX
> +        * huge page.  A shadow page will have nx_huge_page_disallowed set but
> +        * not be on the list if a huge page is disallowed for other reasons,
> +        * e.g. because KVM is shadowing a PTE at the same gfn, the memslot
> +        * isn't properly aligned, etc...
> +        */
> +       struct list_head possible_nx_huge_page_link;
> +
> +#ifdef CONFIG_X86_32
> +       /*
> +        * Used out of the mmu-lock to avoid reading spte values while an
> +        * update is in progress; see the comments in __get_spte_lockless().
> +        */
> +       int clear_spte_count;
> +#endif
> +
> +       /* Number of writes since the last time traversal visited this page.  */
> +       atomic_t write_flooding_count;
> +};
> +
>  #endif /* !__ASM_KVM_MMU_TYPES_H */
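As an aside for anyone new to shadowed_translation: each entry packs
the GFN and the access bits into a single u64, roughly

        entry  = (gfn << PAGE_SHIFT) | access;  /* kvm_mmu_page_set_translation() */
        gfn    = entry >> PAGE_SHIFT;            /* kvm_mmu_page_get_gfn() */
        access = entry & ACC_ALL;                /* kvm_mmu_page_get_access() */

which is just a paraphrase of the helpers this patch touches further
down in mmu.c.
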
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ebcd7a0dabef..f5743a652e10 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -329,10 +329,6 @@ union kvm_cpu_role {
>         };
>  };
>
> -struct kvm_rmap_head {
> -       unsigned long val;
> -};
> -
>  struct kvm_pio_request {
>         unsigned long linear_rip;
>         unsigned long count;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 11cef930d5ed..e47f35878ab5 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -350,7 +350,7 @@ static void count_spte_clear(u64 *sptep, u64 spte)
>
>         /* Ensure the spte is completely set before we increase the count */
>         smp_wmb();
> -       sp->clear_spte_count++;
> +       sp->arch.clear_spte_count++;
>  }
>
>  static void __set_spte(u64 *sptep, u64 spte)
> @@ -432,7 +432,7 @@ static u64 __get_spte_lockless(u64 *sptep)
>         int count;
>
>  retry:
> -       count = sp->clear_spte_count;
> +       count = sp->arch.clear_spte_count;
>         smp_rmb();
>
>         spte.spte_low = orig->spte_low;
> @@ -442,7 +442,7 @@ static u64 __get_spte_lockless(u64 *sptep)
>         smp_rmb();
>
>         if (unlikely(spte.spte_low != orig->spte_low ||
> -             count != sp->clear_spte_count))
> +             count != sp->arch.clear_spte_count))
>                 goto retry;
>
>         return spte.spte;
> @@ -699,7 +699,7 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
>                 return sp->gfn;
>
>         if (!sp->role.arch.direct)
> -               return sp->shadowed_translation[index] >> PAGE_SHIFT;
> +               return sp->arch.shadowed_translation[index] >> PAGE_SHIFT;
>
>         return sp->gfn + (index << ((sp->role.level - 1) * SPTE_LEVEL_BITS));
>  }
> @@ -713,7 +713,7 @@ static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
>  static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
>  {
>         if (sp_has_gptes(sp))
> -               return sp->shadowed_translation[index] & ACC_ALL;
> +               return sp->arch.shadowed_translation[index] & ACC_ALL;
>
>         /*
>          * For direct MMUs (e.g. TDP or non-paging guests) or passthrough SPs,
> @@ -734,7 +734,7 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
>                                          gfn_t gfn, unsigned int access)
>  {
>         if (sp_has_gptes(sp)) {
> -               sp->shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
> +               sp->arch.shadowed_translation[index] = (gfn << PAGE_SHIFT) | access;
>                 return;
>         }
>
> @@ -825,18 +825,18 @@ void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>          * on the list if KVM is reusing an existing shadow page, i.e. if KVM
>          * links a shadow page at multiple points.
>          */
> -       if (!list_empty(&sp->possible_nx_huge_page_link))
> +       if (!list_empty(&sp->arch.possible_nx_huge_page_link))
>                 return;
>
>         ++kvm->stat.nx_lpage_splits;
> -       list_add_tail(&sp->possible_nx_huge_page_link,
> +       list_add_tail(&sp->arch.possible_nx_huge_page_link,
>                       &kvm->arch.possible_nx_huge_pages);
>  }
>
>  static void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>                                  bool nx_huge_page_possible)
>  {
> -       sp->nx_huge_page_disallowed = true;
> +       sp->arch.nx_huge_page_disallowed = true;
>
>         if (nx_huge_page_possible)
>                 track_possible_nx_huge_page(kvm, sp);
> @@ -861,16 +861,16 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
>
>  void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> -       if (list_empty(&sp->possible_nx_huge_page_link))
> +       if (list_empty(&sp->arch.possible_nx_huge_page_link))
>                 return;
>
>         --kvm->stat.nx_lpage_splits;
> -       list_del_init(&sp->possible_nx_huge_page_link);
> +       list_del_init(&sp->arch.possible_nx_huge_page_link);
>  }
>
>  static void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> -       sp->nx_huge_page_disallowed = false;
> +       sp->arch.nx_huge_page_disallowed = false;
>
>         untrack_possible_nx_huge_page(kvm, sp);
>  }
> @@ -1720,11 +1720,11 @@ static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  static void kvm_mmu_free_shadow_page(struct kvm_mmu_page *sp)
>  {
>         MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
> -       hlist_del(&sp->hash_link);
> +       hlist_del(&sp->arch.hash_link);
>         list_del(&sp->link);
>         free_page((unsigned long)sp->spt);
>         if (!sp->role.arch.direct)
> -               free_page((unsigned long)sp->shadowed_translation);
> +               free_page((unsigned long)sp->arch.shadowed_translation);
>         kmem_cache_free(mmu_page_header_cache, sp);
>  }
>
> @@ -1739,13 +1739,13 @@ static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache,
>         if (!parent_pte)
>                 return;
>
> -       pte_list_add(cache, parent_pte, &sp->parent_ptes);
> +       pte_list_add(cache, parent_pte, &sp->arch.parent_ptes);
>  }
>
>  static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
>                                        u64 *parent_pte)
>  {
> -       pte_list_remove(parent_pte, &sp->parent_ptes);
> +       pte_list_remove(parent_pte, &sp->arch.parent_ptes);
>  }
>
>  static void drop_parent_pte(struct kvm_mmu_page *sp,
> @@ -1761,7 +1761,7 @@ static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
>         u64 *sptep;
>         struct rmap_iterator iter;
>
> -       for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
> +       for_each_rmap_spte(&sp->arch.parent_ptes, &iter, sptep) {
>                 mark_unsync(sptep);
>         }
>  }
> @@ -1771,9 +1771,9 @@ static void mark_unsync(u64 *spte)
>         struct kvm_mmu_page *sp;
>
>         sp = sptep_to_sp(spte);
> -       if (__test_and_set_bit(spte_index(spte), sp->unsync_child_bitmap))
> +       if (__test_and_set_bit(spte_index(spte), sp->arch.unsync_child_bitmap))
>                 return;
> -       if (sp->unsync_children++)
> +       if (sp->arch.unsync_children++)
>                 return;
>         kvm_mmu_mark_parents_unsync(sp);
>  }
> @@ -1799,7 +1799,7 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
>  {
>         int i;
>
> -       if (sp->unsync)
> +       if (sp->arch.unsync)
>                 for (i=0; i < pvec->nr; i++)
>                         if (pvec->page[i].sp == sp)
>                                 return 0;
> @@ -1812,9 +1812,9 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
>
>  static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
>  {
> -       --sp->unsync_children;
> -       WARN_ON((int)sp->unsync_children < 0);
> -       __clear_bit(idx, sp->unsync_child_bitmap);
> +       --sp->arch.unsync_children;
> +       WARN_ON((int)sp->arch.unsync_children < 0);
> +       __clear_bit(idx, sp->arch.unsync_child_bitmap);
>  }
>
>  static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
> @@ -1822,7 +1822,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
>  {
>         int i, ret, nr_unsync_leaf = 0;
>
> -       for_each_set_bit(i, sp->unsync_child_bitmap, 512) {
> +       for_each_set_bit(i, sp->arch.unsync_child_bitmap, 512) {
>                 struct kvm_mmu_page *child;
>                 u64 ent = sp->spt[i];
>
> @@ -1833,7 +1833,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
>
>                 child = spte_to_child_sp(ent);
>
> -               if (child->unsync_children) {
> +               if (child->arch.unsync_children) {
>                         if (mmu_pages_add(pvec, child, i))
>                                 return -ENOSPC;
>
> @@ -1845,7 +1845,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
>                                 nr_unsync_leaf += ret;
>                         } else
>                                 return ret;
> -               } else if (child->unsync) {
> +               } else if (child->arch.unsync) {
>                         nr_unsync_leaf++;
>                         if (mmu_pages_add(pvec, child, i))
>                                 return -ENOSPC;
> @@ -1862,7 +1862,7 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
>                            struct kvm_mmu_pages *pvec)
>  {
>         pvec->nr = 0;
> -       if (!sp->unsync_children)
> +       if (!sp->arch.unsync_children)
>                 return 0;
>
>         mmu_pages_add(pvec, sp, INVALID_INDEX);
> @@ -1871,9 +1871,9 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
>
>  static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
> -       WARN_ON(!sp->unsync);
> +       WARN_ON(!sp->arch.unsync);
>         trace_kvm_mmu_sync_page(sp);
> -       sp->unsync = 0;
> +       sp->arch.unsync = 0;
>         --kvm->stat.mmu_unsync;
>  }
>
> @@ -1894,7 +1894,7 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
>  }
>
>  #define for_each_valid_sp(_kvm, _sp, _list)                            \
> -       hlist_for_each_entry(_sp, _list, hash_link)                     \
> +       hlist_for_each_entry(_sp, _list, arch.hash_link)                        \
>                 if (is_obsolete_sp((_kvm), (_sp))) {                    \
>                 } else
>
> @@ -1934,7 +1934,7 @@ static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>
>         /* TDP MMU pages do not use the MMU generation. */
>         return !is_tdp_mmu_page(sp) &&
> -              unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
> +              unlikely(sp->arch.mmu_valid_gen != kvm->arch.mmu_valid_gen);
>  }
>
>  struct mmu_page_path {
> @@ -2006,7 +2006,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
>                 WARN_ON(idx == INVALID_INDEX);
>                 clear_unsync_child_bit(sp, idx);
>                 level++;
> -       } while (!sp->unsync_children);
> +       } while (!sp->arch.unsync_children);
>  }
>
>  static int mmu_sync_children(struct kvm_vcpu *vcpu,
> @@ -2053,7 +2053,7 @@ static int mmu_sync_children(struct kvm_vcpu *vcpu,
>
>  static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
>  {
> -       atomic_set(&sp->write_flooding_count,  0);
> +       atomic_set(&sp->arch.write_flooding_count,  0);
>  }
>
>  static void clear_sp_write_flooding_count(u64 *spte)
> @@ -2094,7 +2094,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                          * Unsync pages must not be left as is, because the new
>                          * upper-level page will be write-protected.
>                          */
> -                       if (role.level > PG_LEVEL_4K && sp->unsync)
> +                       if (role.level > PG_LEVEL_4K && sp->arch.unsync)
>                                 kvm_mmu_prepare_zap_page(kvm, sp,
>                                                          &invalid_list);
>                         continue;
> @@ -2104,7 +2104,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                 if (sp->role.arch.direct)
>                         goto out;
>
> -               if (sp->unsync) {
> +               if (sp->arch.unsync) {
>                         if (KVM_BUG_ON(!vcpu, kvm))
>                                 break;
>
> @@ -2163,25 +2163,26 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
>         sp = kvm_mmu_memory_cache_alloc(caches->page_header_cache);
>         sp->spt = kvm_mmu_memory_cache_alloc(caches->shadow_page_cache);
>         if (!role.arch.direct)
> -               sp->shadowed_translation = kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
> +               sp->arch.shadowed_translation =
> +                       kvm_mmu_memory_cache_alloc(caches->shadowed_info_cache);
>
>         set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
>
> -       INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
> +       INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
>
>         /*
>          * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
>          * depends on valid pages being added to the head of the list.  See
>          * comments in kvm_zap_obsolete_pages().
>          */
> -       sp->mmu_valid_gen = kvm->arch.mmu_valid_gen;
> +       sp->arch.mmu_valid_gen = kvm->arch.mmu_valid_gen;
>         list_add(&sp->link, &kvm->arch.active_mmu_pages);
>         kvm_account_mmu_page(kvm, sp);
>
>         sp->gfn = gfn;
>         sp->role = role;
> -       sp->shadow_mmu_page = true;
> -       hlist_add_head(&sp->hash_link, sp_list);
> +       sp->arch.shadow_mmu_page = true;
> +       hlist_add_head(&sp->arch.hash_link, sp_list);
>         if (sp_has_gptes(sp))
>                 account_shadowed(kvm, sp);
>
> @@ -2368,7 +2369,7 @@ static void __link_shadow_page(struct kvm *kvm,
>
>         mmu_page_add_parent_pte(cache, sp, sptep);
>
> -       if (sp->unsync_children || sp->unsync)
> +       if (sp->arch.unsync_children || sp->arch.unsync)
>                 mark_unsync(sptep);
>  }
>
> @@ -2421,7 +2422,8 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
>                          * avoids retaining a large number of stale nested SPs.
>                          */
>                         if (tdp_enabled && invalid_list &&
> -                           child->role.arch.guest_mode && !child->parent_ptes.val)
> +                           child->role.arch.guest_mode &&
> +                           !child->arch.parent_ptes.val)
>                                 return kvm_mmu_prepare_zap_page(kvm, child,
>                                                                 invalid_list);
>                 }
> @@ -2449,7 +2451,7 @@ static void kvm_mmu_unlink_parents(struct kvm_mmu_page *sp)
>         u64 *sptep;
>         struct rmap_iterator iter;
>
> -       while ((sptep = rmap_get_first(&sp->parent_ptes, &iter)))
> +       while ((sptep = rmap_get_first(&sp->arch.parent_ptes, &iter)))
>                 drop_parent_pte(sp, sptep);
>  }
>
> @@ -2496,7 +2498,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
>         if (!sp->role.invalid && sp_has_gptes(sp))
>                 unaccount_shadowed(kvm, sp);
>
> -       if (sp->unsync)
> +       if (sp->arch.unsync)
>                 kvm_unlink_unsync_page(kvm, sp);
>         if (!refcount_read(&sp->root_refcount)) {
>                 /* Count self */
> @@ -2527,7 +2529,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
>                 zapped_root = !is_obsolete_sp(kvm, sp);
>         }
>
> -       if (sp->nx_huge_page_disallowed)
> +       if (sp->arch.nx_huge_page_disallowed)
>                 unaccount_nx_huge_page(kvm, sp);
>
>         sp->role.invalid = 1;
> @@ -2704,7 +2706,7 @@ static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
>         trace_kvm_mmu_unsync_page(sp);
>         ++kvm->stat.mmu_unsync;
> -       sp->unsync = 1;
> +       sp->arch.unsync = 1;
>
>         kvm_mmu_mark_parents_unsync(sp);
>  }
> @@ -2739,7 +2741,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>                 if (!can_unsync)
>                         return -EPERM;
>
> -               if (sp->unsync)
> +               if (sp->arch.unsync)
>                         continue;
>
>                 if (prefetch)
> @@ -2764,7 +2766,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>                          * for write, i.e. unsync cannot transition from 0->1
>                          * while this CPU holds mmu_lock for read (or write).
>                          */
> -                       if (READ_ONCE(sp->unsync))
> +                       if (READ_ONCE(sp->arch.unsync))
>                                 continue;
>                 }
>
> @@ -2796,8 +2798,8 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>          *                      2.2 Guest issues TLB flush.
>          *                          That causes a VM Exit.
>          *
> -        *                      2.3 Walking of unsync pages sees sp->unsync is
> -        *                          false and skips the page.
> +        *                      2.3 Walking of unsync pages sees sp->arch.unsync
> +        *                          is false and skips the page.
>          *
>          *                      2.4 Guest accesses GVA X.
>          *                          Since the mapping in the SP was not updated,
> @@ -2805,7 +2807,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>          *                          gets used.
>          * 1.1 Host marks SP
>          *     as unsync
> -        *     (sp->unsync = true)
> +        *     (sp->arch.unsync = true)
>          *
>          * The write barrier below ensures that 1.1 happens before 1.2 and thus
>          * the situation in 2.4 does not arise.  It pairs with the read barrier
> @@ -3126,7 +3128,7 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
>             cur_level == fault->goal_level &&
>             is_shadow_present_pte(spte) &&
>             !is_large_pte(spte) &&
> -           spte_to_child_sp(spte)->nx_huge_page_disallowed) {
> +           spte_to_child_sp(spte)->arch.nx_huge_page_disallowed) {
>                 /*
>                  * A small SPTE exists for this pfn, but FNAME(fetch),
>                  * direct_map(), or kvm_tdp_mmu_map() would like to create a
> @@ -3902,7 +3904,7 @@ static bool is_unsync_root(hpa_t root)
>
>         /*
>          * The read barrier orders the CPU's read of SPTE.W during the page table
> -        * walk before the reads of sp->unsync/sp->unsync_children here.
> +        * walk before the reads of sp->arch.{unsync,unsync_children} here.
>          *
>          * Even if another CPU was marking the SP as unsync-ed simultaneously,
>          * any guest page table changes are not guaranteed to be visible anyway
> @@ -3922,7 +3924,7 @@ static bool is_unsync_root(hpa_t root)
>         if (WARN_ON_ONCE(!sp))
>                 return false;
>
> -       if (sp->unsync || sp->unsync_children)
> +       if (sp->arch.unsync || sp->arch.unsync_children)
>                 return true;
>
>         return false;
> @@ -5510,8 +5512,8 @@ static bool detect_write_flooding(struct kvm_mmu_page *sp)
>         if (sp->role.level == PG_LEVEL_4K)
>                 return false;
>
> -       atomic_inc(&sp->write_flooding_count);
> -       return atomic_read(&sp->write_flooding_count) >= 3;
> +       atomic_inc(&sp->arch.write_flooding_count);
> +       return atomic_read(&sp->arch.write_flooding_count) >= 3;
>  }
>
>  /*
> @@ -6389,7 +6391,7 @@ static bool shadow_mmu_try_split_huge_pages(struct kvm *kvm,
>                         continue;
>
>                 /* SPs with level >PG_LEVEL_4K should never by unsync. */
> -               if (WARN_ON_ONCE(sp->unsync))
> +               if (WARN_ON_ONCE(sp->arch.unsync))
>                         continue;
>
>                 /* Don't bother splitting huge pages on invalid SPs. */
> @@ -6941,8 +6943,8 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
>                  */
>                 sp = list_first_entry(&kvm->arch.possible_nx_huge_pages,
>                                       struct kvm_mmu_page,
> -                                     possible_nx_huge_page_link);
> -               WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
> +                                     arch.possible_nx_huge_page_link);
> +               WARN_ON_ONCE(!sp->arch.nx_huge_page_disallowed);
>                 WARN_ON_ONCE(!sp->role.arch.direct);
>
>                 /*
> @@ -6977,7 +6979,7 @@ static void kvm_recover_nx_huge_pages(struct kvm *kvm)
>                         flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
>                 else
>                         kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
> -               WARN_ON_ONCE(sp->nx_huge_page_disallowed);
> +               WARN_ON_ONCE(sp->arch.nx_huge_page_disallowed);
>
>                 if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
>                         kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index fd4990c8b0e9..af2ae4887e35 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -44,89 +44,6 @@ extern bool dbg;
>  #define INVALID_PAE_ROOT       0
>  #define IS_VALID_PAE_ROOT(x)   (!!(x))
>
> -struct kvm_mmu_page {
> -       /*
> -        * Note, "link" through "spt" fit in a single 64 byte cache line on
> -        * 64-bit kernels, keep it that way unless there's a reason not to.
> -        */
> -       struct list_head link;
> -       struct hlist_node hash_link;
> -
> -       bool shadow_mmu_page;
> -       bool unsync;
> -       u8 mmu_valid_gen;
> -
> -        /*
> -         * The shadow page can't be replaced by an equivalent huge page
> -         * because it is being used to map an executable page in the guest
> -         * and the NX huge page mitigation is enabled.
> -         */
> -       bool nx_huge_page_disallowed;
> -
> -       /*
> -        * The following two entries are used to key the shadow page in the
> -        * hash table.
> -        */
> -       union kvm_mmu_page_role role;
> -       gfn_t gfn;
> -
> -       u64 *spt;
> -
> -       /*
> -        * Stores the result of the guest translation being shadowed by each
> -        * SPTE.  KVM shadows two types of guest translations: nGPA -> GPA
> -        * (shadow EPT/NPT) and GVA -> GPA (traditional shadow paging). In both
> -        * cases the result of the translation is a GPA and a set of access
> -        * constraints.
> -        *
> -        * The GFN is stored in the upper bits (PAGE_SHIFT) and the shadowed
> -        * access permissions are stored in the lower bits. Note, for
> -        * convenience and uniformity across guests, the access permissions are
> -        * stored in KVM format (e.g.  ACC_EXEC_MASK) not the raw guest format.
> -        */
> -       u64 *shadowed_translation;
> -
> -       /* Currently serving as active root */
> -       refcount_t root_refcount;
> -
> -       unsigned int unsync_children;
> -       union {
> -               struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
> -               tdp_ptep_t ptep;
> -       };
> -       union {
> -               DECLARE_BITMAP(unsync_child_bitmap, 512);
> -               struct {
> -                       struct work_struct tdp_mmu_async_work;
> -                       void *tdp_mmu_async_data;
> -               };
> -       };
> -
> -       /*
> -        * Tracks shadow pages that, if zapped, would allow KVM to create an NX
> -        * huge page.  A shadow page will have nx_huge_page_disallowed set but
> -        * not be on the list if a huge page is disallowed for other reasons,
> -        * e.g. because KVM is shadowing a PTE at the same gfn, the memslot
> -        * isn't properly aligned, etc...
> -        */
> -       struct list_head possible_nx_huge_page_link;
> -#ifdef CONFIG_X86_32
> -       /*
> -        * Used out of the mmu-lock to avoid reading spte values while an
> -        * update is in progress; see the comments in __get_spte_lockless().
> -        */
> -       int clear_spte_count;
> -#endif
> -
> -       /* Number of writes since the last time traversal visited this page.  */
> -       atomic_t write_flooding_count;
> -
> -#ifdef CONFIG_X86_64
> -       /* Used for freeing the page asynchronously if it is a TDP MMU page. */
> -       struct rcu_head rcu_head;
> -#endif
> -};
> -
>  extern struct kmem_cache *mmu_page_header_cache;
>
>  static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index ffd10ce3eae3..335f26dabdf3 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -16,11 +16,11 @@
>         __field(bool, unsync)
>
>  #define KVM_MMU_PAGE_ASSIGN(sp)                                \
> -       __entry->mmu_valid_gen = sp->mmu_valid_gen;     \
> +       __entry->mmu_valid_gen = sp->arch.mmu_valid_gen;        \
>         __entry->gfn = sp->gfn;                         \
>         __entry->role = sp->role.word;                  \
>         __entry->root_count = refcount_read(&sp->root_refcount);        \
> -       __entry->unsync = sp->unsync;
> +       __entry->unsync = sp->arch.unsync;
>
>  #define KVM_MMU_PAGE_PRINTK() ({                                       \
>         const char *saved_ptr = trace_seq_buffer_ptr(p);                \
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index e15ec1c473da..18bb92b70a01 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -671,7 +671,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>                          * KVM_REQ_MMU_SYNC is not necessary but it
>                          * expedites the process.
>                          */
> -                       if (sp->unsync_children &&
> +                       if (sp->arch.unsync_children &&
>                             mmu_sync_children(vcpu, sp, false))
>                                 return RET_PF_RETRY;
>                 }
> @@ -921,7 +921,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
>                         pt_element_t gpte;
>                         gpa_t pte_gpa;
>
> -                       if (!sp->unsync)
> +                       if (!sp->arch.unsync)
>                                 break;
>
>                         pte_gpa = FNAME(get_level1_sp_gpa)(sp);
> @@ -942,7 +942,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
>                         FNAME(prefetch_gpte)(vcpu, sp, sptep, gpte, false);
>                 }
>
> -               if (!sp->unsync_children)
> +               if (!sp->arch.unsync_children)
>                         break;
>         }
>         write_unlock(&vcpu->kvm->mmu_lock);
> @@ -974,8 +974,8 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  }
>
>  /*
> - * Using the information in sp->shadowed_translation (kvm_mmu_page_get_gfn()) is
> - * safe because:
> + * Using the information in sp->arch.shadowed_translation
> + * (kvm_mmu_page_get_gfn()) is safe because:
>   * - The spte has a reference to the struct page, so the pfn for a given gfn
>   *   can't change unless all sptes pointing to it are nuked first.
>   *
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 34d674080170..66231c7ed31e 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -270,7 +270,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
>  static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
>                             gfn_t gfn, union kvm_mmu_page_role role)
>  {
> -       INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
> +       INIT_LIST_HEAD(&sp->arch.possible_nx_huge_page_link);
>
>         set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
>
> @@ -385,7 +385,7 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
>  {
>         tdp_unaccount_mmu_page(kvm, sp);
>
> -       if (!sp->nx_huge_page_disallowed)
> +       if (!sp->arch.nx_huge_page_disallowed)
>                 return;
>
>         if (shared)
> @@ -393,7 +393,7 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
>         else
>                 lockdep_assert_held_write(&kvm->mmu_lock);
>
> -       sp->nx_huge_page_disallowed = false;
> +       sp->arch.nx_huge_page_disallowed = false;
>         untrack_possible_nx_huge_page(kvm, sp);
>
>         if (shared)
> @@ -1181,7 +1181,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                 sp = tdp_mmu_alloc_sp(vcpu);
>                 tdp_mmu_init_child_sp(sp, &iter);
>
> -               sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
> +               sp->arch.nx_huge_page_disallowed = fault->huge_page_disallowed;
>
>                 if (is_shadow_present_pte(iter.old_spte))
>                         r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 19d3153051a3..e6a929089715 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -73,7 +73,7 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
>  #ifdef CONFIG_X86_64
>  static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp)
>  {
> -       return !sp->shadow_mmu_page;
> +       return !sp->arch.shadow_mmu_page;
>  }
>  #else
>  static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
> diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
> index 14099956fdac..a9da33d4baa8 100644
> --- a/include/kvm/mmu_types.h
> +++ b/include/kvm/mmu_types.h
> @@ -3,8 +3,11 @@
>  #define __KVM_MMU_TYPES_H
>
>  #include <linux/bug.h>
> -#include <linux/types.h>
> +#include <linux/kvm_types.h>
> +#include <linux/refcount.h>
>  #include <linux/stddef.h>
> +#include <linux/types.h>
> +#include <linux/workqueue.h>
>
>  #include <asm/kvm/mmu_types.h>
>
> @@ -36,4 +39,31 @@ static_assert(sizeof(union kvm_mmu_page_role) == sizeof_field(union kvm_mmu_page
>
>  typedef u64 __rcu *tdp_ptep_t;
>
> +struct kvm_mmu_page {
> +       struct list_head link;
> +
> +       union kvm_mmu_page_role role;
> +
> +       /* The start of the GFN region mapped by this shadow page. */
> +       gfn_t gfn;

I'd like to put in a vote for getting rid of the "shadow page" /
"shadow page table entry" terminology as part of this refactor.
I can't count the number of times folks trying to understand the MMU
have gotten confused by the overloaded terminology.

> +
> +       /* The page table page. */
> +       u64 *spt;
> +
> +       /* The PTE that points to this shadow page. */
> +       tdp_ptep_t ptep;

Totally fine to not change this, but something like parent_ptep would
be clearer. The Shadow MMU uses parent_ptes for the analogous field,
which I think is much easier to follow.
Could also change the comment to just:
/* The PTE that points to this kvm_mmu_page. */
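Taken together, the rename plus the comment change would look something
like this (untested and purely illustrative, not part of the posted
series):

-        /* The PTE that points to this shadow page. */
-        tdp_ptep_t ptep;
+        /* The PTE that points to this kvm_mmu_page. */
+        tdp_ptep_t parent_ptep;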

> +
> +       /* Used for freeing TDP MMU pages asynchronously. */
> +       struct rcu_head rcu_head;
> +
> +       /* The number of references to this shadow page as a root. */
> +       refcount_t root_refcount;
> +
> +       /* Used for tearing down an entire page table tree. */
> +       struct work_struct tdp_mmu_async_work;
> +       void *tdp_mmu_async_data;
> +
> +       struct kvm_mmu_page_arch arch;
> +};
> +
>  #endif /* !__KVM_MMU_TYPES_H */
> --
> 2.39.0.rc1.256.g54fd8350bd-goog
>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  2022-12-12 17:39           ` Sean Christopherson
  (?)
  (?)
@ 2022-12-12 18:17             ` Oliver Upton
  -1 siblings, 0 replies; 317+ messages in thread
From: Oliver Upton @ 2022-12-12 18:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: David Matlack, Yang, Weijiang, Paolo Bonzini, Marc Zyngier,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Huacai Chen,
	Aleksandar Markovic, Anup Patel, Atish Patra, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Andrew Morton, Anshuman Khandual,
	Amit, Nadav, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Mon, Dec 12, 2022 at 05:39:38PM +0000, Sean Christopherson wrote:
> On Fri, Dec 09, 2022, David Matlack wrote:
> > On Fri, Dec 9, 2022 at 9:25 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> > >
> > > On Fri, Dec 09, 2022 at 10:37:47AM +0800, Yang, Weijiang wrote:
> > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > > index 4d188f056933..f375b719f565 100644
> > > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > > @@ -5056,7 +5056,7 @@ kvm_calc_cpu_role(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
> > > > >     union kvm_cpu_role role = {0};
> > > > >     role.base.access = ACC_ALL;
> > > > > -   role.base.smm = is_smm(vcpu);
> > > > > +   role.base.as_id = is_smm(vcpu);
> > > >
> > > > I'm not familiar with other architectures; is there a similar
> > > > concept to x86 SMM mode?
> > 
> > The notion of address spaces is an existing architecture-neutral
> > concept in KVM (e.g. see the uses of KVM_ADDRESS_SPACE_NUM in
> > virt/kvm/kvm_main.c), although SMM is the only use-case I'm aware of.
> 
> Yes, SMM is currently the only use-case.
> 
> > Architectures that do not use multiple address spaces will just leave
> > as_id as 0.
> 
> My preference would be to leave .smm in x86's page role.  IMO, defining multiple
> address spaces to support SMM emulation was a mistake that should be contained to
> SMM, i.e. should never be used for any other feature.  And with CONFIG_KVM_SMM,
> even x86 can opt out.

+1

I don't think something is architecture-neutral by virtue of it existing
in virt/kvm/*.

> Emulating something like TrustZone or EL3 would be quite similar.

/me shudders

I know it is for the sake of discussion, but for posterity: please no!
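
A rough sketch of the direction Sean describes above, i.e. keeping SMM
out of the common role bits and letting x86 derive the address space ID
from an arch-specific bit. The field placement and helper name below are
purely illustrative and not taken from the posted series:

        /* x86-only: SMM lives in the arch portion of the page role. */
        struct kvm_mmu_page_role_arch {
                /* ... existing x86-specific role bits ... */
                u16 smm:1;
        };

        /* Common code asks the arch layer for the address space ID. */
        static inline int kvm_mmu_role_as_id(union kvm_mmu_page_role role)
        {
                return role.arch.smm;
        }

Architectures without multiple address spaces would just have their
version of the helper return 0.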

-- 
Best,
Oliver

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 10/37] KVM: MMU: Move struct kvm_page_fault to common code
  2022-12-08 19:38   ` David Matlack
@ 2022-12-12 18:24     ` Ben Gardon
  -1 siblings, 0 replies; 317+ messages in thread
From: Ben Gardon @ 2022-12-12 18:24 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
>
> Move struct kvm_page_fault to common code. This will be used in a future
> commit to move the TDP MMU to common code.
>
> No functional change intended.
>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/include/asm/kvm/mmu_types.h | 20 +++++++
>  arch/x86/kvm/mmu/mmu.c               | 19 +++----
>  arch/x86/kvm/mmu/mmu_internal.h      | 79 ++++++----------------------
>  arch/x86/kvm/mmu/mmutrace.h          |  2 +-
>  arch/x86/kvm/mmu/paging_tmpl.h       | 14 ++---
>  arch/x86/kvm/mmu/tdp_mmu.c           |  6 +--
>  include/kvm/mmu_types.h              | 44 ++++++++++++++++
>  7 files changed, 100 insertions(+), 84 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
> index affcb520b482..59d1be85f4b7 100644
> --- a/arch/x86/include/asm/kvm/mmu_types.h
> +++ b/arch/x86/include/asm/kvm/mmu_types.h
> @@ -5,6 +5,7 @@
>  #include <linux/bitmap.h>
>  #include <linux/list.h>
>  #include <linux/types.h>
> +#include <linux/kvm_types.h>
>
>  /*
>   * This is a subset of the overall kvm_cpu_role to minimize the size of
> @@ -115,4 +116,23 @@ struct kvm_mmu_page_arch {
>         atomic_t write_flooding_count;
>  };
>
> +struct kvm_page_fault_arch {
> +       const u32 error_code;
> +
> +       /* x86-specific error code bits */
> +       const bool present;
> +       const bool rsvd;
> +       const bool user;
> +
> +       /* Derived from mmu and global state.  */
> +       const bool is_tdp;
> +       const bool nx_huge_page_workaround_enabled;
> +
> +       /*
> +        * Whether a >4KB mapping can be created or is forbidden due to NX
> +        * hugepages.
> +        */
> +       bool huge_page_disallowed;
> +};
> +
>  #endif /* !__ASM_KVM_MMU_TYPES_H */
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e47f35878ab5..0593d4a60139 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3092,7 +3092,8 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>         struct kvm_memory_slot *slot = fault->slot;
>         kvm_pfn_t mask;
>
> -       fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
> +       fault->arch.huge_page_disallowed =
> +               fault->exec && fault->arch.nx_huge_page_workaround_enabled;
>
>         if (unlikely(fault->max_level == PG_LEVEL_4K))
>                 return;
> @@ -3109,7 +3110,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>          */
>         fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
>                                                      fault->gfn, fault->max_level);
> -       if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
> +       if (fault->req_level == PG_LEVEL_4K || fault->arch.huge_page_disallowed)
>                 return;
>
>         /*
> @@ -3158,7 +3159,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                  * We cannot overwrite existing page tables with an NX
>                  * large page, as the leaf could be executable.
>                  */
> -               if (fault->nx_huge_page_workaround_enabled)
> +               if (fault->arch.nx_huge_page_workaround_enabled)
>                         disallowed_hugepage_adjust(fault, *it.sptep, it.level);
>
>                 base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
> @@ -3170,7 +3171,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                         continue;
>
>                 link_shadow_page(vcpu, it.sptep, sp);
> -               if (fault->huge_page_disallowed)
> +               if (fault->arch.huge_page_disallowed)
>                         account_nx_huge_page(vcpu->kvm, sp,
>                                              fault->req_level >= it.level);
>         }
> @@ -3221,7 +3222,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
>                                    struct kvm_page_fault *fault,
>                                    unsigned int access)
>  {
> -       gva_t gva = fault->is_tdp ? 0 : fault->addr;
> +       gva_t gva = fault->arch.is_tdp ? 0 : fault->addr;
>
>         vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
>                              access & shadow_mmio_access_mask);
> @@ -3255,7 +3256,7 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
>          * generation number.  Refreshing the MMIO generation needs to go down
>          * the slow path.  Note, EPT Misconfigs do NOT set the PRESENT flag!
>          */
> -       if (fault->rsvd)
> +       if (fault->arch.rsvd)
>                 return false;
>
>         /*
> @@ -3273,7 +3274,7 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
>          *    SPTE is MMU-writable (determined later), the fault can be fixed
>          *    by setting the Writable bit, which can be done out of mmu_lock.
>          */
> -       if (!fault->present)
> +       if (!fault->arch.present)
>                 return !kvm_ad_enabled();
>
>         /*
> @@ -4119,10 +4120,10 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
>  static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
>                                          struct kvm_page_fault *fault)
>  {
> -       if (unlikely(fault->rsvd))
> +       if (unlikely(fault->arch.rsvd))
>                 return false;
>
> -       if (!fault->present || !fault->write)
> +       if (!fault->arch.present || !fault->write)
>                 return false;
>
>         /*
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index af2ae4887e35..4abb80a3bd01 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -77,60 +77,6 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
>         return READ_ONCE(nx_huge_pages) && !kvm->arch.disable_nx_huge_pages;
>  }
>
> -struct kvm_page_fault {
> -       /* arguments to kvm_mmu_do_page_fault.  */
> -       const gpa_t addr;
> -       const u32 error_code;
> -       const bool prefetch;
> -
> -       /* Derived from error_code.  */
> -       const bool exec;
> -       const bool write;
> -       const bool present;
> -       const bool rsvd;
> -       const bool user;
> -
> -       /* Derived from mmu and global state.  */
> -       const bool is_tdp;
> -       const bool nx_huge_page_workaround_enabled;
> -
> -       /*
> -        * Whether a >4KB mapping can be created or is forbidden due to NX
> -        * hugepages.
> -        */
> -       bool huge_page_disallowed;
> -
> -       /*
> -        * Maximum page size that can be created for this fault; input to
> -        * FNAME(fetch), direct_map() and kvm_tdp_mmu_map().
> -        */
> -       u8 max_level;
> -
> -       /*
> -        * Page size that can be created based on the max_level and the
> -        * page size used by the host mapping.
> -        */
> -       u8 req_level;
> -
> -       /*
> -        * Page size that will be created based on the req_level and
> -        * huge_page_disallowed.
> -        */
> -       u8 goal_level;
> -
> -       /* Shifted addr, or result of guest page table walk if addr is a gva.  */
> -       gfn_t gfn;
> -
> -       /* The memslot containing gfn. May be NULL. */
> -       struct kvm_memory_slot *slot;
> -
> -       /* Outputs of kvm_faultin_pfn.  */
> -       unsigned long mmu_seq;
> -       kvm_pfn_t pfn;
> -       hva_t hva;
> -       bool map_writable;
> -};
> -
>  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
>
>  /*
> @@ -164,22 +110,27 @@ enum {
>  static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>                                         u32 err, bool prefetch)
>  {
> +       bool is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault);
>         struct kvm_page_fault fault = {
>                 .addr = cr2_or_gpa,
> -               .error_code = err,
> -               .exec = err & PFERR_FETCH_MASK,
> -               .write = err & PFERR_WRITE_MASK,
> -               .present = err & PFERR_PRESENT_MASK,
> -               .rsvd = err & PFERR_RSVD_MASK,
> -               .user = err & PFERR_USER_MASK,
>                 .prefetch = prefetch,
> -               .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
> -               .nx_huge_page_workaround_enabled =
> -                       is_nx_huge_page_enabled(vcpu->kvm),
> +
> +               .write = err & PFERR_WRITE_MASK,
> +               .exec = err & PFERR_FETCH_MASK,
>
>                 .max_level = KVM_MAX_HUGEPAGE_LEVEL,
>                 .req_level = PG_LEVEL_4K,
>                 .goal_level = PG_LEVEL_4K,
> +
> +               .arch = {
> +                       .error_code = err,
> +                       .present = err & PFERR_PRESENT_MASK,
> +                       .rsvd = err & PFERR_RSVD_MASK,
> +                       .user = err & PFERR_USER_MASK,
> +                       .is_tdp = is_tdp,
> +                       .nx_huge_page_workaround_enabled =
> +                               is_nx_huge_page_enabled(vcpu->kvm),
> +               },
>         };
>         int r;
>
> @@ -196,7 +147,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>         if (!prefetch)
>                 vcpu->stat.pf_taken++;
>
> -       if (IS_ENABLED(CONFIG_RETPOLINE) && fault.is_tdp)
> +       if (IS_ENABLED(CONFIG_RETPOLINE) && fault.arch.is_tdp)
>                 r = kvm_tdp_page_fault(vcpu, &fault);
>         else
>                 r = vcpu->arch.mmu->page_fault(vcpu, &fault);
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index 335f26dabdf3..b01767acf073 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -270,7 +270,7 @@ TRACE_EVENT(
>         TP_fast_assign(
>                 __entry->vcpu_id = vcpu->vcpu_id;
>                 __entry->cr2_or_gpa = fault->addr;
> -               __entry->error_code = fault->error_code;
> +               __entry->error_code = fault->arch.error_code;
>                 __entry->sptep = sptep;
>                 __entry->old_spte = old_spte;
>                 __entry->new_spte = *sptep;
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 18bb92b70a01..daf9c7731edc 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -698,7 +698,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>                  * We cannot overwrite existing page tables with an NX
>                  * large page, as the leaf could be executable.
>                  */
> -               if (fault->nx_huge_page_workaround_enabled)
> +               if (fault->arch.nx_huge_page_workaround_enabled)
>                         disallowed_hugepage_adjust(fault, *it.sptep, it.level);
>
>                 base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
> @@ -713,7 +713,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>                         continue;
>
>                 link_shadow_page(vcpu, it.sptep, sp);
> -               if (fault->huge_page_disallowed)
> +               if (fault->arch.huge_page_disallowed)
>                         account_nx_huge_page(vcpu->kvm, sp,
>                                              fault->req_level >= it.level);
>         }
> @@ -793,8 +793,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>         int r;
>         bool is_self_change_mapping;
>
> -       pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
> -       WARN_ON_ONCE(fault->is_tdp);
> +       pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->arch.error_code);
> +       WARN_ON_ONCE(fault->arch.is_tdp);
>
>         /*
>          * Look up the guest pte for the faulting address.
> @@ -802,7 +802,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>          * The bit needs to be cleared before walking guest page tables.
>          */
>         r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
> -                            fault->error_code & ~PFERR_RSVD_MASK);
> +                            fault->arch.error_code & ~PFERR_RSVD_MASK);
>
>         /*
>          * The page is not mapped by the guest.  Let the guest handle it.
> @@ -830,7 +830,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>         vcpu->arch.write_fault_to_shadow_pgtable = false;
>
>         is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
> -             &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);
> +             &walker, fault->arch.user, &vcpu->arch.write_fault_to_shadow_pgtable);
>
>         if (is_self_change_mapping)
>                 fault->max_level = PG_LEVEL_4K;
> @@ -846,7 +846,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>          * we will cache the incorrect access into mmio spte.
>          */
>         if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) &&
> -           !is_cr0_wp(vcpu->arch.mmu) && !fault->user && fault->slot) {
> +           !is_cr0_wp(vcpu->arch.mmu) && !fault->arch.user && fault->slot) {
>                 walker.pte_access |= ACC_WRITE_MASK;
>                 walker.pte_access &= ~ACC_USER_MASK;
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 66231c7ed31e..4940413d3767 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1156,7 +1156,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>         tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
>                 int r;
>
> -               if (fault->nx_huge_page_workaround_enabled)
> +               if (fault->arch.nx_huge_page_workaround_enabled)
>                         disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
>
>                 if (iter.level == fault->goal_level)
> @@ -1181,7 +1181,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                 sp = tdp_mmu_alloc_sp(vcpu);
>                 tdp_mmu_init_child_sp(sp, &iter);
>
> -               sp->arch.nx_huge_page_disallowed = fault->huge_page_disallowed;
> +               sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
>
>                 if (is_shadow_present_pte(iter.old_spte))
>                         r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
> @@ -1197,7 +1197,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                         goto retry;
>                 }
>
> -               if (fault->huge_page_disallowed &&
> +               if (fault->arch.huge_page_disallowed &&
>                     fault->req_level >= iter.level) {
>                         spin_lock(&kvm->arch.tdp_mmu_pages_lock);
>                         track_possible_nx_huge_page(kvm, sp);
> diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
> index a9da33d4baa8..9f0ca920bf68 100644
> --- a/include/kvm/mmu_types.h
> +++ b/include/kvm/mmu_types.h
> @@ -66,4 +66,48 @@ struct kvm_mmu_page {
>         struct kvm_mmu_page_arch arch;
>  };
>
> +struct kvm_page_fault {
> +       /* The raw faulting address. */
> +       const gpa_t addr;
> +
> +       /* Whether the fault was synthesized to prefetch a mapping. */
> +       const bool prefetch;
> +
> +       /* Information about the cause of the fault. */
> +       const bool write;
> +       const bool exec;
> +
> +       /* Shifted addr, or result of guest page table walk if shadow paging. */
> +       gfn_t gfn;

Is it redundant to have this in common code? If we're not doing common
shadow paging, then this is just addr shifted. Would it be better
placed in the arch-specific struct?
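
For reference, roughly how the field gets populated today (quoting from
memory and simplifying, so treat this as a sketch rather than exact code):

	/* TDP / direct faults: the gfn is just the faulting GPA shifted. */
	fault->gfn = fault->addr >> PAGE_SHIFT;

	/* x86 shadow paging: the gfn is the result of the guest PT walk. */
	fault->gfn = walker.gfn;

i.e. only the shadow MMU actually needs somewhere to stash a gfn that is
not derivable from addr.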

> +
> +       /* The memslot that contains @gfn. May be NULL. */
> +       struct kvm_memory_slot *slot;
> +
> +       /* Maximum page size that can be created for this fault. */
> +       u8 max_level;
> +
> +       /*
> +        * Page size that can be created based on the max_level and the page
> +        * size used by the host mapping.
> +        */
> +       u8 req_level;
> +
> +       /* Final page size that will be created. */
> +       u8 goal_level;
> +
> +       /*
> +        * The value of kvm->mmu_invalidate_seq before fetching the host
> +        * mapping. Used to verify that the host mapping has not changed
> +        * after grabbing the MMU lock.
> +        */
> +       unsigned long mmu_seq;

Should this be ifdef'ed with KVM_ARCH_WANT_MMU_NOTIFIER?
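
For context, a rough sketch of the pattern that consumes mmu_seq (again
from memory and simplified, so the helper names may not be exact; note the
mmu_invalidate_* fields in struct kvm are themselves only compiled in when
KVM_ARCH_WANT_MMU_NOTIFIER / CONFIG_MMU_NOTIFIER are set):

	/* Outside mmu_lock, before resolving the host mapping: */
	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
	smp_rmb();
	/* ... kvm_faultin_pfn() fills fault->pfn / hva / map_writable ... */

	/* Under mmu_lock, before installing the new mapping: */
	if (mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva))
		goto out_unlock;	/* an invalidation raced; retry the fault */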

> +
> +       /* Information about the host mapping. */
> +       kvm_pfn_t pfn;
> +       hva_t hva;
> +       bool map_writable;
> +
> +       struct kvm_page_fault_arch arch;
> +};
> +
>  #endif /* !__KVM_MMU_TYPES_H */
> --
> 2.39.0.rc1.256.g54fd8350bd-goog
>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 10/37] KVM: MMU: Move struct kvm_page_fault to common code
@ 2022-12-12 18:24     ` Ben Gardon
  0 siblings, 0 replies; 317+ messages in thread
From: Ben Gardon @ 2022-12-12 18:24 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
>
> Move struct kvm_page_fault to common code. This will be used in a future
> commit to move the TDP MMU to common code.
>
> No functional change intended.
>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/include/asm/kvm/mmu_types.h | 20 +++++++
>  arch/x86/kvm/mmu/mmu.c               | 19 +++----
>  arch/x86/kvm/mmu/mmu_internal.h      | 79 ++++++----------------------
>  arch/x86/kvm/mmu/mmutrace.h          |  2 +-
>  arch/x86/kvm/mmu/paging_tmpl.h       | 14 ++---
>  arch/x86/kvm/mmu/tdp_mmu.c           |  6 +--
>  include/kvm/mmu_types.h              | 44 ++++++++++++++++
>  7 files changed, 100 insertions(+), 84 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
> index affcb520b482..59d1be85f4b7 100644
> --- a/arch/x86/include/asm/kvm/mmu_types.h
> +++ b/arch/x86/include/asm/kvm/mmu_types.h
> @@ -5,6 +5,7 @@
>  #include <linux/bitmap.h>
>  #include <linux/list.h>
>  #include <linux/types.h>
> +#include <linux/kvm_types.h>
>
>  /*
>   * This is a subset of the overall kvm_cpu_role to minimize the size of
> @@ -115,4 +116,23 @@ struct kvm_mmu_page_arch {
>         atomic_t write_flooding_count;
>  };
>
> +struct kvm_page_fault_arch {
> +       const u32 error_code;
> +
> +       /* x86-specific error code bits */
> +       const bool present;
> +       const bool rsvd;
> +       const bool user;
> +
> +       /* Derived from mmu and global state.  */
> +       const bool is_tdp;
> +       const bool nx_huge_page_workaround_enabled;
> +
> +       /*
> +        * Whether a >4KB mapping can be created or is forbidden due to NX
> +        * hugepages.
> +        */
> +       bool huge_page_disallowed;
> +};
> +
>  #endif /* !__ASM_KVM_MMU_TYPES_H */
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e47f35878ab5..0593d4a60139 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3092,7 +3092,8 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>         struct kvm_memory_slot *slot = fault->slot;
>         kvm_pfn_t mask;
>
> -       fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
> +       fault->arch.huge_page_disallowed =
> +               fault->exec && fault->arch.nx_huge_page_workaround_enabled;
>
>         if (unlikely(fault->max_level == PG_LEVEL_4K))
>                 return;
> @@ -3109,7 +3110,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>          */
>         fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
>                                                      fault->gfn, fault->max_level);
> -       if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
> +       if (fault->req_level == PG_LEVEL_4K || fault->arch.huge_page_disallowed)
>                 return;
>
>         /*
> @@ -3158,7 +3159,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                  * We cannot overwrite existing page tables with an NX
>                  * large page, as the leaf could be executable.
>                  */
> -               if (fault->nx_huge_page_workaround_enabled)
> +               if (fault->arch.nx_huge_page_workaround_enabled)
>                         disallowed_hugepage_adjust(fault, *it.sptep, it.level);
>
>                 base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
> @@ -3170,7 +3171,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                         continue;
>
>                 link_shadow_page(vcpu, it.sptep, sp);
> -               if (fault->huge_page_disallowed)
> +               if (fault->arch.huge_page_disallowed)
>                         account_nx_huge_page(vcpu->kvm, sp,
>                                              fault->req_level >= it.level);
>         }
> @@ -3221,7 +3222,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
>                                    struct kvm_page_fault *fault,
>                                    unsigned int access)
>  {
> -       gva_t gva = fault->is_tdp ? 0 : fault->addr;
> +       gva_t gva = fault->arch.is_tdp ? 0 : fault->addr;
>
>         vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
>                              access & shadow_mmio_access_mask);
> @@ -3255,7 +3256,7 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
>          * generation number.  Refreshing the MMIO generation needs to go down
>          * the slow path.  Note, EPT Misconfigs do NOT set the PRESENT flag!
>          */
> -       if (fault->rsvd)
> +       if (fault->arch.rsvd)
>                 return false;
>
>         /*
> @@ -3273,7 +3274,7 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
>          *    SPTE is MMU-writable (determined later), the fault can be fixed
>          *    by setting the Writable bit, which can be done out of mmu_lock.
>          */
> -       if (!fault->present)
> +       if (!fault->arch.present)
>                 return !kvm_ad_enabled();
>
>         /*
> @@ -4119,10 +4120,10 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
>  static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
>                                          struct kvm_page_fault *fault)
>  {
> -       if (unlikely(fault->rsvd))
> +       if (unlikely(fault->arch.rsvd))
>                 return false;
>
> -       if (!fault->present || !fault->write)
> +       if (!fault->arch.present || !fault->write)
>                 return false;
>
>         /*
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index af2ae4887e35..4abb80a3bd01 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -77,60 +77,6 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
>         return READ_ONCE(nx_huge_pages) && !kvm->arch.disable_nx_huge_pages;
>  }
>
> -struct kvm_page_fault {
> -       /* arguments to kvm_mmu_do_page_fault.  */
> -       const gpa_t addr;
> -       const u32 error_code;
> -       const bool prefetch;
> -
> -       /* Derived from error_code.  */
> -       const bool exec;
> -       const bool write;
> -       const bool present;
> -       const bool rsvd;
> -       const bool user;
> -
> -       /* Derived from mmu and global state.  */
> -       const bool is_tdp;
> -       const bool nx_huge_page_workaround_enabled;
> -
> -       /*
> -        * Whether a >4KB mapping can be created or is forbidden due to NX
> -        * hugepages.
> -        */
> -       bool huge_page_disallowed;
> -
> -       /*
> -        * Maximum page size that can be created for this fault; input to
> -        * FNAME(fetch), direct_map() and kvm_tdp_mmu_map().
> -        */
> -       u8 max_level;
> -
> -       /*
> -        * Page size that can be created based on the max_level and the
> -        * page size used by the host mapping.
> -        */
> -       u8 req_level;
> -
> -       /*
> -        * Page size that will be created based on the req_level and
> -        * huge_page_disallowed.
> -        */
> -       u8 goal_level;
> -
> -       /* Shifted addr, or result of guest page table walk if addr is a gva.  */
> -       gfn_t gfn;
> -
> -       /* The memslot containing gfn. May be NULL. */
> -       struct kvm_memory_slot *slot;
> -
> -       /* Outputs of kvm_faultin_pfn.  */
> -       unsigned long mmu_seq;
> -       kvm_pfn_t pfn;
> -       hva_t hva;
> -       bool map_writable;
> -};
> -
>  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
>
>  /*
> @@ -164,22 +110,27 @@ enum {
>  static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>                                         u32 err, bool prefetch)
>  {
> +       bool is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault);
>         struct kvm_page_fault fault = {
>                 .addr = cr2_or_gpa,
> -               .error_code = err,
> -               .exec = err & PFERR_FETCH_MASK,
> -               .write = err & PFERR_WRITE_MASK,
> -               .present = err & PFERR_PRESENT_MASK,
> -               .rsvd = err & PFERR_RSVD_MASK,
> -               .user = err & PFERR_USER_MASK,
>                 .prefetch = prefetch,
> -               .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
> -               .nx_huge_page_workaround_enabled =
> -                       is_nx_huge_page_enabled(vcpu->kvm),
> +
> +               .write = err & PFERR_WRITE_MASK,
> +               .exec = err & PFERR_FETCH_MASK,
>
>                 .max_level = KVM_MAX_HUGEPAGE_LEVEL,
>                 .req_level = PG_LEVEL_4K,
>                 .goal_level = PG_LEVEL_4K,
> +
> +               .arch = {
> +                       .error_code = err,
> +                       .present = err & PFERR_PRESENT_MASK,
> +                       .rsvd = err & PFERR_RSVD_MASK,
> +                       .user = err & PFERR_USER_MASK,
> +                       .is_tdp = is_tdp,
> +                       .nx_huge_page_workaround_enabled =
> +                               is_nx_huge_page_enabled(vcpu->kvm),
> +               },
>         };
>         int r;
>
> @@ -196,7 +147,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>         if (!prefetch)
>                 vcpu->stat.pf_taken++;
>
> -       if (IS_ENABLED(CONFIG_RETPOLINE) && fault.is_tdp)
> +       if (IS_ENABLED(CONFIG_RETPOLINE) && fault.arch.is_tdp)
>                 r = kvm_tdp_page_fault(vcpu, &fault);
>         else
>                 r = vcpu->arch.mmu->page_fault(vcpu, &fault);
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index 335f26dabdf3..b01767acf073 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -270,7 +270,7 @@ TRACE_EVENT(
>         TP_fast_assign(
>                 __entry->vcpu_id = vcpu->vcpu_id;
>                 __entry->cr2_or_gpa = fault->addr;
> -               __entry->error_code = fault->error_code;
> +               __entry->error_code = fault->arch.error_code;
>                 __entry->sptep = sptep;
>                 __entry->old_spte = old_spte;
>                 __entry->new_spte = *sptep;
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 18bb92b70a01..daf9c7731edc 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -698,7 +698,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>                  * We cannot overwrite existing page tables with an NX
>                  * large page, as the leaf could be executable.
>                  */
> -               if (fault->nx_huge_page_workaround_enabled)
> +               if (fault->arch.nx_huge_page_workaround_enabled)
>                         disallowed_hugepage_adjust(fault, *it.sptep, it.level);
>
>                 base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
> @@ -713,7 +713,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>                         continue;
>
>                 link_shadow_page(vcpu, it.sptep, sp);
> -               if (fault->huge_page_disallowed)
> +               if (fault->arch.huge_page_disallowed)
>                         account_nx_huge_page(vcpu->kvm, sp,
>                                              fault->req_level >= it.level);
>         }
> @@ -793,8 +793,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>         int r;
>         bool is_self_change_mapping;
>
> -       pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
> -       WARN_ON_ONCE(fault->is_tdp);
> +       pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->arch.error_code);
> +       WARN_ON_ONCE(fault->arch.is_tdp);
>
>         /*
>          * Look up the guest pte for the faulting address.
> @@ -802,7 +802,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>          * The bit needs to be cleared before walking guest page tables.
>          */
>         r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
> -                            fault->error_code & ~PFERR_RSVD_MASK);
> +                            fault->arch.error_code & ~PFERR_RSVD_MASK);
>
>         /*
>          * The page is not mapped by the guest.  Let the guest handle it.
> @@ -830,7 +830,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>         vcpu->arch.write_fault_to_shadow_pgtable = false;
>
>         is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
> -             &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);
> +             &walker, fault->arch.user, &vcpu->arch.write_fault_to_shadow_pgtable);
>
>         if (is_self_change_mapping)
>                 fault->max_level = PG_LEVEL_4K;
> @@ -846,7 +846,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>          * we will cache the incorrect access into mmio spte.
>          */
>         if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) &&
> -           !is_cr0_wp(vcpu->arch.mmu) && !fault->user && fault->slot) {
> +           !is_cr0_wp(vcpu->arch.mmu) && !fault->arch.user && fault->slot) {
>                 walker.pte_access |= ACC_WRITE_MASK;
>                 walker.pte_access &= ~ACC_USER_MASK;
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 66231c7ed31e..4940413d3767 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1156,7 +1156,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>         tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
>                 int r;
>
> -               if (fault->nx_huge_page_workaround_enabled)
> +               if (fault->arch.nx_huge_page_workaround_enabled)
>                         disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
>
>                 if (iter.level == fault->goal_level)
> @@ -1181,7 +1181,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                 sp = tdp_mmu_alloc_sp(vcpu);
>                 tdp_mmu_init_child_sp(sp, &iter);
>
> -               sp->arch.nx_huge_page_disallowed = fault->huge_page_disallowed;
> +               sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
>
>                 if (is_shadow_present_pte(iter.old_spte))
>                         r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
> @@ -1197,7 +1197,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                         goto retry;
>                 }
>
> -               if (fault->huge_page_disallowed &&
> +               if (fault->arch.huge_page_disallowed &&
>                     fault->req_level >= iter.level) {
>                         spin_lock(&kvm->arch.tdp_mmu_pages_lock);
>                         track_possible_nx_huge_page(kvm, sp);
> diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
> index a9da33d4baa8..9f0ca920bf68 100644
> --- a/include/kvm/mmu_types.h
> +++ b/include/kvm/mmu_types.h
> @@ -66,4 +66,48 @@ struct kvm_mmu_page {
>         struct kvm_mmu_page_arch arch;
>  };
>
> +struct kvm_page_fault {
> +       /* The raw faulting address. */
> +       const gpa_t addr;
> +
> +       /* Whether the fault was synthesized to prefetch a mapping. */
> +       const bool prefetch;
> +
> +       /* Information about the cause of the fault. */
> +       const bool write;
> +       const bool exec;
> +
> +       /* Shifted addr, or result of guest page table walk if shadow paging. */
> +       gfn_t gfn;

Is this redundant to have in common code? If we're not doing common
shadow paging, then this is just addr shifted. Would this be better
placed in the arch specific struct?

> +
> +       /* The memslot that contains @gfn. May be NULL. */
> +       struct kvm_memory_slot *slot;
> +
> +       /* Maximum page size that can be created for this fault. */
> +       u8 max_level;
> +
> +       /*
> +        * Page size that can be created based on the max_level and the page
> +        * size used by the host mapping.
> +        */
> +       u8 req_level;
> +
> +       /* Final page size that will be created. */
> +       u8 goal_level;
> +
> +       /*
> +        * The value of kvm->mmu_invalidate_seq before fetching the host
> +        * mapping. Used to verify that the host mapping has not changed
> +        * after grabbing the MMU lock.
> +        */
> +       unsigned long mmu_seq;

Should this be ifdef'ed with  KVM_ARCH_WANT_MMU_NOTIFIER?

> +
> +       /* Information about the host mapping. */
> +       kvm_pfn_t pfn;
> +       hva_t hva;
> +       bool map_writable;
> +
> +       struct kvm_page_fault_arch arch;
> +};
> +
>  #endif /* !__KVM_MMU_TYPES_H */
> --
> 2.39.0.rc1.256.g54fd8350bd-goog
>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 10/37] KVM: MMU: Move struct kvm_page_fault to common code
@ 2022-12-12 18:24     ` Ben Gardon
  0 siblings, 0 replies; 317+ messages in thread
From: Ben Gardon @ 2022-12-12 18:24 UTC (permalink / raw)
  To: David Matlack
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, Nadav Amit,
	Colin Cross, linux-riscv, kvmarm, Yu Zhao, Marc Zyngier,
	Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, xu xin, Arnd Bergmann, Liam R. Howlett,
	kvm, Atish Patra, kvmarm, Suren Baghdasaryan, Vlastimil Babka,
	linux-arm-kernel, linux-mips, kvm-riscv, Paolo Bonzini,
	Andrew Morton

On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
>
> Move struct kvm_page_fault to common code. This will be used in a future
> commit to move the TDP MMU to common code.
>
> No functional change intended.
>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/include/asm/kvm/mmu_types.h | 20 +++++++
>  arch/x86/kvm/mmu/mmu.c               | 19 +++----
>  arch/x86/kvm/mmu/mmu_internal.h      | 79 ++++++----------------------
>  arch/x86/kvm/mmu/mmutrace.h          |  2 +-
>  arch/x86/kvm/mmu/paging_tmpl.h       | 14 ++---
>  arch/x86/kvm/mmu/tdp_mmu.c           |  6 +--
>  include/kvm/mmu_types.h              | 44 ++++++++++++++++
>  7 files changed, 100 insertions(+), 84 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm/mmu_types.h b/arch/x86/include/asm/kvm/mmu_types.h
> index affcb520b482..59d1be85f4b7 100644
> --- a/arch/x86/include/asm/kvm/mmu_types.h
> +++ b/arch/x86/include/asm/kvm/mmu_types.h
> @@ -5,6 +5,7 @@
>  #include <linux/bitmap.h>
>  #include <linux/list.h>
>  #include <linux/types.h>
> +#include <linux/kvm_types.h>
>
>  /*
>   * This is a subset of the overall kvm_cpu_role to minimize the size of
> @@ -115,4 +116,23 @@ struct kvm_mmu_page_arch {
>         atomic_t write_flooding_count;
>  };
>
> +struct kvm_page_fault_arch {
> +       const u32 error_code;
> +
> +       /* x86-specific error code bits */
> +       const bool present;
> +       const bool rsvd;
> +       const bool user;
> +
> +       /* Derived from mmu and global state.  */
> +       const bool is_tdp;
> +       const bool nx_huge_page_workaround_enabled;
> +
> +       /*
> +        * Whether a >4KB mapping can be created or is forbidden due to NX
> +        * hugepages.
> +        */
> +       bool huge_page_disallowed;
> +};
> +
>  #endif /* !__ASM_KVM_MMU_TYPES_H */
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e47f35878ab5..0593d4a60139 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3092,7 +3092,8 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>         struct kvm_memory_slot *slot = fault->slot;
>         kvm_pfn_t mask;
>
> -       fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
> +       fault->arch.huge_page_disallowed =
> +               fault->exec && fault->arch.nx_huge_page_workaround_enabled;
>
>         if (unlikely(fault->max_level == PG_LEVEL_4K))
>                 return;
> @@ -3109,7 +3110,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>          */
>         fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
>                                                      fault->gfn, fault->max_level);
> -       if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
> +       if (fault->req_level == PG_LEVEL_4K || fault->arch.huge_page_disallowed)
>                 return;
>
>         /*
> @@ -3158,7 +3159,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                  * We cannot overwrite existing page tables with an NX
>                  * large page, as the leaf could be executable.
>                  */
> -               if (fault->nx_huge_page_workaround_enabled)
> +               if (fault->arch.nx_huge_page_workaround_enabled)
>                         disallowed_hugepage_adjust(fault, *it.sptep, it.level);
>
>                 base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
> @@ -3170,7 +3171,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                         continue;
>
>                 link_shadow_page(vcpu, it.sptep, sp);
> -               if (fault->huge_page_disallowed)
> +               if (fault->arch.huge_page_disallowed)
>                         account_nx_huge_page(vcpu->kvm, sp,
>                                              fault->req_level >= it.level);
>         }
> @@ -3221,7 +3222,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
>                                    struct kvm_page_fault *fault,
>                                    unsigned int access)
>  {
> -       gva_t gva = fault->is_tdp ? 0 : fault->addr;
> +       gva_t gva = fault->arch.is_tdp ? 0 : fault->addr;
>
>         vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
>                              access & shadow_mmio_access_mask);
> @@ -3255,7 +3256,7 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
>          * generation number.  Refreshing the MMIO generation needs to go down
>          * the slow path.  Note, EPT Misconfigs do NOT set the PRESENT flag!
>          */
> -       if (fault->rsvd)
> +       if (fault->arch.rsvd)
>                 return false;
>
>         /*
> @@ -3273,7 +3274,7 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
>          *    SPTE is MMU-writable (determined later), the fault can be fixed
>          *    by setting the Writable bit, which can be done out of mmu_lock.
>          */
> -       if (!fault->present)
> +       if (!fault->arch.present)
>                 return !kvm_ad_enabled();
>
>         /*
> @@ -4119,10 +4120,10 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
>  static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
>                                          struct kvm_page_fault *fault)
>  {
> -       if (unlikely(fault->rsvd))
> +       if (unlikely(fault->arch.rsvd))
>                 return false;
>
> -       if (!fault->present || !fault->write)
> +       if (!fault->arch.present || !fault->write)
>                 return false;
>
>         /*
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index af2ae4887e35..4abb80a3bd01 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -77,60 +77,6 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
>         return READ_ONCE(nx_huge_pages) && !kvm->arch.disable_nx_huge_pages;
>  }
>
> -struct kvm_page_fault {
> -       /* arguments to kvm_mmu_do_page_fault.  */
> -       const gpa_t addr;
> -       const u32 error_code;
> -       const bool prefetch;
> -
> -       /* Derived from error_code.  */
> -       const bool exec;
> -       const bool write;
> -       const bool present;
> -       const bool rsvd;
> -       const bool user;
> -
> -       /* Derived from mmu and global state.  */
> -       const bool is_tdp;
> -       const bool nx_huge_page_workaround_enabled;
> -
> -       /*
> -        * Whether a >4KB mapping can be created or is forbidden due to NX
> -        * hugepages.
> -        */
> -       bool huge_page_disallowed;
> -
> -       /*
> -        * Maximum page size that can be created for this fault; input to
> -        * FNAME(fetch), direct_map() and kvm_tdp_mmu_map().
> -        */
> -       u8 max_level;
> -
> -       /*
> -        * Page size that can be created based on the max_level and the
> -        * page size used by the host mapping.
> -        */
> -       u8 req_level;
> -
> -       /*
> -        * Page size that will be created based on the req_level and
> -        * huge_page_disallowed.
> -        */
> -       u8 goal_level;
> -
> -       /* Shifted addr, or result of guest page table walk if addr is a gva.  */
> -       gfn_t gfn;
> -
> -       /* The memslot containing gfn. May be NULL. */
> -       struct kvm_memory_slot *slot;
> -
> -       /* Outputs of kvm_faultin_pfn.  */
> -       unsigned long mmu_seq;
> -       kvm_pfn_t pfn;
> -       hva_t hva;
> -       bool map_writable;
> -};
> -
>  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
>
>  /*
> @@ -164,22 +110,27 @@ enum {
>  static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>                                         u32 err, bool prefetch)
>  {
> +       bool is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault);
>         struct kvm_page_fault fault = {
>                 .addr = cr2_or_gpa,
> -               .error_code = err,
> -               .exec = err & PFERR_FETCH_MASK,
> -               .write = err & PFERR_WRITE_MASK,
> -               .present = err & PFERR_PRESENT_MASK,
> -               .rsvd = err & PFERR_RSVD_MASK,
> -               .user = err & PFERR_USER_MASK,
>                 .prefetch = prefetch,
> -               .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
> -               .nx_huge_page_workaround_enabled =
> -                       is_nx_huge_page_enabled(vcpu->kvm),
> +
> +               .write = err & PFERR_WRITE_MASK,
> +               .exec = err & PFERR_FETCH_MASK,
>
>                 .max_level = KVM_MAX_HUGEPAGE_LEVEL,
>                 .req_level = PG_LEVEL_4K,
>                 .goal_level = PG_LEVEL_4K,
> +
> +               .arch = {
> +                       .error_code = err,
> +                       .present = err & PFERR_PRESENT_MASK,
> +                       .rsvd = err & PFERR_RSVD_MASK,
> +                       .user = err & PFERR_USER_MASK,
> +                       .is_tdp = is_tdp,
> +                       .nx_huge_page_workaround_enabled =
> +                               is_nx_huge_page_enabled(vcpu->kvm),
> +               },
>         };
>         int r;
>
> @@ -196,7 +147,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>         if (!prefetch)
>                 vcpu->stat.pf_taken++;
>
> -       if (IS_ENABLED(CONFIG_RETPOLINE) && fault.is_tdp)
> +       if (IS_ENABLED(CONFIG_RETPOLINE) && fault.arch.is_tdp)
>                 r = kvm_tdp_page_fault(vcpu, &fault);
>         else
>                 r = vcpu->arch.mmu->page_fault(vcpu, &fault);
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index 335f26dabdf3..b01767acf073 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -270,7 +270,7 @@ TRACE_EVENT(
>         TP_fast_assign(
>                 __entry->vcpu_id = vcpu->vcpu_id;
>                 __entry->cr2_or_gpa = fault->addr;
> -               __entry->error_code = fault->error_code;
> +               __entry->error_code = fault->arch.error_code;
>                 __entry->sptep = sptep;
>                 __entry->old_spte = old_spte;
>                 __entry->new_spte = *sptep;
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 18bb92b70a01..daf9c7731edc 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -698,7 +698,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>                  * We cannot overwrite existing page tables with an NX
>                  * large page, as the leaf could be executable.
>                  */
> -               if (fault->nx_huge_page_workaround_enabled)
> +               if (fault->arch.nx_huge_page_workaround_enabled)
>                         disallowed_hugepage_adjust(fault, *it.sptep, it.level);
>
>                 base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
> @@ -713,7 +713,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>                         continue;
>
>                 link_shadow_page(vcpu, it.sptep, sp);
> -               if (fault->huge_page_disallowed)
> +               if (fault->arch.huge_page_disallowed)
>                         account_nx_huge_page(vcpu->kvm, sp,
>                                              fault->req_level >= it.level);
>         }
> @@ -793,8 +793,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>         int r;
>         bool is_self_change_mapping;
>
> -       pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
> -       WARN_ON_ONCE(fault->is_tdp);
> +       pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->arch.error_code);
> +       WARN_ON_ONCE(fault->arch.is_tdp);
>
>         /*
>          * Look up the guest pte for the faulting address.
> @@ -802,7 +802,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>          * The bit needs to be cleared before walking guest page tables.
>          */
>         r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
> -                            fault->error_code & ~PFERR_RSVD_MASK);
> +                            fault->arch.error_code & ~PFERR_RSVD_MASK);
>
>         /*
>          * The page is not mapped by the guest.  Let the guest handle it.
> @@ -830,7 +830,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>         vcpu->arch.write_fault_to_shadow_pgtable = false;
>
>         is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
> -             &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);
> +             &walker, fault->arch.user, &vcpu->arch.write_fault_to_shadow_pgtable);
>
>         if (is_self_change_mapping)
>                 fault->max_level = PG_LEVEL_4K;
> @@ -846,7 +846,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>          * we will cache the incorrect access into mmio spte.
>          */
>         if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) &&
> -           !is_cr0_wp(vcpu->arch.mmu) && !fault->user && fault->slot) {
> +           !is_cr0_wp(vcpu->arch.mmu) && !fault->arch.user && fault->slot) {
>                 walker.pte_access |= ACC_WRITE_MASK;
>                 walker.pte_access &= ~ACC_USER_MASK;
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 66231c7ed31e..4940413d3767 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1156,7 +1156,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>         tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
>                 int r;
>
> -               if (fault->nx_huge_page_workaround_enabled)
> +               if (fault->arch.nx_huge_page_workaround_enabled)
>                         disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
>
>                 if (iter.level == fault->goal_level)
> @@ -1181,7 +1181,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                 sp = tdp_mmu_alloc_sp(vcpu);
>                 tdp_mmu_init_child_sp(sp, &iter);
>
> -               sp->arch.nx_huge_page_disallowed = fault->huge_page_disallowed;
> +               sp->arch.nx_huge_page_disallowed = fault->arch.huge_page_disallowed;
>
>                 if (is_shadow_present_pte(iter.old_spte))
>                         r = tdp_mmu_split_huge_page(kvm, &iter, sp, true);
> @@ -1197,7 +1197,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                         goto retry;
>                 }
>
> -               if (fault->huge_page_disallowed &&
> +               if (fault->arch.huge_page_disallowed &&
>                     fault->req_level >= iter.level) {
>                         spin_lock(&kvm->arch.tdp_mmu_pages_lock);
>                         track_possible_nx_huge_page(kvm, sp);
> diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
> index a9da33d4baa8..9f0ca920bf68 100644
> --- a/include/kvm/mmu_types.h
> +++ b/include/kvm/mmu_types.h
> @@ -66,4 +66,48 @@ struct kvm_mmu_page {
>         struct kvm_mmu_page_arch arch;
>  };
>
> +struct kvm_page_fault {
> +       /* The raw faulting address. */
> +       const gpa_t addr;
> +
> +       /* Whether the fault was synthesized to prefetch a mapping. */
> +       const bool prefetch;
> +
> +       /* Information about the cause of the fault. */
> +       const bool write;
> +       const bool exec;
> +
> +       /* Shifted addr, or result of guest page table walk if shadow paging. */
> +       gfn_t gfn;

Is this redundant to have in common code? If we're not doing common
shadow paging, then this is just addr shifted. Would this be better
placed in the arch specific struct?

> +
> +       /* The memslot that contains @gfn. May be NULL. */
> +       struct kvm_memory_slot *slot;
> +
> +       /* Maximum page size that can be created for this fault. */
> +       u8 max_level;
> +
> +       /*
> +        * Page size that can be created based on the max_level and the page
> +        * size used by the host mapping.
> +        */
> +       u8 req_level;
> +
> +       /* Final page size that will be created. */
> +       u8 goal_level;
> +
> +       /*
> +        * The value of kvm->mmu_invalidate_seq before fetching the host
> +        * mapping. Used to verify that the host mapping has not changed
> +        * after grabbing the MMU lock.
> +        */
> +       unsigned long mmu_seq;

Should this be ifdef'ed with KVM_ARCH_WANT_MMU_NOTIFIER?
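
E.g. something like (sketch only):

	#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
		/* Snapshot of kvm->mmu_invalidate_seq; see comment above. */
		unsigned long mmu_seq;
	#endif

though any common code that touches mmu_seq would then need the same
guard (or a stub).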

> +
> +       /* Information about the host mapping. */
> +       kvm_pfn_t pfn;
> +       hva_t hva;
> +       bool map_writable;
> +
> +       struct kvm_page_fault_arch arch;
> +};
> +
>  #endif /* !__KVM_MMU_TYPES_H */
> --
> 2.39.0.rc1.256.g54fd8350bd-goog
>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 17/37] KVM: Move struct kvm_gfn_range to kvm_types.h
  2022-12-08 19:38   ` David Matlack
@ 2022-12-12 19:16     ` Ben Gardon
  -1 siblings, 0 replies; 317+ messages in thread
From: Ben Gardon @ 2022-12-12 19:16 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
>
> Move struct kvm_gfn_range to kvm_types.h so that its definition can be
> accessed in a future commit by arch/x86/include/asm/kvm/tdp_pgtable.h
> without needing to include the mega-header kvm_host.h.
>
> No functional change intended.
>
> Signed-off-by: David Matlack <dmatlack@google.com>

Reviewed-by: Ben Gardon <bgardon@google.com>

> ---
>  include/linux/kvm_host.h  | 7 -------
>  include/linux/kvm_types.h | 8 ++++++++
>  2 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 22ecb7ce4d31..469ff4202a0d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -256,13 +256,6 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>  #endif
>
>  #ifdef KVM_ARCH_WANT_MMU_NOTIFIER

I don't have any problem with always having this defined, but it might
be worth noting that it's now defined on all archs, regardless of
KVM_ARCH_WANT_MMU_NOTIFIER.
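
Maybe just a comment at the struct's new home, e.g. (wording is only a
suggestion):

	/*
	 * Note: defined unconditionally, not under KVM_ARCH_WANT_MMU_NOTIFIER,
	 * so that the type is visible to common MMU code on all architectures.
	 */
	struct kvm_gfn_range {
		struct kvm_memory_slot *slot;
		gfn_t start;
		gfn_t end;
		pte_t pte;
		bool may_block;
	};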

> -struct kvm_gfn_range {
> -       struct kvm_memory_slot *slot;
> -       gfn_t start;
> -       gfn_t end;
> -       pte_t pte;
> -       bool may_block;
> -};
>  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
>  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> index 59cf958d69df..001aad9ea987 100644
> --- a/include/linux/kvm_types.h
> +++ b/include/linux/kvm_types.h
> @@ -132,4 +132,12 @@ struct kvm_vcpu_stat_generic {
>
>  #define KVM_STATS_NAME_SIZE    48
>
> +struct kvm_gfn_range {
> +       struct kvm_memory_slot *slot;
> +       gfn_t start;
> +       gfn_t end;
> +       pte_t pte;
> +       bool may_block;
> +};
> +
>  #endif /* __KVM_TYPES_H__ */
> --
> 2.39.0.rc1.256.g54fd8350bd-goog
>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 20/37] KVM: x86/mmu: Abstract away computing the max mapping level
  2022-12-08 19:38   ` David Matlack
@ 2022-12-12 19:32     ` Ben Gardon
  -1 siblings, 0 replies; 317+ messages in thread
From: Ben Gardon @ 2022-12-12 19:32 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
>
> Abstract away kvm_mmu_max_mapping_level(), which is an x86-specific
> function for computing the max level that a given GFN can be mapped in
> KVM's page tables. This will be used in a future commit to enable moving
> the TDP MMU to common code.
>
> Provide a default implementation for non-x86 architectures that just
> returns the max level. This will result in more zapping than necessary
> when disabling dirty logging (i.e. less than optimal performance) but no
> correctness issues.

Apologies if you already implemented it in a later patch in this
series, but would it not at least be possible to port
host_pfn_mapping_level to common code and check that?
I'm assuming, though I could be wrong, that all archs map GFNs with at
most a host page table granularity mapping.
I suppose that doesn't strictly need to be included in this series,
but it would be worth addressing in the commit description.
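
Roughly what I'm picturing, as a very rough sketch (this assumes
host_pfn_mapping_level(), or an equivalent, is made callable from common
code first):

	__weak int tdp_mmu_max_mapping_level(struct kvm *kvm,
					     const struct kvm_memory_slot *slot,
					     struct tdp_iter *iter)
	{
		/* Cap at the granularity of the host mapping for iter->gfn. */
		return host_pfn_mapping_level(kvm, iter->gfn, slot);
	}

Even that much would avoid the extra zapping when disabling dirty
logging on the non-x86 architectures.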

>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c     | 14 ++++++++++----
>  arch/x86/kvm/mmu/tdp_pgtable.c |  7 +++++++
>  2 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 7670fbd8e72d..24d1dbd0a1ec 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1696,6 +1696,13 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
>                 clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot);
>  }
>
> +__weak int tdp_mmu_max_mapping_level(struct kvm *kvm,
> +                                    const struct kvm_memory_slot *slot,
> +                                    struct tdp_iter *iter)
> +{
> +       return TDP_MAX_HUGEPAGE_LEVEL;
> +}
> +
>  static void zap_collapsible_spte_range(struct kvm *kvm,
>                                        struct kvm_mmu_page *root,
>                                        const struct kvm_memory_slot *slot)
> @@ -1727,15 +1734,14 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
>                 /*
>                  * If iter.gfn resides outside of the slot, i.e. the page for
>                  * the current level overlaps but is not contained by the slot,
> -                * then the SPTE can't be made huge.  More importantly, trying
> -                * to query that info from slot->arch.lpage_info will cause an
> +                * then the SPTE can't be made huge. On x86, trying to query
> +                * that info from slot->arch.lpage_info will cause an
>                  * out-of-bounds access.
>                  */
>                 if (iter.gfn < start || iter.gfn >= end)
>                         continue;
>
> -               max_mapping_level = kvm_mmu_max_mapping_level(kvm, slot,
> -                                                             iter.gfn, PG_LEVEL_NUM);
> +               max_mapping_level = tdp_mmu_max_mapping_level(kvm, slot, &iter);
>                 if (max_mapping_level < iter.level)
>                         continue;
>
> diff --git a/arch/x86/kvm/mmu/tdp_pgtable.c b/arch/x86/kvm/mmu/tdp_pgtable.c
> index b07ed99b4ab1..840d063c45b8 100644
> --- a/arch/x86/kvm/mmu/tdp_pgtable.c
> +++ b/arch/x86/kvm/mmu/tdp_pgtable.c
> @@ -163,3 +163,10 @@ void tdp_mmu_arch_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
>         if (shared)
>                 spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
>  }
> +
> +int tdp_mmu_max_mapping_level(struct kvm *kvm,
> +                             const struct kvm_memory_slot *slot,
> +                             struct tdp_iter *iter)
> +{
> +       return kvm_mmu_max_mapping_level(kvm, slot, iter->gfn, PG_LEVEL_NUM);
> +}
> --
> 2.39.0.rc1.256.g54fd8350bd-goog
>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 20/37] KVM: x86/mmu: Abstract away computing the max mapping level
  2022-12-12 19:32     ` Ben Gardon
@ 2022-12-12 21:05       ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-12 21:05 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Mon, Dec 12, 2022 at 11:32 AM Ben Gardon <bgardon@google.com> wrote:
>
> On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
> >
> > Abstract away kvm_mmu_max_mapping_level(), which is an x86-specific
> > function for computing the max level that a given GFN can be mapped in
> > KVM's page tables. This will be used in a future commit to enable moving
> > the TDP MMU to common code.
> >
> > Provide a default implementation for non-x86 architectures that just
> > returns the max level. This will result in more zapping than necessary
> > when disabling dirty logging (i.e. less than optimal performance) but no
> > correctness issues.
>
> Apologies if you already implemented it in a later patch in this
> series, but would it not at least be possible to port
> host_pfn_mapping_level to common code and check that?
> I'm assuming, though I could be wrong, that all archs map GFNs with at
> most a host page table granularity mapping.
> I suppose that doesn't strictly need to be included in this series,
> but it would be worth addressing in the commit description.

It's not implemented later in this series, but I agree it's something
we should do. In fact, it's worth doing regardless of this series as a
way to share more code across architectures (e.g. KVM/ARM has its own
version in arch/arm64/kvm/mmu.c:get_user_mapping_size()).
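
Something like a single common walker, purely as a sketch (name and
placement are made up):

	/* common code: level at which the host maps the page backing @gfn */
	int kvm_host_mapping_level(struct kvm *kvm, gfn_t gfn,
				   const struct kvm_memory_slot *slot);

with x86's host_pfn_mapping_level() and arm64's get_user_mapping_size()
becoming callers of (or folded into) it.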

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 33/37] KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code
  2022-12-08 19:38   ` David Matlack
  (?)
  (?)
@ 2022-12-12 22:03     ` Ben Gardon
  -1 siblings, 0 replies; 317+ messages in thread
From: Ben Gardon @ 2022-12-12 22:03 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Thu, Dec 8, 2022 at 11:40 AM David Matlack <dmatlack@google.com> wrote:
>
> Move kvm_arch_flush_remote_tlbs_memslot() to common code and drop
> "arch_" from the name. kvm_arch_flush_remote_tlbs_memslot() is just a
> range-based TLB invalidation where the range is defined by the memslot.
> Now that kvm_flush_remote_tlbs_range() can be called from common code we
> can just use that and drop a bunch of duplicate code from the arch
> directories.
>
> Note this adds a lockdep assertion for slots_lock being held when calling
> kvm_flush_remote_tlbs_memslot(), which was previously only asserted on
> x86.

Besides the one lockdep assertion, is there any benefit to having this
wrapper function? Open-coding "kvm_flush_remote_tlbs_range(kvm,
memslot->base_gfn, memslot->npages);" is only a slightly longer line
and, IMO, just as readable. I'm happy to see this cleanup, but it
might be just as easy to drop the function.
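
To be concrete, at a call site like the dirty-log path the wrapper is:

	if (flush)
		kvm_flush_remote_tlbs_memslot(kvm, memslot);

versus the open-coded:

	if (flush)
		kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn,
					    memslot->npages);

so the wrapper's main value really is centralizing the slots_lock
assertion and the comment.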

>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/arm64/kvm/arm.c     |  6 ------
>  arch/mips/kvm/mips.c     | 10 ++--------
>  arch/riscv/kvm/mmu.c     |  6 ------
>  arch/x86/kvm/mmu/mmu.c   | 16 +---------------
>  arch/x86/kvm/x86.c       |  2 +-
>  include/linux/kvm_host.h |  7 +++----
>  virt/kvm/kvm_main.c      | 17 +++++++++++++++--
>  7 files changed, 22 insertions(+), 42 deletions(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 0e0d4c4f79a2..4f1549c1d2d2 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1430,12 +1430,6 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
>
>  }
>
> -void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
> -                                       const struct kvm_memory_slot *memslot)
> -{
> -       kvm_flush_remote_tlbs(kvm);
> -}
> -
>  static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
>                                         struct kvm_arm_device_addr *dev_addr)
>  {
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index a25e0b73ee70..ecd8a051fd6b 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -209,7 +209,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>         /* Flush slot from GPA */
>         kvm_mips_flush_gpa_pt(kvm, slot->base_gfn,
>                               slot->base_gfn + slot->npages - 1);
> -       kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
> +       kvm_flush_remote_tlbs_memslot(kvm, slot);
>         spin_unlock(&kvm->mmu_lock);
>  }
>
> @@ -245,7 +245,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>                 needs_flush = kvm_mips_mkclean_gpa_pt(kvm, new->base_gfn,
>                                         new->base_gfn + new->npages - 1);
>                 if (needs_flush)
> -                       kvm_arch_flush_remote_tlbs_memslot(kvm, new);
> +                       kvm_flush_remote_tlbs_memslot(kvm, new);
>                 spin_unlock(&kvm->mmu_lock);
>         }
>  }
> @@ -997,12 +997,6 @@ int kvm_arch_flush_remote_tlb(struct kvm *kvm)
>         return 1;
>  }
>
> -void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
> -                                       const struct kvm_memory_slot *memslot)
> -{
> -       kvm_flush_remote_tlbs(kvm);
> -}
> -
>  long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
>  {
>         long r;
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index a8281a65cb3d..98bf3719a396 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -406,12 +406,6 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
>  {
>  }
>
> -void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
> -                                       const struct kvm_memory_slot *memslot)
> -{
> -       kvm_flush_remote_tlbs(kvm);
> -}
> -
>  void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free)
>  {
>  }
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 19963ed83484..f2602ee1771f 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6524,7 +6524,7 @@ static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
>          */
>         if (slot_handle_level(kvm, slot, kvm_mmu_zap_collapsible_spte,
>                               PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL - 1, true))
> -               kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
> +               kvm_flush_remote_tlbs_memslot(kvm, slot);
>  }
>
>  void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
> @@ -6543,20 +6543,6 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
>         }
>  }
>
> -void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
> -                                       const struct kvm_memory_slot *memslot)
> -{
> -       /*
> -        * All current use cases for flushing the TLBs for a specific memslot
> -        * related to dirty logging, and many do the TLB flush out of mmu_lock.
> -        * The interaction between the various operations on memslot must be
> -        * serialized by slots_locks to ensure the TLB flush from one operation
> -        * is observed by any other operation on the same memslot.
> -        */
> -       lockdep_assert_held(&kvm->slots_lock);
> -       kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
> -}
> -
>  void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
>                                    const struct kvm_memory_slot *memslot)
>  {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 517c8ed33542..95ff95da55d5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12574,7 +12574,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
>                  * See is_writable_pte() for more details (the case involving
>                  * access-tracked SPTEs is particularly relevant).
>                  */
> -               kvm_arch_flush_remote_tlbs_memslot(kvm, new);
> +               kvm_flush_remote_tlbs_memslot(kvm, new);
>         }
>  }
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d9a7f559d2c5..46ed0ef4fb79 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1366,6 +1366,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool usermode_vcpu_not_eligible);
>
>  void kvm_flush_remote_tlbs(struct kvm *kvm);
>  void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
> +void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
> +                                  const struct kvm_memory_slot *memslot);
>
>  #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
>  int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
> @@ -1394,10 +1396,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>                                         unsigned long mask);
>  void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);
>
> -#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> -void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
> -                                       const struct kvm_memory_slot *memslot);
> -#else /* !CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
> +#ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
>  int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
>  int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>                       int *is_dirty, struct kvm_memory_slot **memslot);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 662ca280c0cf..39c2efd15504 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -388,6 +388,19 @@ void __weak kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pa
>         kvm_flush_remote_tlbs(kvm);
>  }
>
> +void kvm_flush_remote_tlbs_memslot(struct kvm *kvm, const struct kvm_memory_slot *memslot)
> +{
> +       /*
> +        * All current use cases for flushing the TLBs for a specific memslot
> +        * related to dirty logging, and many do the TLB flush out of mmu_lock.
> +        * The interaction between the various operations on memslot must be
> +        * serialized by slots_locks to ensure the TLB flush from one operation
> +        * is observed by any other operation on the same memslot.
> +        */
> +       lockdep_assert_held(&kvm->slots_lock);
> +       kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
> +}
> +
>  static void kvm_flush_shadow_all(struct kvm *kvm)
>  {
>         kvm_arch_flush_shadow_all(kvm);
> @@ -2197,7 +2210,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
>         }
>
>         if (flush)
> -               kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
> +               kvm_flush_remote_tlbs_memslot(kvm, memslot);
>
>         if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
>                 return -EFAULT;
> @@ -2314,7 +2327,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
>         KVM_MMU_UNLOCK(kvm);
>
>         if (flush)
> -               kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
> +               kvm_flush_remote_tlbs_memslot(kvm, memslot);
>
>         return 0;
>  }
> --
> 2.39.0.rc1.256.g54fd8350bd-goog
>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 10/37] KVM: MMU: Move struct kvm_page_fault to common code
  2022-12-08 19:38   ` David Matlack
  (?)
  (?)
@ 2022-12-12 22:27     ` Paolo Bonzini
  -1 siblings, 0 replies; 317+ messages in thread
From: Paolo Bonzini @ 2022-12-12 22:27 UTC (permalink / raw)
  To: David Matlack
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, Nadav Amit,
	Colin Cross, Ben Gardon, linux-riscv, kvmarm, Yu Zhao, xu xin,
	Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, Arnd Bergmann, Liam R. Howlett, kvm,
	Atish Patra, kvmarm, Suren Baghdasaryan, Vlastimil Babka,
	linux-arm-kernel, linux-mips, kvm-riscv, Marc Zyngier,
	Andrew Morton

On 12/8/22 20:38, David Matlack wrote:
> +
> +	/* Derived from mmu and global state.  */
> +	const bool is_tdp;

I think this could stay in the architecture-independent part.
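
Non-x86 users of the common fault path would presumably just hard-code
it, e.g. (sketch only, with local variable names borrowed from the
arm64 abort handler for illustration):

	struct kvm_page_fault fault = {
		.addr		= fault_ipa,
		.write		= write_fault,
		.exec		= exec_fault,
		/* Non-x86 stage-2 MMUs only ever take TDP faults. */
		.is_tdp		= true,
	};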

Paolo

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 10/37] KVM: MMU: Move struct kvm_page_fault to common code
  2022-12-12 18:24     ` Ben Gardon
  (?)
  (?)
@ 2022-12-12 22:30       ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-12 22:30 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Mon, Dec 12, 2022 at 10:24:31AM -0800, Ben Gardon wrote:
> On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
> >
> > Move struct kvm_page_fault to common code. This will be used in a future
> > commit to move the TDP MMU to common code.
> >
> > No functional change intended.
> >
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
[...]
> > diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
> > index a9da33d4baa8..9f0ca920bf68 100644
> > --- a/include/kvm/mmu_types.h
> > +++ b/include/kvm/mmu_types.h
> > @@ -66,4 +66,48 @@ struct kvm_mmu_page {
> >         struct kvm_mmu_page_arch arch;
> >  };
> >
> > +struct kvm_page_fault {
> > +       /* The raw faulting address. */
> > +       const gpa_t addr;
> > +
> > +       /* Whether the fault was synthesized to prefetch a mapping. */
> > +       const bool prefetch;
> > +
> > +       /* Information about the cause of the fault. */
> > +       const bool write;
> > +       const bool exec;
> > +
> > +       /* Shifted addr, or result of guest page table walk if shadow paging. */
> > +       gfn_t gfn;
> 
> Is this redundant to have in common code? If we're not doing common
> shadow paging, then this is just addr shifted. Would this be better
> placed in the arch specific struct?

Yes it's redundant but it is actually used by the TDP MMU, unlike @addr.
So if anything I would rather move @addr to kvm_page_fault_arch.
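
(To be concrete: the TDP MMU only ever consumes gfn, which the arch
fault setup can derive once, roughly

	.gfn = gpa_to_gfn(cr2_or_gpa),

whereas the guest page table walk that overrides gfn is purely an x86
shadow-paging concern.)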

> 
> > +
> > +       /* The memslot that contains @gfn. May be NULL. */
> > +       struct kvm_memory_slot *slot;
> > +
> > +       /* Maximum page size that can be created for this fault. */
> > +       u8 max_level;
> > +
> > +       /*
> > +        * Page size that can be created based on the max_level and the page
> > +        * size used by the host mapping.
> > +        */
> > +       u8 req_level;
> > +
> > +       /* Final page size that will be created. */
> > +       u8 goal_level;
> > +
> > +       /*
> > +        * The value of kvm->mmu_invalidate_seq before fetching the host
> > +        * mapping. Used to verify that the host mapping has not changed
> > +        * after grabbing the MMU lock.
> > +        */
> > +       unsigned long mmu_seq;
> 
> Should this be ifdef'ed with  KVM_ARCH_WANT_MMU_NOTIFIER?

I'll have to take a closer look, but probably yes.
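
i.e. something along the lines of:

	#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
		unsigned long mmu_seq;
	#endif

in the common struct, plus guarding whatever code consumes it.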

> 
> > +
> > +       /* Information about the host mapping. */
> > +       kvm_pfn_t pfn;
> > +       hva_t hva;
> > +       bool map_writable;
> > +
> > +       struct kvm_page_fault_arch arch;
> > +};
> > +
> >  #endif /* !__KVM_MMU_TYPES_H */
> > --
> > 2.39.0.rc1.256.g54fd8350bd-goog
> >

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 10/37] KVM: MMU: Move struct kvm_page_fault to common code
@ 2022-12-12 22:30       ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-12 22:30 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Mon, Dec 12, 2022 at 10:24:31AM -0800, Ben Gardon wrote:
> On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
> >
> > Move struct kvm_page_fault to common code. This will be used in a future
> > commit to move the TDP MMU to common code.
> >
> > No functional change intended.
> >
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
[...]
> > diff --git a/include/kvm/mmu_types.h b/include/kvm/mmu_types.h
> > index a9da33d4baa8..9f0ca920bf68 100644
> > --- a/include/kvm/mmu_types.h
> > +++ b/include/kvm/mmu_types.h
> > @@ -66,4 +66,48 @@ struct kvm_mmu_page {
> >         struct kvm_mmu_page_arch arch;
> >  };
> >
> > +struct kvm_page_fault {
> > +       /* The raw faulting address. */
> > +       const gpa_t addr;
> > +
> > +       /* Whether the fault was synthesized to prefetch a mapping. */
> > +       const bool prefetch;
> > +
> > +       /* Information about the cause of the fault. */
> > +       const bool write;
> > +       const bool exec;
> > +
> > +       /* Shifted addr, or result of guest page table walk if shadow paging. */
> > +       gfn_t gfn;
> 
> Is this redundant to have in common code? If we're not doing common
> shadow paging, then this is just addr shifted. Would this be better
> placed in the arch specific struct?

Yes it's redundant but it is actually used by the TDP MMU, unlike @addr.
So if anything I would rather move @addr to kvm_page_fault_arch.

> 
> > +
> > +       /* The memslot that contains @gfn. May be NULL. */
> > +       struct kvm_memory_slot *slot;
> > +
> > +       /* Maximum page size that can be created for this fault. */
> > +       u8 max_level;
> > +
> > +       /*
> > +        * Page size that can be created based on the max_level and the page
> > +        * size used by the host mapping.
> > +        */
> > +       u8 req_level;
> > +
> > +       /* Final page size that will be created. */
> > +       u8 goal_level;
> > +
> > +       /*
> > +        * The value of kvm->mmu_invalidate_seq before fetching the host
> > +        * mapping. Used to verify that the host mapping has not changed
> > +        * after grabbing the MMU lock.
> > +        */
> > +       unsigned long mmu_seq;
> 
> Should this be ifdef'ed with  KVM_ARCH_WANT_MMU_NOTIFIER?

I'll have to take a closer look, but probably yes.

> 
> > +
> > +       /* Information about the host mapping. */
> > +       kvm_pfn_t pfn;
> > +       hva_t hva;
> > +       bool map_writable;
> > +
> > +       struct kvm_page_fault_arch arch;
> > +};
> > +
> >  #endif /* !__KVM_MMU_TYPES_H */
> > --
> > 2.39.0.rc1.256.g54fd8350bd-goog
> >

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 06/37] KVM: MMU: Move struct kvm_mmu_page to common code
  2022-12-08 19:38   ` David Matlack
@ 2022-12-12 22:32     ` Paolo Bonzini
  -1 siblings, 0 replies; 317+ messages in thread
From: Paolo Bonzini @ 2022-12-12 22:32 UTC (permalink / raw)
  To: David Matlack
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On 12/8/22 20:38, David Matlack wrote:
> This commit increases the size of struct kvm_mmu_page by 64 bytes on
> x86_64 (184 bytes -> 248 bytes). The size of this struct can be reduced
> in future commits by moving TDP MMU root fields into a separate struct
> and by dynamically allocating fields only used by the Shadow MMU.

I think it's already possible to use a union like

	union {
		struct kvm_mmu_page_arch arch;
		struct {
			struct work_struct work;
			void *data;
		};
	};

Paolo


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 33/37] KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code
  2022-12-12 22:03     ` Ben Gardon
@ 2022-12-12 22:42       ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-12 22:42 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Oliver Upton, Huacai Chen, Aleksandar Markovic,
	Anup Patel, Atish Patra, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Mon, Dec 12, 2022 at 2:03 PM Ben Gardon <bgardon@google.com> wrote:
>
> On Thu, Dec 8, 2022 at 11:40 AM David Matlack <dmatlack@google.com> wrote:
> >
> > Move kvm_arch_flush_remote_tlbs_memslot() to common code and drop
> > "arch_" from the name. kvm_arch_flush_remote_tlbs_memslot() is just a
> > range-based TLB invalidation where the range is defined by the memslot.
> > Now that kvm_flush_remote_tlbs_range() can be called from common code we
> > can just use that and drop a bunch of duplicate code from the arch
> > directories.
> >
> > Note this adds a lockdep assertion for slot_lock being held when calling
> > kvm_flush_remote_tlbs_memslot(), which was previously only asserted on
> > x86.
>
> Besides the one lockdep assertion, is there any benefit to having this
> wrapper function? Open-coding "kvm_flush_remote_tlbs_range(kvm,
> memslot->base_gfn, memslot->npages);" is only a slightly longer line
> and, IMO, just as readable. I'm happy to see this cleanup, but it
> might be just as easy to drop the function.

The wrapper makes lines shorter, adds a lockdep assertion, and is just
as readable. What's the reason to drop it?
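
(For context, the wrapper being discussed boils down to roughly the
following; sketch only, and assuming the "slot_lock" mentioned in the
commit message is the usual kvm->slots_lock:)

	void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
					   const struct kvm_memory_slot *memslot)
	{
		lockdep_assert_held(&kvm->slots_lock);

		kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn,
					    memslot->npages);
	}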

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 06/37] KVM: MMU: Move struct kvm_mmu_page to common code
  2022-12-12 22:32     ` Paolo Bonzini
@ 2022-12-12 22:49       ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-12 22:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Mon, Dec 12, 2022 at 2:32 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 12/8/22 20:38, David Matlack wrote:
> > This commit increases the size of struct kvm_mmu_page by 64 bytes on
> > x86_64 (184 bytes -> 248 bytes). The size of this struct can be reduced
> > in future commits by moving TDP MMU root fields into a separate struct
> > and by dynamically allocating fields only used by the Shadow MMU.
>
> I think it's already possible to use a union like
>
>         union {
>                 struct kvm_mmu_page_arch arch;
>                 struct {
>                         struct work_struct work;
>                         void *data;
>                 };
>         };

That would potentially corrupt
kvm_mmu_page_arch.nx_huge_page_disallowed and
possible_nx_huge_page_link, both of which still need to be used by the
TDP MMU on x86.
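
Spelling out the aliasing (same union as above, with the fields the x86
TDP MMU still touches called out; illustration only):

	union {
		/*
		 * Contains nx_huge_page_disallowed and
		 * possible_nx_huge_page_link, which the x86 TDP MMU still
		 * reads and writes ...
		 */
		struct kvm_mmu_page_arch arch;
		/* ... so these would share their storage and clobber them. */
		struct {
			struct work_struct work;
			void *data;
		};
	};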


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  2022-12-12 17:39           ` Sean Christopherson
@ 2022-12-12 22:50             ` Paolo Bonzini
  -1 siblings, 0 replies; 317+ messages in thread
From: Paolo Bonzini @ 2022-12-12 22:50 UTC (permalink / raw)
  To: Sean Christopherson, David Matlack
  Cc: Oliver Upton, Yang, Weijiang, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Huacai Chen,
	Aleksandar Markovic, Anup Patel, Atish Patra, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Andrew Morton, Anshuman Khandual,
	Amit, Nadav, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On 12/12/22 18:39, Sean Christopherson wrote:
>> The notion of address spaces is already existing architecture-neutral
>> concept in KVM (e.g. see uses of KVM_ADDRESS_SPACE_NUM in
>> virt/kvm/kvm_main.c), although SMM is the only use-case I'm aware of.
> 
> Yes, SMM is currently the only use-case.

It's possible that in the future Hyper-V VTLs will also have per-level 
protections.  It wouldn't use as_id, but it would likely be recorded in 
the upper byte of the role.

I'm not sure if Microsoft intends to port those to ARM as well.

> My preference would be to leave .smm in x86's page role

What about defining a byte of arch_role and a macro to build it?
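
(Purely to make that concrete, and not something from the series: the
common role could reserve an opaque byte and leave it to each architecture
to define how that byte is built, e.g.)

	union kvm_mmu_page_role {
		u32 word;
		struct {
			u16 as_id:8;
			u16 level:4;
			u16 invalid:1;
			u8 arch_role;	/* built by an arch-provided macro */
		};
	};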


> Unless I'm missing something, e.g. a need to map GPAs differently for
> SMM vs. non-SMM, SMM could have been implemented with a simple flag
> in a memslot to mark the memslot as SMM-only.

Unfortunately not, because there can be another region (for example 
video RAM at 0A0000h) underneath SMRAM.

In fact, KVM_MEM_X86_SMRAM was the first idea.  It was uglier than 
multiple address spaces 
(https://lore.kernel.org/lkml/1431084034-8425-1-git-send-email-pbonzini@redhat.com). 
  In retrospect there were probably ways to mix the best of the two 
designs, but it wasn't generic enough.

> And separate address spaces become truly nasty if the CPU can access multiple
> protected regions, e.g. if the CPU can access type X and type Y at the same time,
> then there would need to be memslots for "regular", X, Y, and X+Y.

Without a usecase that's hard to say.  It's just as possible that there 
would be a natural hierarchy of levels.

Paolo



^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code
  2022-12-09 19:07   ` Oliver Upton
@ 2022-12-12 22:54     ` Paolo Bonzini
  -1 siblings, 0 replies; 317+ messages in thread
From: Paolo Bonzini @ 2022-12-12 22:54 UTC (permalink / raw)
  To: Oliver Upton, David Matlack
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Huacai Chen, Aleksandar Markovic, Anup Patel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Sean Christopherson,
	Andrew Morton, Anshuman Khandual, Nadav Amit,
	Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On 12/9/22 20:07, Oliver Upton wrote:
>>   - Naming. This series does not change the names of any existing code.
>>     So all the KVM/x86 Shadow MMU-style terminology like
>>     "shadow_page"/"sp"/"spte" persists. Should we keep that style in
>>     common code or move toward something less shadow-paging-specific?
>>     e.g. "page_table"/"pt"/"pte".
>
> I would strongly be in favor of discarding the shadow paging residue if
> x86 folks are willing to part ways with it 😄

Yes, absolutely.  Something like to_shadow_page->to_mmu_data, sp->md, 
spt->hpt, spte->spte, sptep->hptep.

Paolo



^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 02/37] KVM: MMU: Move struct kvm_mmu_page_role into common code
  2022-12-08 19:38   ` David Matlack
@ 2022-12-12 23:11     ` Paolo Bonzini
  -1 siblings, 0 replies; 317+ messages in thread
From: Paolo Bonzini @ 2022-12-12 23:11 UTC (permalink / raw)
  To: David Matlack
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On 12/8/22 20:38, David Matlack wrote:
> +/*
> + * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
> + * also includes TDP pages) to determine whether or not a page can be used in
> + * the given MMU context.
> + */
> +union kvm_mmu_page_role {
> +	u32 word;
> +	struct {
> +		struct {
> +			/* The address space ID mapped by the page. */
> +			u16 as_id:8;
> +
> +			/* The level of the page in the page table hierarchy. */
> +			u16 level:4;
> +
> +			/* Whether the page is invalid, i.e. pending destruction. */
> +			u16 invalid:1;
> +		};
> +
> +		/* Architecture-specific properties. */
> +		struct kvm_mmu_page_role_arch arch;
> +	};
> +};
> +

Have you considered adding a tdp_mmu:1 field to the arch-independent 
part?  I think that as long as _that_ field is the same, there's no need 
to have any overlap between TDP MMU and shadow MMU roles.

I'm not even sure if the x86 TDP MMU needs _any_ other role bit.  It 
needs of course the above three, and it also needs "direct" but it is 
used exactly to mean "is this a TDP MMU page".  So we could have

union {
	struct {
		u32 tdp_mmu:1;
		u32 invalid:1;
		u32 :6;
		u32 level:8;
		u32 arch:8;
		u32 :8;
	} tdp;
	/* the first field must be "u32 tdp_mmu:1;" */
	struct kvm_mmu_page_role_arch shadow;
};
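
With a common tdp_mmu bit like that, a TDP-vs-shadow check could collapse
to something like the following (sketch only; assumes the union above is
what ends up inside kvm_mmu_page_role):

	static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp)
	{
		return sp->role.tdp.tdp_mmu;
	}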

Paolo



^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 04/37] KVM: x86/mmu: Invert sp->tdp_mmu_page to sp->shadow_mmu_page
@ 2022-12-12 23:15     ` Paolo Bonzini
  0 siblings, 0 replies; 317+ messages in thread
From: Paolo Bonzini @ 2022-12-12 23:15 UTC (permalink / raw)
  To: David Matlack
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On 12/8/22 20:38, David Matlack wrote:
> Invert the meaning of sp->tdp_mmu_page and rename it accordingly. This
> allows the TDP MMU code to not care about this field, which will be used
> in a subsequent commit to move the TDP MMU to common code.
> 
> No functional change intended.

Let's use a bit of the role instead.

Paolo

> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c          | 1 +
>   arch/x86/kvm/mmu/mmu_internal.h | 2 +-
>   arch/x86/kvm/mmu/tdp_mmu.c      | 3 ---
>   arch/x86/kvm/mmu/tdp_mmu.h      | 5 ++++-
>   4 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 355548603960..f7668a32721d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2180,6 +2180,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
>   
>   	sp->gfn = gfn;
>   	sp->role = role;
> +	sp->shadow_mmu_page = true;
>   	hlist_add_head(&sp->hash_link, sp_list);
>   	if (sp_has_gptes(sp))
>   		account_shadowed(kvm, sp);
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index e32379c5b1ad..c1a379fba24d 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -52,7 +52,7 @@ struct kvm_mmu_page {
>   	struct list_head link;
>   	struct hlist_node hash_link;
>   
> -	bool tdp_mmu_page;
> +	bool shadow_mmu_page;
>   	bool unsync;
>   	u8 mmu_valid_gen;
>   
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 7ccac1aa8df6..fc0b87ceb1ea 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -133,8 +133,6 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
>   	if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
>   		return;
>   
> -	WARN_ON(!is_tdp_mmu_page(root));
> -
>   	/*
>   	 * The root now has refcount=0.  It is valid, but readers already
>   	 * cannot acquire a reference to it because kvm_tdp_mmu_get_root()
> @@ -279,7 +277,6 @@ static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
>   	sp->role = role;
>   	sp->gfn = gfn;
>   	sp->ptep = sptep;
> -	sp->tdp_mmu_page = true;
>   
>   	trace_kvm_mmu_get_page(sp, true);
>   }
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 0a63b1afabd3..18d3719f14ea 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -71,7 +71,10 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
>   					u64 *spte);
>   
>   #ifdef CONFIG_X86_64
> -static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; }
> +static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp)
> +{
> +	return !sp->shadow_mmu_page;
> +}
>   #else
>   static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
>   #endif


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 04/37] KVM: x86/mmu: Invert sp->tdp_mmu_page to sp->shadow_mmu_page
@ 2022-12-12 23:15     ` Paolo Bonzini
  0 siblings, 0 replies; 317+ messages in thread
From: Paolo Bonzini @ 2022-12-12 23:15 UTC (permalink / raw)
  To: David Matlack
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On 12/8/22 20:38, David Matlack wrote:
> Invert the meaning of sp->tdp_mmu_page and rename it accordingly. This
> allows the TDP MMU code to not care about this field, which will be used
> in a subsequent commit to move the TDP MMU to common code.
> 
> No functional change intended.

Let's use a bit of the role instead.

Paolo

> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c          | 1 +
>   arch/x86/kvm/mmu/mmu_internal.h | 2 +-
>   arch/x86/kvm/mmu/tdp_mmu.c      | 3 ---
>   arch/x86/kvm/mmu/tdp_mmu.h      | 5 ++++-
>   4 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 355548603960..f7668a32721d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2180,6 +2180,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
>   
>   	sp->gfn = gfn;
>   	sp->role = role;
> +	sp->shadow_mmu_page = true;
>   	hlist_add_head(&sp->hash_link, sp_list);
>   	if (sp_has_gptes(sp))
>   		account_shadowed(kvm, sp);
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index e32379c5b1ad..c1a379fba24d 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -52,7 +52,7 @@ struct kvm_mmu_page {
>   	struct list_head link;
>   	struct hlist_node hash_link;
>   
> -	bool tdp_mmu_page;
> +	bool shadow_mmu_page;
>   	bool unsync;
>   	u8 mmu_valid_gen;
>   
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 7ccac1aa8df6..fc0b87ceb1ea 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -133,8 +133,6 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
>   	if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
>   		return;
>   
> -	WARN_ON(!is_tdp_mmu_page(root));
> -
>   	/*
>   	 * The root now has refcount=0.  It is valid, but readers already
>   	 * cannot acquire a reference to it because kvm_tdp_mmu_get_root()
> @@ -279,7 +277,6 @@ static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
>   	sp->role = role;
>   	sp->gfn = gfn;
>   	sp->ptep = sptep;
> -	sp->tdp_mmu_page = true;
>   
>   	trace_kvm_mmu_get_page(sp, true);
>   }
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 0a63b1afabd3..18d3719f14ea 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -71,7 +71,10 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
>   					u64 *spte);
>   
>   #ifdef CONFIG_X86_64
> -static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; }
> +static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp)
> +{
> +	return !sp->shadow_mmu_page;
> +}
>   #else
>   static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
>   #endif


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code
  2022-12-12 22:54     ` Paolo Bonzini
@ 2022-12-12 23:26       ` Sean Christopherson
  -1 siblings, 0 replies; 317+ messages in thread
From: Sean Christopherson @ 2022-12-12 23:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Oliver Upton, David Matlack, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Huacai Chen,
	Aleksandar Markovic, Anup Patel, Atish Patra, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Mon, Dec 12, 2022, Paolo Bonzini wrote:
> On 12/9/22 20:07, Oliver Upton wrote:
> > >   - Naming. This series does not change the names of any existing code.
> > >     So all the KVM/x86 Shadow MMU-style terminology like
> > >     "shadow_page"/"sp"/"spte" persists. Should we keep that style in
> > >     common code or move toward something less shadow-paging-specific?
> > >     e.g. "page_table"/"pt"/"pte".
> > 
> > I would strongly be in favor of discarding the shadow paging residue if
> > x86 folks are willing to part ways with it 😄
> 
> Yes, absolutely.  Something like to_shadow_page->to_mmu_data, sp->md,
> spt->hpt, spte->spte, sptep->hptep.

"host" will be confusing if when the primary MMU is involved though, e.g. I always
think of the primary MMU's page tables as the "host page tables".

What if we keep the "S" for SPT(E) and repurpose it to mean Secondary PTE?

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code
  2022-12-12 23:26       ` Sean Christopherson
@ 2022-12-12 23:43         ` Paolo Bonzini
  -1 siblings, 0 replies; 317+ messages in thread
From: Paolo Bonzini @ 2022-12-12 23:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Anshuman Khandual, Hugh Dickins, Paul Walmsley, kvmarm,
	Nadav Amit, Colin Cross, Ben Gardon, linux-riscv, kvmarm,
	Yu Zhao, xu xin, Huacai Chen, Matthew Wilcox (Oracle),
	Aleksandar Markovic, Krish Sadhukhan, Palmer Dabbelt,
	Mingwei Zhang, Albert Ou, Arnd Bergmann, Liam R. Howlett, kvm,
	Atish Patra, David Matlack, Suren Baghdasaryan, Vlastimil Babka,
	linux-arm-kernel, linux-mips, kvm-riscv, Marc Zyngier,
	Andrew Morton

On 12/13/22 00:26, Sean Christopherson wrote:
>>> I would strongly be in favor of discarding the shadow paging residue if
>>> x86 folks are willing to part ways with it 😄
>> Yes, absolutely.  Something like to_shadow_page->to_mmu_data, sp->md,
>> spt->hpt, spte->spte, sptep->hptep.
> "host" will be confusing if when the primary MMU is involved though, e.g. I always
> think of the primary MMU's page tables as the "host page tables".
> 
> What if we keep the "S" for SPT(E) and repurpose it to mean Secondary PTE?

Makes sense, so just to_shadow_page->to_secmmu_page?

Paolo

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 20/37] KVM: x86/mmu: Abstract away computing the max mapping level
  2022-12-12 21:05       ` David Matlack
@ 2022-12-13  1:02         ` Sean Christopherson
  -1 siblings, 0 replies; 317+ messages in thread
From: Sean Christopherson @ 2022-12-13  1:02 UTC (permalink / raw)
  To: David Matlack
  Cc: Ben Gardon, Paolo Bonzini, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Oliver Upton, Huacai Chen,
	Aleksandar Markovic, Anup Patel, Atish Patra, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Mingwei Zhang, Krish Sadhukhan, Ricardo Koller, Jing Zhang,
	linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm, kvm-riscv,
	linux-riscv

On Mon, Dec 12, 2022, David Matlack wrote:
> On Mon, Dec 12, 2022 at 11:32 AM Ben Gardon <bgardon@google.com> wrote:
> >
> > On Thu, Dec 8, 2022 at 11:39 AM David Matlack <dmatlack@google.com> wrote:
> > >
> > > Abstract away kvm_mmu_max_mapping_level(), which is an x86-specific
> > > function for computing the max level that a given GFN can be mapped in
> > > KVM's page tables. This will be used in a future commit to enable moving
> > > the TDP MMU to common code.
> > >
> > > Provide a default implementation for non-x86 architectures that just
> > > returns the max level. This will result in more zapping than necessary
> > > when disabling dirty logging (i.e. less than optimal performance) but no
> > > correctness issues.
> >
> > Apologies if you already implemented it in a later patch in this
> > series, but would it not at least be possible to port
> > host_pfn_mapping_level to common code and check that?
> > I'm assuming, though I could be wrong, that all archs map GFNs with at
> > most a host page table granularity mapping.
> > I suppose that doesn't strictly need to be included in this series,
> > but it would be worth addressing in the commit description.
> 
> It's not implemented later in this series, but I agree it's something
> we should do. In fact, it's worth doing regardless of this series as a
> way to share more code across architectures (e.g. KVM/ARM has it's own
> version in arch/arm64/kvm/mmu.c:get_user_mapping_size()).

Ya, ARM converted to walking the host user page tables largely in response to
x86's conversion.  After x86 switched, ARM was left holding the bag that was
PageTransCompoundMap().

On a related topic, I'm guessing all the comments in transparent_hugepage_adjust()
about the code working _only_ for THP are stale.  Unless ARM support for HugeTLB
works differently, walking host user page tables should Just Work for all hugepage
types.
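
As a rough illustration of the default described in the quoted commit message,
assuming the common code calls an arch hook for the max mapping level (the hook
name and the use of KVM_MAX_HUGEPAGE_LEVEL are illustrative, not lifted from
the series):

	/*
	 * Sketch only: architectures without their own logic allow the largest
	 * supported level; x86 would override this with its existing
	 * kvm_mmu_max_mapping_level() logic, and walking the host user page
	 * tables as discussed above could later become the common default.
	 */
	int __weak kvm_arch_tdp_max_mapping_level(struct kvm *kvm,
						  const struct kvm_memory_slot *slot,
						  gfn_t gfn)
	{
		return KVM_MAX_HUGEPAGE_LEVEL;
	}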

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 02/37] KVM: MMU: Move struct kvm_mmu_page_role into common code
  2022-12-12 23:11     ` Paolo Bonzini
@ 2022-12-13  1:06       ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-13  1:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Mon, Dec 12, 2022 at 3:11 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 12/8/22 20:38, David Matlack wrote:
> > +/*
> > + * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
> > + * also includes TDP pages) to determine whether or not a page can be used in
> > + * the given MMU context.
> > + */
> > +union kvm_mmu_page_role {
> > +     u32 word;
> > +     struct {
> > +             struct {
> > +                     /* The address space ID mapped by the page. */
> > +                     u16 as_id:8;
> > +
> > +                     /* The level of the page in the page table hierarchy. */
> > +                     u16 level:4;
> > +
> > +                     /* Whether the page is invalid, i.e. pending destruction. */
> > +                     u16 invalid:1;
> > +             };
> > +
> > +             /* Architecture-specific properties. */
> > +             struct kvm_mmu_page_role_arch arch;
> > +     };
> > +};
> > +
>
> Have you considered adding a tdp_mmu:1 field to the arch-independent
> part?  I think that as long as _that_ field is the same, there's no need
> to have any overlap between TDP MMU and shadow MMU roles.
>
> I'm not even sure if the x86 TDP MMU needs _any_ other role bit.  It
> needs of course the above three, and it also needs "direct" but it is
> used exactly to mean "is this a TDP MMU page".  So we could have
>
> union {
>         struct {
>                 u32 tdp_mmu:1;
>                 u32 invalid:1;
>                 u32 :6;
>                 u32 level:8;
>                 u32 arch:8;
>                 u32 :8;
>         } tdp;
>         /* the first field must be "u32 tdp_mmu:1;" */
>         struct kvm_mmu_page_role_arch shadow;

We could, but then that prevents having common fields between the
Shadow MMU and TDP MMU. For example, make_spte() and
make_huge_page_split_spte() use sp->role.level regardless of TDP or
Shadow MMU, and is_obsolete_sp() uses sp->role.invalid. Plus then you
need the `arch:8` byte for SMM.

It's possible to make it work, but I don't see what the benefit would be.
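
To make the trade-off concrete, a rough sketch (helper name hypothetical, union
layout taken from the proposal quoted above): with a common sub-struct, shared
helpers read sp->role.level directly, whereas fully disjoint tdp/shadow layouts
would force every shared helper to go through an accessor that branches on the
MMU type:

	/* Sketch only: the kind of wrapper disjoint role layouts would require. */
	static inline int sp_level(struct kvm_mmu_page *sp)
	{
		return sp->role.tdp.tdp_mmu ? sp->role.tdp.level :
					      sp->role.shadow.level;
	}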

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  2022-12-12 18:17             ` Oliver Upton
@ 2022-12-13  1:11               ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-13  1:11 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Sean Christopherson, Yang, Weijiang, Paolo Bonzini, Marc Zyngier,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Huacai Chen,
	Aleksandar Markovic, Anup Patel, Atish Patra, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Andrew Morton, Anshuman Khandual,
	Amit, Nadav, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Mon, Dec 12, 2022 at 06:17:36PM +0000, Oliver Upton wrote:
> On Mon, Dec 12, 2022 at 05:39:38PM +0000, Sean Christopherson wrote:
> > On Fri, Dec 09, 2022, David Matlack wrote:
> > > On Fri, Dec 9, 2022 at 9:25 AM Oliver Upton <oliver.upton@linux.dev> wrote:
> > My preference would be to leave .smm in x86's page role.  IMO, defining multiple
> > address spaces to support SMM emulation was a mistake that should be contained to
> > SMM, i.e. should never be used for any other feature.  And with CONFIG_KVM_SMM,
> > even x86 can opt out.
> 
> +1
> 
> I don't think something is architecture-neutral by virtue of it existing
> in virt/kvm/*.

Put another way, just because something exists in virt/kvm/* doesn't
mean it is used by (or will be useful to) more than one architecture.
Totally agree. In this case, no other use case for memslot address
spaces ever materialized.

As for role.arch.smm vs role.as_id, I'll post my response on the other
thread with Paolo. Juggling these threads is hard.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  2022-12-12 22:50             ` Paolo Bonzini
@ 2022-12-13  1:18               ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2022-12-13  1:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Oliver Upton, Yang, Weijiang, Marc Zyngier,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Huacai Chen,
	Aleksandar Markovic, Anup Patel, Atish Patra, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Andrew Morton, Anshuman Khandual,
	Amit, Nadav, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Mon, Dec 12, 2022 at 11:50:29PM +0100, Paolo Bonzini wrote:
> On 12/12/22 18:39, Sean Christopherson wrote:
> > > The notion of address spaces is already existing architecture-neutral
> > > concept in KVM (e.g. see uses of KVM_ADDRESS_SPACE_NUM in
> > > virt/kvm/kvm_main.c), although SMM is the only use-case I'm aware of.
> > 
> > Yes, SMM is currently the only use-case.
> 
> It's possible that in the future Hyper-V VTLs will also have per-level
> protections.  It wouldn't use as_id, but it would likely be recorded in the
> upper byte of the role.
> 
> I'm not sure if Microsoft intends to port those to ARM as well.
> 
> > My preference would be to leave .smm in x86's page role
> 
> What about defining a byte of arch_role and a macro to build it?

Both would work. I went with as_id in the common role since that's how
it's encoded in kvm_memory_slot and because, no matter what, the TDP
MMU still has to handle multiple address spaces. i.e. even if we hide
SMM away in the role, the TDP MMU still has to access it through some
wrapper, e.g. kvm_mmu_page_as_id() (which would just return 0 outside
of x86). From that perspective, just having as_id directly in the
common role seemed like the cleanest option.
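
For illustration, a minimal sketch of such a wrapper in the alternative
where SMM stays hidden in the arch role (this is my assumption of the
shape, not code from the series; it relies on a role.arch.smm bit):

  #ifdef CONFIG_X86
  static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
  {
          /* x86: SMM gets its own address space */
          return sp->role.arch.smm;
  }
  #else
  static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
  {
          /* every other architecture has a single address space */
          return 0;
  }
  #endif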

The only way to truly shield the TDP MMU from SMM would be to disallow
the combination, e.g. disable the TDP MMU if defined(CONFIG_KVM_SMM),
or something similar. But I don't know enough about how KVM SMM support
is used to say whether that's even worth entertaining.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  2022-12-12 22:50             ` Paolo Bonzini
  (?)
  (?)
@ 2022-12-13  1:42               ` Sean Christopherson
  -1 siblings, 0 replies; 317+ messages in thread
From: Sean Christopherson @ 2022-12-13  1:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: David Matlack, Oliver Upton, Yang, Weijiang, Marc Zyngier,
	James Morse, Alexandru Elisei, Suzuki K Poulose, Huacai Chen,
	Aleksandar Markovic, Anup Patel, Atish Patra, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Andrew Morton, Anshuman Khandual,
	Amit, Nadav, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Mon, Dec 12, 2022, Paolo Bonzini wrote:
> On 12/12/22 18:39, Sean Christopherson wrote:
> > My preference would be to leave .smm in x86's page role
> 
> What about defining a byte of arch_role and a macro to build it?

I think David already carved out a big chunk for arch role bits; my objection
was purely to making as_id a generic 8-bit role.
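
Purely as an illustration of that split (field names and widths are my
assumptions, not the actual layout from the series; kernel u32 type
assumed), a common role could look something like:

  union kvm_mmu_page_role_common {
          u32 word;
          struct {
                  u32 level:4;      /* common: page table level */
                  u32 invalid:1;    /* common: obsolete/invalid page */
                  u32 unused:19;    /* room for future common bits */
                  u32 arch:8;       /* arch-defined byte, e.g. x86 packs smm here */
          };
  };

  /* hypothetical helper to build the arch byte */
  #define KVM_MMU_ARCH_ROLE(x)    ((u32)(x) & 0xff)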

> > Unless I'm missing something, e.g. a need to map GPAs differently for
> > SMM vs. non-SMM, SMM could have been implemented with a simple flag
> > in a memslot to mark the memslot as SMM-only.
> 
> Unfortunately not, because there can be another region (for example video
> RAM at 0A0000h) underneath SMRAM.

Ugh, it's even a very explicitly documented "feature".

  When compatible SMM space is enabled, SMM-mode CBO accesses to this range route
  to physical system DRAM at 00_000A_0h – 00_000B_FFFFh.
  
  Non-SMM mode CBO accesses to this range are considered to be to the Video Buffer
  Area as described above. PCI Express* and DMI originated cycles to SMM space are not
  supported and are considered to be to the Video Buffer Area.

I also forgot KVM supports SMBASE relocation :-(

> In fact, KVM_MEM_X86_SMRAM was the first idea.  It was uglier than multiple
> address spaces (https://lore.kernel.org/lkml/1431084034-8425-1-git-send-email-pbonzini@redhat.com).
> In retrospect there were probably ways to mix the best of the two designs,
> but it wasn't generic enough.
>
> > And separate address spaces become truly nasty if the CPU can access multiple
> > protected regions, e.g. if the CPU can access type X and type Y at the same time,
> > then there would need to be memslots for "regular", X, Y, and X+Y.
> 
> Without a usecase that's hard to say.  It's just as possible that there
> would be a natural hierarchy of levels.

Ah, true.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  2022-12-12 17:39           ` Sean Christopherson
  (?)
  (?)
@ 2022-12-14  9:50             ` Lai Jiangshan
  -1 siblings, 0 replies; 317+ messages in thread
From: Lai Jiangshan @ 2022-12-14  9:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: David Matlack, Oliver Upton, Yang, Weijiang, Paolo Bonzini,
	Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Huacai Chen, Aleksandar Markovic, Anup Patel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrew Morton,
	Anshuman Khandual, Amit, Nadav, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Tue, Dec 13, 2022 at 1:47 AM Sean Christopherson <seanjc@google.com> wrote:

>
> My preference would be to leave .smm in x86's page role.  IMO, defining multiple
> address spaces to support SMM emulation was a mistake that should be contained to
> SMM, i.e. should never be used for any other feature.  And with CONFIG_KVM_SMM,
> even x86 can opt out.
>


I think the name ASID in kvm/x86 should be used for vmcb's ASID,
vmcs's VPID, and PCID. Using the name ASID for other purposes
would only result in unnecessary confusion.

There is a bug in shadow paging when it uses two separate sets of
memslots, which in turn use two separate sets of rmaps and page
tracking.

When the SMM world writes to a non-SMM page that happens to be a guest
pagetable in the non-SMM world, the write goes through without being
specially handled, and the shadow page for that guest pagetable is
neither unshadowed nor marked unsync.  The shadow paging code is
unaware that the shadow page has deviated from the guest pagetable.

This means that when SMM is enabled, shadow paging should be disabled,
which also means KVM has to use TDP and cannot use nested TDP.
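
As an illustration only (a hand-written reduction, not actual KVM code):
the write-tracking lookup is keyed by the memslot of the world doing the
write, so tracking state attached to the other address space's slot is
never consulted.

  /* slot_write_tracked() is a hypothetical helper standing in for the
   * real page-track lookup; kvm_vcpu_gfn_to_memslot() resolves the gfn
   * in the vCPU's *current* address space (SMM vs. non-SMM).
   */
  static bool write_is_tracked(struct kvm_vcpu *vcpu, gfn_t gfn)
  {
          struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);

          return slot && slot_write_tracked(slot, gfn);
  }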

Thanks
Lai

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  2022-12-14  9:50             ` Lai Jiangshan
  (?)
  (?)
@ 2022-12-14 19:42               ` Sean Christopherson
  -1 siblings, 0 replies; 317+ messages in thread
From: Sean Christopherson @ 2022-12-14 19:42 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: David Matlack, Oliver Upton, Yang, Weijiang, Paolo Bonzini,
	Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Huacai Chen, Aleksandar Markovic, Anup Patel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrew Morton,
	Anshuman Khandual, Amit, Nadav, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Wed, Dec 14, 2022, Lai Jiangshan wrote:
> On Tue, Dec 13, 2022 at 1:47 AM Sean Christopherson <seanjc@google.com> wrote:
> 
> >
> > My preference would be to leave .smm in x86's page role.  IMO, defining multiple
> > address spaces to support SMM emulation was a mistake that should be contained to
> > SMM, i.e. should never be used for any other feature.  And with CONFIG_KVM_SMM,
> > even x86 can opt out.
> >
> 
> 
> I think the name ASID in kvm/x86 should be used for vmcb's ASID,
> vmcs's VPID, and PCID. Using the name ASID for other purposes
> would only result in unnecessary confusion.

I agree in principle, but at this point fixing the problem would require a lot of
churn since "as_id" is pervasive throughout the memslots code.

It should be possible though, as I don't think anything in KVM's uAPI explicitly
takes an as_id, i.e. it's KVM-internal terminology for the most part.

> There is a bug for shadow paging when it uses two separate sets
> of memslots which are using two sets of rmap and page-tracking.
> 
> When SMM world is writing to a non-SMM page which happens to be
> a guest pagetable in the non-SMM world, the write operation will
> go smoothly without specially handled and the shadow page for the guest
> pagetable is neither unshadowed nor marked unsync.  The shadow paging
> code is unaware that the shadow page has deviated from the guest
> pagetable.

Won't the unsync code work as intended?  for_each_gfn_valid_sp_with_gptes()
doesn't consume the slot, and I don't see any explicit filtering on role.smm,
i.e. it should unsync all SPTEs for the gfn.

Addressing the page-track case is a bit gross, but doesn't appear to be too
difficult.  I wouldn't be surprised if there are other SMM => non-SMM bugs lurking
though.

---
 arch/x86/include/asm/kvm_page_track.h |  2 +-
 arch/x86/kvm/mmu/mmu.c                |  7 +++---
 arch/x86/kvm/mmu/mmu_internal.h       |  3 ++-
 arch/x86/kvm/mmu/page_track.c         | 32 +++++++++++++++++++--------
 arch/x86/kvm/mmu/spte.c               |  2 +-
 5 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a..fdd9de31e160 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -63,7 +63,7 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
 				     struct kvm_memory_slot *slot, gfn_t gfn,
 				     enum kvm_page_track_mode mode);
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
+bool kvm_slot_page_track_is_active(struct kvm_vcpu *vcpu,
 				   const struct kvm_memory_slot *slot,
 				   gfn_t gfn, enum kvm_page_track_mode mode);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 254bc46234e0..1c2200042133 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2715,9 +2715,10 @@ static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
  * were marked unsync (or if there is no shadow page), -EPERM if the SPTE must
  * be write-protected.
  */
-int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot,
 			    gfn_t gfn, bool can_unsync, bool prefetch)
 {
+	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_page *sp;
 	bool locked = false;
 
@@ -2726,7 +2727,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 * track machinery is used to write-protect upper-level shadow pages,
 	 * i.e. this guards the role.level == 4K assertion below!
 	 */
-	if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(vcpu, slot, gfn, KVM_PAGE_TRACK_WRITE))
 		return -EPERM;
 
 	/*
@@ -4127,7 +4128,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	 * guest is writing the page which is write tracked which can
 	 * not be fixed by page fault handler.
 	 */
-	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(vcpu, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
 		return true;
 
 	return false;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ac00bfbf32f6..38040ab27986 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -156,7 +156,8 @@ static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 	return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
 }
 
-int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu,
+			    const struct kvm_memory_slot *slot,
 			    gfn_t gfn, bool can_unsync, bool prefetch);
 
 void kvm_mmu_gfn_disallow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2e09d1b6249f..0e9bc837257e 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -18,6 +18,7 @@
 
 #include "mmu.h"
 #include "mmu_internal.h"
+#include "smm.h"
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 {
@@ -171,27 +172,40 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
 }
 EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
 
+static bool __kvm_slot_page_track_is_active(const struct kvm_memory_slot *slot,
+					    gfn_t gfn, enum kvm_page_track_mode mode)
+{
+	int index;
+
+	if (!slot)
+		return false;
+
+	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
+	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
+}
+
 /*
  * check if the corresponding access on the specified guest page is tracked.
  */
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
+bool kvm_slot_page_track_is_active(struct kvm_vcpu *vcpu,
 				   const struct kvm_memory_slot *slot,
 				   gfn_t gfn, enum kvm_page_track_mode mode)
 {
-	int index;
-
 	if (WARN_ON(!page_track_mode_is_valid(mode)))
 		return false;
 
-	if (!slot)
-		return false;
-
 	if (mode == KVM_PAGE_TRACK_WRITE &&
-	    !kvm_page_track_write_tracking_enabled(kvm))
+	    !kvm_page_track_write_tracking_enabled(vcpu->kvm))
 		return false;
 
-	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
-	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
+	if (__kvm_slot_page_track_is_active(slot, gfn, mode))
+		return true;
+
+	if (!is_smm(vcpu))
+		return false;
+
+	return __kvm_slot_page_track_is_active(gfn_to_memslot(vcpu->kvm, gfn),
+					       gfn, mode);
 }
 
 void kvm_page_track_cleanup(struct kvm *kvm)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index c0fd7e049b4e..89ddd113c1b9 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -220,7 +220,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		 * e.g. it's write-tracked (upper-level SPs) or has one or more
 		 * shadow pages and unsync'ing pages is not allowed.
 		 */
-		if (mmu_try_to_unsync_pages(vcpu->kvm, slot, gfn, can_unsync, prefetch)) {
+		if (mmu_try_to_unsync_pages(vcpu, slot, gfn, can_unsync, prefetch)) {
 			pgprintk("%s: found shadow page for %llx, marking ro\n",
 				 __func__, gfn);
 			wrprot = true;

base-commit: e0ef1f14e97ff65adf6e2157952dbbd1e482065c
-- 


^ permalink raw reply related	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
@ 2022-12-14 19:42               ` Sean Christopherson
  0 siblings, 0 replies; 317+ messages in thread
From: Sean Christopherson @ 2022-12-14 19:42 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: David Matlack, Oliver Upton, Yang, Weijiang, Paolo Bonzini,
	Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Huacai Chen, Aleksandar Markovic, Anup Patel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrew Morton,
	Anshuman Khandual, Amit, Nadav, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Wed, Dec 14, 2022, Lai Jiangshan wrote:
> On Tue, Dec 13, 2022 at 1:47 AM Sean Christopherson <seanjc@google.com> wrote:
> 
> >
> > My preference would be to leave .smm in x86's page role.  IMO, defining multiple
> > address spaces to support SMM emulation was a mistake that should be contained to
> > SMM, i.e. should never be used for any other feature.  And with CONFIG_KVM_SMM,
> > even x86 can opt out.
> >
> 
> 
> I think the name ASID in kvm/x86 should be used for vmcb's ASID,
> vmcs's VPID, and PCID. Using the name ASID for other purposes
> would only result in unnecessary confusion.

I agree in principle, but at this point fixing the problem would require a lot of
churn since "as_id" is pervasive throughout the memslots code.

It should be possible though, as I don't think anything in KVM's uAPI explicitly
takes an as_id, i.e. it's KVM-internal terminology for the most part.

> There is a bug for shadow paging when it uses two separate sets
> of memslots which are using two sets of rmap and page-tracking.
> 
> When SMM world is writing to a non-SMM page which happens to be
> a guest pagetable in the non-SMM world, the write operation will
> go smoothly without specially handled and the shadow page for the guest
> pagetable is neither unshadowed nor marked unsync.  The shadow paging
> code is unaware that the shadow page has deviated from the guest
> pagetable.

Won't the unsync code work as intended?  for_each_gfn_valid_sp_with_gptes()
doesn't consume the slot and I don't see any explicit filtering on role.smm,
i.e. should unsync all SPTEs for the gfn.

Addressing the page-track case is a bit gross, but doesn't appear to be too
difficult.  I wouldn't be surprised if there are other SMM => non-SMM bugs lurking
though.

---
 arch/x86/include/asm/kvm_page_track.h |  2 +-
 arch/x86/kvm/mmu/mmu.c                |  7 +++---
 arch/x86/kvm/mmu/mmu_internal.h       |  3 ++-
 arch/x86/kvm/mmu/page_track.c         | 32 +++++++++++++++++++--------
 arch/x86/kvm/mmu/spte.c               |  2 +-
 5 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a..fdd9de31e160 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -63,7 +63,7 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
 				     struct kvm_memory_slot *slot, gfn_t gfn,
 				     enum kvm_page_track_mode mode);
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
+bool kvm_slot_page_track_is_active(struct kvm_vcpu *vcpu,
 				   const struct kvm_memory_slot *slot,
 				   gfn_t gfn, enum kvm_page_track_mode mode);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 254bc46234e0..1c2200042133 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2715,9 +2715,10 @@ static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
  * were marked unsync (or if there is no shadow page), -EPERM if the SPTE must
  * be write-protected.
  */
-int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot,
 			    gfn_t gfn, bool can_unsync, bool prefetch)
 {
+	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_page *sp;
 	bool locked = false;
 
@@ -2726,7 +2727,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 * track machinery is used to write-protect upper-level shadow pages,
 	 * i.e. this guards the role.level == 4K assertion below!
 	 */
-	if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(vcpu, slot, gfn, KVM_PAGE_TRACK_WRITE))
 		return -EPERM;
 
 	/*
@@ -4127,7 +4128,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	 * guest is writing the page which is write tracked which can
 	 * not be fixed by page fault handler.
 	 */
-	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(vcpu, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
 		return true;
 
 	return false;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ac00bfbf32f6..38040ab27986 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -156,7 +156,8 @@ static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 	return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
 }
 
-int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu,
+			    const struct kvm_memory_slot *slot,
 			    gfn_t gfn, bool can_unsync, bool prefetch);
 
 void kvm_mmu_gfn_disallow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2e09d1b6249f..0e9bc837257e 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -18,6 +18,7 @@
 
 #include "mmu.h"
 #include "mmu_internal.h"
+#include "smm.h"
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 {
@@ -171,27 +172,40 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
 }
 EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
 
+static bool __kvm_slot_page_track_is_active(const struct kvm_memory_slot *slot,
+					    gfn_t gfn, enum kvm_page_track_mode mode)
+{
+	int index;
+
+	if (!slot)
+		return false;
+
+	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
+	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
+}
+
 /*
  * check if the corresponding access on the specified guest page is tracked.
  */
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
+bool kvm_slot_page_track_is_active(struct kvm_vcpu *vcpu,
 				   const struct kvm_memory_slot *slot,
 				   gfn_t gfn, enum kvm_page_track_mode mode)
 {
-	int index;
-
 	if (WARN_ON(!page_track_mode_is_valid(mode)))
 		return false;
 
-	if (!slot)
-		return false;
-
 	if (mode == KVM_PAGE_TRACK_WRITE &&
-	    !kvm_page_track_write_tracking_enabled(kvm))
+	    !kvm_page_track_write_tracking_enabled(vcpu->kvm))
 		return false;
 
-	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
-	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
+	if (__kvm_slot_page_track_is_active(slot, gfn, mode))
+		return true;
+
+	if (!is_smm(vcpu))
+		return false;
+
+	return __kvm_slot_page_track_is_active(gfn_to_memslot(vcpu->kvm, gfn),
+					       gfn, mode);
 }
 
 void kvm_page_track_cleanup(struct kvm *kvm)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index c0fd7e049b4e..89ddd113c1b9 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -220,7 +220,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		 * e.g. it's write-tracked (upper-level SPs) or has one or more
 		 * shadow pages and unsync'ing pages is not allowed.
 		 */
-		if (mmu_try_to_unsync_pages(vcpu->kvm, slot, gfn, can_unsync, prefetch)) {
+		if (mmu_try_to_unsync_pages(vcpu, slot, gfn, can_unsync, prefetch)) {
 			pgprintk("%s: found shadow page for %llx, marking ro\n",
 				 __func__, gfn);
 			wrprot = true;

base-commit: e0ef1f14e97ff65adf6e2157952dbbd1e482065c
-- 



^ permalink raw reply related	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role
  2022-12-14 19:42               ` Sean Christopherson
@ 2022-12-15  7:20                 ` Lai Jiangshan
  0 siblings, 0 replies; 317+ messages in thread
From: Lai Jiangshan @ 2022-12-15  7:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: David Matlack, Oliver Upton, Yang, Weijiang, Paolo Bonzini,
	Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Huacai Chen, Aleksandar Markovic, Anup Patel, Atish Patra,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrew Morton,
	Anshuman Khandual, Amit, Nadav, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Thu, Dec 15, 2022 at 3:42 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Dec 14, 2022, Lai Jiangshan wrote:
> > On Tue, Dec 13, 2022 at 1:47 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > >
> > > My preference would be to leave .smm in x86's page role.  IMO, defining multiple
> > > address spaces to support SMM emulation was a mistake that should be contained to
> > > SMM, i.e. should never be used for any other feature.  And with CONFIG_KVM_SMM,
> > > even x86 can opt out.
> > >
> >
> >
> > I think the name ASID in kvm/x86 should be used for vmcb's ASID,
> > vmcs's VPID, and PCID. Using the name ASID for other purposes
> > would only result in unnecessary confusion.
>
> I agree in principle, but at this point fixing the problem would require a lot of
> churn since "as_id" is pervasive throughout the memslots code.
>
> It should be possible though, as I don't think anything in KVM's uAPI explicitly
> takes an as_id, i.e. it's KVM-internal terminology for the most part.
>
> > There is a bug in shadow paging when it uses two separate sets of
> > memslots, which in turn use two separate sets of rmaps and page
> > tracking.
> >
> > When the SMM world writes to a non-SMM page that happens to be a guest
> > pagetable in the non-SMM world, the write goes through without any
> > special handling, and the shadow page for that guest pagetable is
> > neither unshadowed nor marked unsync.  The shadow paging code is
> > unaware that the shadow page has deviated from the guest pagetable.
>
> Won't the unsync code work as intended?  for_each_gfn_valid_sp_with_gptes()
> doesn't consume the slot and I don't see any explicit filtering on role.smm,
> i.e. should unsync all SPTEs for the gfn.
>
> Addressing the page-track case is a bit gross, but doesn't appear to be too
> difficult.  I wouldn't be surprised if there are other SMM => non-SMM bugs lurking
> though.
>
> ---
>  arch/x86/include/asm/kvm_page_track.h |  2 +-
>  arch/x86/kvm/mmu/mmu.c                |  7 +++---
>  arch/x86/kvm/mmu/mmu_internal.h       |  3 ++-
>  arch/x86/kvm/mmu/page_track.c         | 32 +++++++++++++++++++--------
>  arch/x86/kvm/mmu/spte.c               |  2 +-
>  5 files changed, 31 insertions(+), 15 deletions(-)

Could you send the patch in a new thread, please?

I will add my reviewed-by then.

It still lacks the parts that do write protection for sp->gfn.
kvm_vcpu_write_protect_gfn() has to handle the two worlds.
account_shadowed() and kvm_slot_page_track_add_page() also have to
handle the two worlds.

And I don't think there is any page table in the SMM world, so
kvm_slot_page_track_is_active() can just skip the SMM world and check
the non-SMM world only.

Thanks
Lai
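
For illustration, a minimal sketch (not from the posted patch; the
helper name and placement are made up) of what covering both address
spaces during write protection could look like:

/*
 * Illustrative sketch only, not part of the posted patch: write-protect
 * a gfn in every address space so that an SMM mapping of a non-SMM
 * guest pagetable is covered as well.
 */
static bool write_protect_gfn_all_spaces(struct kvm *kvm, gfn_t gfn)
{
        struct kvm_memory_slot *slot;
        bool write_protected = false;
        int i;

        for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
                slot = __gfn_to_memslot(__kvm_memslots(kvm, i), gfn);
                if (slot)
                        write_protected |= kvm_mmu_slot_gfn_write_protect(
                                                kvm, slot, gfn, PG_LEVEL_4K);
        }

        return write_protected;
}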

>
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index eb186bc57f6a..fdd9de31e160 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -63,7 +63,7 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
>  void kvm_slot_page_track_remove_page(struct kvm *kvm,
>                                      struct kvm_memory_slot *slot, gfn_t gfn,
>                                      enum kvm_page_track_mode mode);
> -bool kvm_slot_page_track_is_active(struct kvm *kvm,
> +bool kvm_slot_page_track_is_active(struct kvm_vcpu *vcpu,
>                                    const struct kvm_memory_slot *slot,
>                                    gfn_t gfn, enum kvm_page_track_mode mode);
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 254bc46234e0..1c2200042133 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2715,9 +2715,10 @@ static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
>   * were marked unsync (or if there is no shadow page), -EPERM if the SPTE must
>   * be write-protected.
>   */
> -int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
> +int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, const struct kvm_memory_slot *slot,
>                             gfn_t gfn, bool can_unsync, bool prefetch)
>  {
> +       struct kvm *kvm = vcpu->kvm;
>         struct kvm_mmu_page *sp;
>         bool locked = false;
>
> @@ -2726,7 +2727,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>          * track machinery is used to write-protect upper-level shadow pages,
>          * i.e. this guards the role.level == 4K assertion below!
>          */
> -       if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
> +       if (kvm_slot_page_track_is_active(vcpu, slot, gfn, KVM_PAGE_TRACK_WRITE))
>                 return -EPERM;
>
>         /*
> @@ -4127,7 +4128,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
>          * guest is writing the page which is write tracked which can
>          * not be fixed by page fault handler.
>          */
> -       if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
> +       if (kvm_slot_page_track_is_active(vcpu, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
>                 return true;
>
>         return false;
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index ac00bfbf32f6..38040ab27986 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -156,7 +156,8 @@ static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
>         return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
>  }
>
> -int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
> +int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu,
> +                           const struct kvm_memory_slot *slot,
>                             gfn_t gfn, bool can_unsync, bool prefetch);
>
>  void kvm_mmu_gfn_disallow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index 2e09d1b6249f..0e9bc837257e 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -18,6 +18,7 @@
>
>  #include "mmu.h"
>  #include "mmu_internal.h"
> +#include "smm.h"
>
>  bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
>  {
> @@ -171,27 +172,40 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
>  }
>  EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
>
> +static bool __kvm_slot_page_track_is_active(const struct kvm_memory_slot *slot,
> +                                           gfn_t gfn, enum kvm_page_track_mode mode)
> +{
> +       int index;
> +
> +       if (!slot)
> +               return false;
> +
> +       index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
> +       return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
> +}
> +
>  /*
>   * check if the corresponding access on the specified guest page is tracked.
>   */
> -bool kvm_slot_page_track_is_active(struct kvm *kvm,
> +bool kvm_slot_page_track_is_active(struct kvm_vcpu *vcpu,
>                                    const struct kvm_memory_slot *slot,
>                                    gfn_t gfn, enum kvm_page_track_mode mode)
>  {
> -       int index;
> -
>         if (WARN_ON(!page_track_mode_is_valid(mode)))
>                 return false;
>
> -       if (!slot)
> -               return false;
> -
>         if (mode == KVM_PAGE_TRACK_WRITE &&
> -           !kvm_page_track_write_tracking_enabled(kvm))
> +           !kvm_page_track_write_tracking_enabled(vcpu->kvm))
>                 return false;
>
> -       index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
> -       return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
> +       if (__kvm_slot_page_track_is_active(slot, gfn, mode))
> +               return true;
> +
> +       if (!is_smm(vcpu))
> +               return false;
> +
> +       return __kvm_slot_page_track_is_active(gfn_to_memslot(vcpu->kvm, gfn),
> +                                              gfn, mode);
>  }
>
>  void kvm_page_track_cleanup(struct kvm *kvm)
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index c0fd7e049b4e..89ddd113c1b9 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -220,7 +220,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>                  * e.g. it's write-tracked (upper-level SPs) or has one or more
>                  * shadow pages and unsync'ing pages is not allowed.
>                  */
> -               if (mmu_try_to_unsync_pages(vcpu->kvm, slot, gfn, can_unsync, prefetch)) {
> +               if (mmu_try_to_unsync_pages(vcpu, slot, gfn, can_unsync, prefetch)) {
>                         pgprintk("%s: found shadow page for %llx, marking ro\n",
>                                  __func__, gfn);
>                         wrprot = true;
>
> base-commit: e0ef1f14e97ff65adf6e2157952dbbd1e482065c
> --
>

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 10/37] KVM: MMU: Move struct kvm_page_fault to common code
  2022-12-12 22:27     ` Paolo Bonzini
@ 2023-01-09 18:55       ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2023-01-09 18:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Mon, Dec 12, 2022 at 2:27 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 12/8/22 20:38, David Matlack wrote:
> > +
> > +     /* Derived from mmu and global state.  */
> > +     const bool is_tdp;
>
> I think this could stay in the architecture-independent part.

I agree, but until there's a use case for accessing it in common code,
I'm inclined to leave it in x86's kvm_page_fault_arch.
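
For context, a rough sketch of the split being discussed; the field
selection and layout here are assumptions for illustration, not the
exact code from the series:

/* Rough sketch only; trimmed and assumed, not the series' exact layout. */
struct kvm_page_fault_arch {
        /* Derived from mmu and global state. */
        const bool is_tdp;
};

struct kvm_page_fault {
        const gpa_t addr;
        const u32 error_code;
        const bool write;
        const bool exec;

        /* Arch-specific state, where is_tdp lives for now. */
        struct kvm_page_fault_arch arch;
};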

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 04/37] KVM: x86/mmu: Invert sp->tdp_mmu_page to sp->shadow_mmu_page
  2022-12-12 23:15     ` Paolo Bonzini
@ 2023-01-11 22:45       ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2023-01-11 22:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv

On Mon, Dec 12, 2022 at 3:15 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 12/8/22 20:38, David Matlack wrote:
> > Invert the meaning of sp->tdp_mmu_page and rename it accordingly. This
> > allows the TDP MMU code to not care about this field, which will be used
> > in a subsequent commit to move the TDP MMU to common code.
> >
> > No functional change intended.
>
> Let's use a bit of the role instead.

Will do in v2, thanks.
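
For illustration, a reduced sketch of what carrying the flag in the
role could look like; the real union kvm_mmu_page_role has many more
fields, and the bit name here is an assumption, not the final patch:

union kvm_mmu_page_role_sketch {
        u32 word;
        struct {
                unsigned level:4;
                unsigned invalid:1;
                unsigned tdp_mmu:1;     /* would replace sp->tdp_mmu_page */
        };
};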

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code
  2022-12-08 19:38 ` David Matlack
@ 2023-01-19 17:14   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2023-01-19 17:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv, Alexandre Ghiti, Raghavendra Rao Ananta

On Thu, Dec 08, 2022 at 11:38:20AM -0800, David Matlack wrote:
> 
> Hello,
> 
> This series refactors the KVM/x86 "TDP MMU" into common code. This is
> the first step toward sharing TDP (aka Stage-2) page table management
> code across architectures that support KVM.

Thank you everyone for the feedback on this RFC. I have a couple of
updates to share and a question at the end.

First, Alexandre Ghiti from Rivos is going to work on the RISC-V port.
I'd like to target RISC-V first, since it has significantly lower risk
and complexity than e.g. ARM (which has pKVM, stage-1 walkers, and
[soon] nested virtualization to deal with).

Before I send a v2, I am working on several related patches. These are
patches that should have enough justification to be merged regardless
of the fate of the Common MMU. Sending them out separately should make
them easier to review, let me make incremental progress, and ultimately
simplify the v2 of this series.

What I've sent so far:

 - https://lore.kernel.org/kvm/20230117222707.3949974-1-dmatlack@google.com/
 - https://lore.kernel.org/kvm/20230118175300.790835-1-dmatlack@google.com/

What's coming soon:

 - A series to add a common API for range-based TLB flushing (patches
   29-33 from this series, plus another cleanup). This cleanup stands on
   its own, and Raghavendra from Google also needs it for his ARM series
   adding range-based TLBI support [1]. A minimal sketch of one possible
   API shape follows this list.

 - A patch to move sp->tdp_mmu_page into sp->role.tdp_mmu. This was
   suggested by Paolo as an alternative to patch 4, and saves a byte
   from struct kvm_mmu_page.
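
A minimal sketch of one possible shape for the common range-based flush
API mentioned above; the names and signatures are assumptions, not the
posted series:

/* Sketch only: assumed names/signatures for a common range-flush API. */
void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 nr_pages);

static inline void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
                                                 const struct kvm_memory_slot *slot)
{
        kvm_flush_remote_tlbs_range(kvm, slot->base_gfn, slot->npages);
}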

There will probably be more related cleanups I will send but this is
everything I'm tracking so far. If anyone wants to see a complete v2
sooner, let me know.

Paolo and Sean, what are your thoughts on merging the Common MMU
refactor without RISC-V support? i.e. should Alexandre and I work on
developing a functional prototype first, or are you open to merging the
refactor and then building RISC-V support on top of it? My preference
is the latter: it gives a more stable base on which to build the RISC-V
support, lets us make incremental progress, and keeps everyone upstream
more involved in the development.

Thanks.

[1] https://lore.kernel.org/kvm/20230109215347.3119271-4-rananta@google.com/

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code
@ 2023-01-19 17:14   ` David Matlack
  0 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2023-01-19 17:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv, Alexandre Ghiti, Raghavendra Rao Ananta

On Thu, Dec 08, 2022 at 11:38:20AM -0800, David Matlack wrote:
> 
> Hello,
> 
> This series refactors the KVM/x86 "TDP MMU" into common code. This is
> the first step toward sharing TDP (aka Stage-2) page table management
> code across architectures that support KVM.

Thank you everyone for the feedback on this RFC. I have a couple of
updates to share and a question at the end.

First, Alexandre Ghiti from Rivos is going to work on the RISC-V port.
I'd like to target RISC-V first, since it has significantly lower risk
and complexity than e.g. ARM (which has pKVM, stage-1 walkers, and
[soon] nested virtualization to deal with).

Before I send a v2 I am working on sending several related patches.
These are patches that should have enough justification to be merged
regardless of the fate of the Common MMU. By sending them out
separately, I figure they will be easier to review, allow me to make
incremental progress, and ultimately simplify the v2 of this series.

What I've sent so far:

 - https://lore.kernel.org/kvm/20230117222707.3949974-1-dmatlack@google.com/
 - https://lore.kernel.org/kvm/20230118175300.790835-1-dmatlack@google.com/

What's coming soon:

 - A series to add a common API for range-based TLB flushing (patches
   29-33 from this series, plus another cleanup). This cleanup stands on
   its own, plus Raghavendra from Google has need of this for his ARM
   series to add range-based TLBI support [1]

 - A patch to move sp->tdp_mmu_page into sp->role.tdp_mmu. This was
   suggested by Paolo as an alternative to patch 4, and saves a byte
   from struct kvm_mmu_page.

There will probably be more related cleanups I will send but this is
everything I'm tracking so far. If anyone wants to see a complete v2
sooner, let me know.

Paolo and Sean, what are your thoughts on merging the Common MMU
refactor without RISC-V support? e.g. Should Alexandre and I work on
developing a functional prototype first, or are you open to merging the
refactor and then building RISC-V support on top of that? My preference
is the latter so that there is a more stable base on which to build the
RISC-V support, we can make incremental progress, and keep everyone
upstream more involved in the development.

Thanks.

[2] https://lore.kernel.org/kvm/20230109215347.3119271-4-rananta@google.com/

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code
  2023-01-19 17:14   ` David Matlack
  (?)
@ 2023-01-19 17:23     ` Paolo Bonzini
  -1 siblings, 0 replies; 317+ messages in thread
From: Paolo Bonzini @ 2023-01-19 17:23 UTC (permalink / raw)
  To: David Matlack
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv, Alexandre Ghiti, Raghavendra Rao Ananta

On Thu, Jan 19, 2023 at 6:14 PM David Matlack <dmatlack@google.com> wrote:
> Paolo and Sean, what are your thoughts on merging the Common MMU
> refactor without RISC-V support?

I have no objection. We know what the long-term plan is, and it's not
so long-term anyway.

Paolo


^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code
  2023-01-19 17:14   ` David Matlack
  (?)
@ 2023-01-19 17:24     ` Marc Zyngier
  -1 siblings, 0 replies; 317+ messages in thread
From: Marc Zyngier @ 2023-01-19 17:24 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv, Alexandre Ghiti, Raghavendra Rao Ananta

On Thu, 19 Jan 2023 17:14:34 +0000,
David Matlack <dmatlack@google.com> wrote:
> 
> On Thu, Dec 08, 2022 at 11:38:20AM -0800, David Matlack wrote:
> > 
> > Hello,
> > 
> > This series refactors the KVM/x86 "TDP MMU" into common code. This is
> > the first step toward sharing TDP (aka Stage-2) page table management
> > code across architectures that support KVM.
> 
> Thank you everyone for the feedback on this RFC. I have a couple of
> updates to share and a question at the end.
> 
> First, Alexandre Ghiti from Rivos is going to work on the RISC-V port.
> I'd like to target RISC-V first, since it has significantly lower risk
> and complexity than e.g. ARM (which has pKVM, stage-1 walkers, and
> [soon] nested virtualization to deal with).

And (joy, happiness), the upcoming 128-bit page table support [1].

	M.

[1] https://developer.arm.com/documentation/ddi0601/2022-12/AArch64-Registers/TTBR0-EL1--Translation-Table-Base-Register-0--EL1-?lang=en

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code
  2023-01-19 17:24     ` Marc Zyngier
  (?)
@ 2023-01-19 18:38       ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2023-01-19 18:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Paolo Bonzini, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv, Alexandre Ghiti, Raghavendra Rao Ananta

On Thu, Jan 19, 2023 at 9:24 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 19 Jan 2023 17:14:34 +0000, David Matlack <dmatlack@google.com> wrote:
> > I'd like to target RISC-V first, since it has significantly lower risk
> > and complexity than e.g. ARM (which has pKVM, stage-1 walkers, and
> > [soon] nested virtualization to deal with).
>
> And (joy, happiness), the upcoming 128bit page table support[1].

Oh good, I was worried the ARM port was going to be too easy :)

^ permalink raw reply	[flat|nested] 317+ messages in thread

* Re: [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code
  2023-01-19 18:38       ` David Matlack
  (?)
@ 2023-01-19 19:04         ` David Matlack
  -1 siblings, 0 replies; 317+ messages in thread
From: David Matlack @ 2023-01-19 19:04 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Paolo Bonzini, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Huacai Chen, Aleksandar Markovic, Anup Patel,
	Atish Patra, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, Andrew Morton, Anshuman Khandual,
	Nadav Amit, Matthew Wilcox (Oracle),
	Vlastimil Babka, Liam R. Howlett, Suren Baghdasaryan, Peter Xu,
	xu xin, Arnd Bergmann, Yu Zhao, Colin Cross, Hugh Dickins,
	Ben Gardon, Mingwei Zhang, Krish Sadhukhan, Ricardo Koller,
	Jing Zhang, linux-arm-kernel, kvmarm, kvmarm, linux-mips, kvm,
	kvm-riscv, linux-riscv, Alexandre Ghiti, Raghavendra Rao Ananta

On Thu, Jan 19, 2023 at 10:38 AM David Matlack <dmatlack@google.com> wrote:
>
> On Thu, Jan 19, 2023 at 9:24 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Thu, 19 Jan 2023 17:14:34 +0000, David Matlack <dmatlack@google.com> wrote:
> > > I'd like to target RISC-V first, since it has significantly lower risk
> > > and complexity than e.g. ARM (which has pKVM, stage-1 walkers, and
> > > [soon] nested virtualization to deal with).
> >
> > And (joy, happiness), the upcoming 128bit page table support[1].
>
> Oh good, I was worried the ARM port was going to be too easy :)

But in all seriousness, I'm not too worried about supporting 128-bit
page tables in the Common MMU, assuming it is a compile-time decision.
The way I'm planning to organize the code, architecture-specific code
will own the PTEs, so each architecture can do whatever it wants.
There is a hard-coded assumption in the current code that PTEs are
u64, but we can abstract that behind a typedef for 128-bit support.

We will need to figure out how to deal with concurrency, though. Will
128-bit page table support come with 128-bit atomic support (e.g.
compare-exchange)? If so, we should be good to go. If not, we'll need
to emulate them with, e.g., spinlocks. Either way, figuring this out
is not specific to the Common MMU: even if ARM kept its own stage-2
MMU, we'd have to solve the same problem there.
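
To make that concrete, here is a rough userspace sketch of the kind of
abstraction I have in mind. It is illustrative only -- the type and
function names are made up, and real kernel code would use the
kernel's own atomics and locks rather than C11/pthreads -- but it
shows the PTE width hidden behind a typedef, a plain compare-exchange
for the 64-bit case, and a lock-based fallback for a 128-bit PTE on
hardware without a native 128-bit compare-exchange.

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* 64-bit case: the PTE fits in a natively atomic type. */
  typedef _Atomic uint64_t tdp_pte_t;

  static bool tdp_pte_cmpxchg(tdp_pte_t *ptep, uint64_t old, uint64_t new)
  {
          /* Install @new only if the PTE still holds @old; this is how
           * racing updates to the same PTE would be resolved. */
          return atomic_compare_exchange_strong(ptep, &old, new);
  }

  /* 128-bit case without a native compare-exchange: emulate with a lock. */
  struct tdp_pte128 {
          uint64_t lo;
          uint64_t hi;
  };

  static pthread_mutex_t pte128_lock = PTHREAD_MUTEX_INITIALIZER;

  static bool tdp_pte128_cmpxchg(struct tdp_pte128 *ptep,
                                 struct tdp_pte128 old, struct tdp_pte128 new)
  {
          bool ret = false;

          pthread_mutex_lock(&pte128_lock);
          if (ptep->lo == old.lo && ptep->hi == old.hi) {
                  *ptep = new;
                  ret = true;
          }
          pthread_mutex_unlock(&pte128_lock);
          return ret;
  }

Whether such a lock would be global, per page table, or per PTE is a
performance question; the point is only that the common code would
call a single helper and never look inside the PTE type itself.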

^ permalink raw reply	[flat|nested] 317+ messages in thread

end of thread, other threads:[~2023-01-20  4:28 UTC | newest]

Thread overview: 317+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-08 19:38 [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into common code David Matlack
2022-12-08 19:38 ` David Matlack
2022-12-08 19:38 ` David Matlack
2022-12-08 19:38 ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 01/37] KVM: x86/mmu: Store the address space ID directly in kvm_mmu_page_role David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-09  2:37   ` Yang, Weijiang
2022-12-09  2:37     ` Yang, Weijiang
2022-12-09  2:37     ` Yang, Weijiang
2022-12-09  2:37     ` Yang, Weijiang
2022-12-09 17:24     ` Oliver Upton
2022-12-09 17:24       ` Oliver Upton
2022-12-09 17:24       ` Oliver Upton
2022-12-09 17:24       ` Oliver Upton
2022-12-09 17:40       ` David Matlack
2022-12-09 17:40         ` David Matlack
2022-12-09 17:40         ` David Matlack
2022-12-09 17:40         ` David Matlack
2022-12-12 17:39         ` Sean Christopherson
2022-12-12 17:39           ` Sean Christopherson
2022-12-12 17:39           ` Sean Christopherson
2022-12-12 17:39           ` Sean Christopherson
2022-12-12 18:17           ` Oliver Upton
2022-12-12 18:17             ` Oliver Upton
2022-12-12 18:17             ` Oliver Upton
2022-12-12 18:17             ` Oliver Upton
2022-12-13  1:11             ` David Matlack
2022-12-13  1:11               ` David Matlack
2022-12-13  1:11               ` David Matlack
2022-12-13  1:11               ` David Matlack
2022-12-12 22:50           ` Paolo Bonzini
2022-12-12 22:50             ` Paolo Bonzini
2022-12-12 22:50             ` Paolo Bonzini
2022-12-12 22:50             ` Paolo Bonzini
2022-12-13  1:18             ` David Matlack
2022-12-13  1:18               ` David Matlack
2022-12-13  1:18               ` David Matlack
2022-12-13  1:18               ` David Matlack
2022-12-13  1:42             ` Sean Christopherson
2022-12-13  1:42               ` Sean Christopherson
2022-12-13  1:42               ` Sean Christopherson
2022-12-13  1:42               ` Sean Christopherson
2022-12-14  9:50           ` Lai Jiangshan
2022-12-14  9:50             ` Lai Jiangshan
2022-12-14  9:50             ` Lai Jiangshan
2022-12-14  9:50             ` Lai Jiangshan
2022-12-14 19:42             ` Sean Christopherson
2022-12-14 19:42               ` Sean Christopherson
2022-12-14 19:42               ` Sean Christopherson
2022-12-14 19:42               ` Sean Christopherson
2022-12-15  7:20               ` Lai Jiangshan
2022-12-15  7:20                 ` Lai Jiangshan
2022-12-15  7:20                 ` Lai Jiangshan
2022-12-15  7:20                 ` Lai Jiangshan
2022-12-08 19:38 ` [RFC PATCH 02/37] KVM: MMU: Move struct kvm_mmu_page_role into common code David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-12 17:48   ` Ben Gardon
2022-12-12 17:48     ` Ben Gardon
2022-12-12 17:48     ` Ben Gardon
2022-12-12 17:48     ` Ben Gardon
2022-12-12 23:11   ` Paolo Bonzini
2022-12-12 23:11     ` Paolo Bonzini
2022-12-12 23:11     ` Paolo Bonzini
2022-12-12 23:11     ` Paolo Bonzini
2022-12-13  1:06     ` David Matlack
2022-12-13  1:06       ` David Matlack
2022-12-13  1:06       ` David Matlack
2022-12-13  1:06       ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 03/37] KVM: MMU: Move tdp_ptep_t " David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 04/37] KVM: x86/mmu: Invert sp->tdp_mmu_page to sp->shadow_mmu_page David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-12 23:15   ` Paolo Bonzini
2022-12-12 23:15     ` Paolo Bonzini
2022-12-12 23:15     ` Paolo Bonzini
2022-12-12 23:15     ` Paolo Bonzini
2023-01-11 22:45     ` David Matlack
2023-01-11 22:45       ` David Matlack
2023-01-11 22:45       ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 05/37] KVM: x86/mmu: Unify TDP MMU and Shadow MMU root refcounts David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 06/37] KVM: MMU: Move struct kvm_mmu_page to common code David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-12 18:07   ` Ben Gardon
2022-12-12 18:07     ` Ben Gardon
2022-12-12 18:07     ` Ben Gardon
2022-12-12 18:07     ` Ben Gardon
2022-12-12 22:32   ` Paolo Bonzini
2022-12-12 22:32     ` Paolo Bonzini
2022-12-12 22:32     ` Paolo Bonzini
2022-12-12 22:32     ` Paolo Bonzini
2022-12-12 22:49     ` David Matlack
2022-12-12 22:49       ` David Matlack
2022-12-12 22:49       ` David Matlack
2022-12-12 22:49       ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 07/37] mm: Introduce architecture-neutral PG_LEVEL macros David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 08/37] KVM: selftests: Stop assuming stats are contiguous in kvm_binary_stats_test David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 09/37] KVM: Move page size stats into common code David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 10/37] KVM: MMU: Move struct kvm_page_fault to " David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-12 18:24   ` Ben Gardon
2022-12-12 18:24     ` Ben Gardon
2022-12-12 18:24     ` Ben Gardon
2022-12-12 18:24     ` Ben Gardon
2022-12-12 22:30     ` David Matlack
2022-12-12 22:30       ` David Matlack
2022-12-12 22:30       ` David Matlack
2022-12-12 22:30       ` David Matlack
2022-12-12 22:27   ` Paolo Bonzini
2022-12-12 22:27     ` Paolo Bonzini
2022-12-12 22:27     ` Paolo Bonzini
2022-12-12 22:27     ` Paolo Bonzini
2023-01-09 18:55     ` David Matlack
2023-01-09 18:55       ` David Matlack
2023-01-09 18:55       ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 11/37] KVM: MMU: Move RET_PF_* into " David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 12/37] KVM: x86/mmu: Use PG_LEVEL_{PTE,PMD,PUD} in the TDP MMU David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` [RFC PATCH 12/37] KVM: x86/mmu: Use PG_LEVEL_{PTE, PMD, PUD} " David Matlack
2022-12-08 19:38 ` [RFC PATCH 13/37] KVM: MMU: Move sptep_to_sp() to common code David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 14/37] KVM: MMU: Introduce common macros for TDP page tables David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 15/37] KVM: x86/mmu: Add a common API for inspecting/modifying TDP PTEs David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 16/37] KVM: x86/mmu: Abstract away TDP MMU root lookup David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 17/37] KVM: Move struct kvm_gfn_range to kvm_types.h David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-12 19:16   ` Ben Gardon
2022-12-12 19:16     ` Ben Gardon
2022-12-12 19:16     ` Ben Gardon
2022-12-12 19:16     ` Ben Gardon
2022-12-08 19:38 ` [RFC PATCH 18/37] KVM: x86/mmu: Add common API for creating TDP PTEs David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 19/37] KVM: x86/mmu: Add arch hooks for NX Huge Pages David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 20/37] KVM: x86/mmu: Abstract away computing the max mapping level David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-12 19:32   ` Ben Gardon
2022-12-12 19:32     ` Ben Gardon
2022-12-12 19:32     ` Ben Gardon
2022-12-12 19:32     ` Ben Gardon
2022-12-12 21:05     ` David Matlack
2022-12-12 21:05       ` David Matlack
2022-12-12 21:05       ` David Matlack
2022-12-12 21:05       ` David Matlack
2022-12-13  1:02       ` Sean Christopherson
2022-12-13  1:02         ` Sean Christopherson
2022-12-13  1:02         ` Sean Christopherson
2022-12-13  1:02         ` Sean Christopherson
2022-12-08 19:38 ` [RFC PATCH 21/37] KVM: Introduce CONFIG_HAVE_TDP_MMU David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 22/37] KVM: x86: Select HAVE_TDP_MMU if X86_64 David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 23/37] KVM: MMU: Move VM-level TDP MMU state to struct kvm David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-09 17:31   ` Oliver Upton
2022-12-09 17:31     ` Oliver Upton
2022-12-09 17:31     ` Oliver Upton
2022-12-09 17:31     ` Oliver Upton
2022-12-09 17:57     ` David Matlack
2022-12-09 17:57       ` David Matlack
2022-12-09 17:57       ` David Matlack
2022-12-09 17:57       ` David Matlack
2022-12-09 18:30       ` Oliver Upton
2022-12-09 18:30         ` Oliver Upton
2022-12-09 18:30         ` Oliver Upton
2022-12-09 18:30         ` Oliver Upton
2022-12-08 19:38 ` [RFC PATCH 24/37] KVM: x86/mmu: Move kvm_mmu_hugepage_adjust() up to fault handler David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 25/37] KVM: x86/mmu: Pass root role to kvm_tdp_mmu_get_vcpu_root_hpa() David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 26/37] KVM: Move page table cache to struct kvm_vcpu David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 27/37] KVM: MMU: Move mmu_page_header_cache to common code David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 28/37] KVM: MMU: Stub out tracepoints on non-x86 architectures David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 29/37] KVM: x86/mmu: Collapse kvm_flush_remote_tlbs_with_{range,address}() together David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` [RFC PATCH 29/37] KVM: x86/mmu: Collapse kvm_flush_remote_tlbs_with_{range, address}() together David Matlack
2022-12-08 19:38 ` [RFC PATCH 30/37] KVM: x86/mmu: Rename kvm_flush_remote_tlbs_with_address() David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 31/37] KVM: x86/MMU: Use gfn_t in kvm_flush_remote_tlbs_range() David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 32/37] KVM: Allow range-based TLB invalidation from common code David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 33/37] KVM: Move kvm_arch_flush_remote_tlbs_memslot() to " David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-12 22:03   ` Ben Gardon
2022-12-12 22:03     ` Ben Gardon
2022-12-12 22:03     ` Ben Gardon
2022-12-12 22:03     ` Ben Gardon
2022-12-12 22:42     ` David Matlack
2022-12-12 22:42       ` David Matlack
2022-12-12 22:42       ` David Matlack
2022-12-12 22:42       ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 34/37] KVM: MMU: Move the TDP iterator " David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 35/37] KVM: x86/mmu: Move tdp_mmu_max_gfn_exclusive() to tdp_pgtable.c David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 36/37] KVM: x86/mmu: Move is_tdp_mmu_page() to mmu_internal.h David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38 ` [RFC PATCH 37/37] KVM: MMU: Move the TDP MMU to common code David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-08 19:38   ` David Matlack
2022-12-09 19:07 ` [RFC PATCH 00/37] KVM: Refactor the KVM/x86 TDP MMU into " Oliver Upton
2022-12-09 19:07   ` Oliver Upton
2022-12-09 19:07   ` Oliver Upton
2022-12-09 19:07   ` Oliver Upton
2022-12-10  1:07   ` David Matlack
2022-12-10  1:07     ` David Matlack
2022-12-10  1:07     ` David Matlack
2022-12-10  1:07     ` David Matlack
2022-12-12 22:54   ` Paolo Bonzini
2022-12-12 22:54     ` Paolo Bonzini
2022-12-12 22:54     ` Paolo Bonzini
2022-12-12 22:54     ` Paolo Bonzini
2022-12-12 23:26     ` Sean Christopherson
2022-12-12 23:26       ` Sean Christopherson
2022-12-12 23:26       ` Sean Christopherson
2022-12-12 23:26       ` Sean Christopherson
2022-12-12 23:43       ` Paolo Bonzini
2022-12-12 23:43         ` Paolo Bonzini
2022-12-12 23:43         ` Paolo Bonzini
2022-12-12 23:43         ` Paolo Bonzini
2023-01-19 17:14 ` David Matlack
2023-01-19 17:14   ` David Matlack
2023-01-19 17:14   ` David Matlack
2023-01-19 17:23   ` Paolo Bonzini
2023-01-19 17:23     ` Paolo Bonzini
2023-01-19 17:23     ` Paolo Bonzini
2023-01-19 17:24   ` Marc Zyngier
2023-01-19 17:24     ` Marc Zyngier
2023-01-19 17:24     ` Marc Zyngier
2023-01-19 18:38     ` David Matlack
2023-01-19 18:38       ` David Matlack
2023-01-19 18:38       ` David Matlack
2023-01-19 19:04       ` David Matlack
2023-01-19 19:04         ` David Matlack
2023-01-19 19:04         ` David Matlack
