* [PATCH 00/22] Introduce the TDP MMU
From: Ben Gardon @ 2020-09-25 21:22 UTC
  To: linux-kernel, kvm
  Cc: Cannon Matthews, Paolo Bonzini, Peter Xu, Sean Christopherson,
	Peter Shier, Peter Feiner, Junaid Shahid, Jim Mattson,
	Yulei Zhang, Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong,
	Ben Gardon

Over the years, the demands on KVM's x86 MMU have grown from running small
guests to live migrating multi-terabyte VMs with hundreds of vCPUs. Where
we previously depended on shadow paging to run all guests, we now have
two-dimensional paging (TDP). This patch set introduces a new
implementation of much of the KVM MMU, optimized for running guests with
TDP. We have re-implemented many of the MMU functions to take advantage of
the relative simplicity of TDP and eliminate the need for an rmap.
Building on this simplified implementation, a future patch set will change
the synchronization model for this "TDP MMU" to enable more parallelism
than the monolithic MMU lock. A TDP MMU is currently in use at Google
and has given us the performance necessary to live migrate our 416 vCPU,
12TiB m2-ultramem-416 VMs.

This work was motivated by the need to handle page faults in parallel for
very large VMs. When VMs have hundreds of vCPUs and terabytes of memory,
KVM's MMU lock suffers extreme contention, resulting in soft-lockups and
long latency on guest page faults. This contention is easy to observe by
running the KVM selftest demand_paging_test with a couple hundred vCPUs.
In a 1-second profile of demand_paging_test with 416 vCPUs and 4G per
vCPU, 98% of the time was spent waiting for the MMU lock. At Google, the
TDP MMU reduced the test duration by 89%, and execution was dominated by
get_user_pages and the userfaultfd ioctl instead of the MMU lock.

This series is the first of two. In this series we add a basic
implementation of the TDP MMU. In the next series we will improve the
performance of the TDP MMU and allow it to execute MMU operations
in parallel.

The overall purpose of the KVM MMU is to program paging structures
(CR3/EPT/NPT) to encode the mapping of guest addresses to host physical
addresses (HPA), and to provide utilities for other KVM features, for
example dirty logging. The definition of the L1 guest physical address
(GPA) to HPA mapping comes in two parts: KVM's memslots map GPA to host
virtual addresses (HVA), and the kernel MM/x86 host page tables map HVA to
HPA. Without TDP, the
MMU must program the x86 page tables to encode the full translation of
guest virtual addresses (GVA) to HPA. This requires "shadowing" the
guest's page tables to create a composite x86 paging structure. This
solution is complicated, requires separate paging structures for each
guest CR3, and requires emulating guest page table changes. The TDP case
is much simpler. In this case, KVM lets the guest control CR3 and programs
the EPT/NPT paging structures with the GPA -> HPA mapping. The guest has
no way to change this mapping and only one version of the paging structure
is needed per L1 paging mode. Here, the paging mode is some
combination of the number of levels in the paging structure, the address
space (normal execution or system management mode, on x86), and other
attributes. Most VMs only ever use one paging mode and so only ever need
one TDP structure.
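
The two-part GPA to HPA definition described above can be illustrated with
a small, self-contained sketch. This is not KVM code: every toy_ name, the
flat lookup tables, and the 4 KiB page size are assumptions made purely for
illustration. Real memslots and host page tables are considerably more
involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ull << TOY_PAGE_SHIFT)
#define TOY_INVALID    UINT64_MAX

/* Hypothetical memslot: a contiguous range of guest frames backed by HVA. */
struct toy_memslot {
	uint64_t base_gfn;  /* first guest frame number in the slot */
	uint64_t npages;    /* number of pages covered */
	uint64_t hva_base;  /* host virtual address backing base_gfn */
};

/* Hypothetical stand-in for a host page table entry: HVA page -> HPA page. */
struct toy_host_pte {
	uint64_t hva_page;
	uint64_t hpa_page;
};

/* Part 1: memslots translate GPA to HVA. */
static uint64_t toy_gpa_to_hva(const struct toy_memslot *slots, size_t n,
			       uint64_t gpa)
{
	uint64_t gfn = gpa >> TOY_PAGE_SHIFT;

	assert(slots != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (gfn >= slots[i].base_gfn &&
		    gfn - slots[i].base_gfn < slots[i].npages)
			return slots[i].hva_base +
			       (gfn - slots[i].base_gfn) * TOY_PAGE_SIZE +
			       (gpa & (TOY_PAGE_SIZE - 1));
	}
	return TOY_INVALID;
}

/* Part 2: host page tables translate HVA to HPA. */
static uint64_t toy_hva_to_hpa(const struct toy_host_pte *ptes, size_t n,
			       uint64_t hva)
{
	uint64_t page = hva & ~(TOY_PAGE_SIZE - 1);

	for (size_t i = 0; i < n; i++) {
		if (ptes[i].hva_page == page)
			return ptes[i].hpa_page | (hva & (TOY_PAGE_SIZE - 1));
	}
	return TOY_INVALID;
}

/* The composed GPA -> HPA mapping is what EPT/NPT ultimately encodes. */
static uint64_t toy_gpa_to_hpa(const struct toy_memslot *slots, size_t nslots,
			       const struct toy_host_pte *ptes, size_t nptes,
			       uint64_t gpa)
{
	uint64_t hva = toy_gpa_to_hva(slots, nslots, gpa);

	if (hva == TOY_INVALID)
		return TOY_INVALID;
	return toy_hva_to_hpa(ptes, nptes, hva);
}
```

In this toy model a GPA resolves only if it falls inside a memslot and the
backing HVA page is currently mapped by the host. The second condition
loosely mirrors why a fault handler may have to populate the host mapping
before it can install a translation.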

This series implements a "TDP MMU" through alternative implementations of
MMU functions for running L1 guests with TDP. The TDP MMU falls back to
the existing shadow paging implementation when TDP is not available, and
interoperates with the existing shadow paging implementation for nesting.
The use of the TDP MMU is controlled by a module parameter whose value is
snapshotted at VM creation and stays fixed for the life of the VM. This
snapshot is used in many functions to decide whether or not to use TDP MMU
handlers for a given operation.

This series can also be viewed in Gerrit here:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
(Thanks to Dmitry Vyukov <dvyukov@google.com> for setting up the
Gerrit instance)

Ben Gardon (22):
  kvm: mmu: Separate making SPTEs from set_spte
  kvm: mmu: Introduce tdp_iter
  kvm: mmu: Init / Uninit the TDP MMU
  kvm: mmu: Allocate and free TDP MMU roots
  kvm: mmu: Add functions to handle changed TDP SPTEs
  kvm: mmu: Make address space ID a property of memslots
  kvm: mmu: Support zapping SPTEs in the TDP MMU
  kvm: mmu: Separate making non-leaf sptes from link_shadow_page
  kvm: mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
  kvm: mmu: Add TDP MMU PF handler
  kvm: mmu: Factor out allocating a new tdp_mmu_page
  kvm: mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
  kvm: mmu: Support invalidate range MMU notifier for TDP MMU
  kvm: mmu: Add access tracking for tdp_mmu
  kvm: mmu: Support changed pte notifier in tdp MMU
  kvm: mmu: Add dirty logging handler for changed sptes
  kvm: mmu: Support dirty logging for the TDP MMU
  kvm: mmu: Support disabling dirty logging for the tdp MMU
  kvm: mmu: Support write protection for nesting in tdp MMU
  kvm: mmu: NX largepage recovery for TDP MMU
  kvm: mmu: Support MMIO in the TDP MMU
  kvm: mmu: Don't clear write flooding count for direct roots

 arch/x86/include/asm/kvm_host.h |   17 +
 arch/x86/kvm/Makefile           |    3 +-
 arch/x86/kvm/mmu/mmu.c          |  437 ++++++----
 arch/x86/kvm/mmu/mmu_internal.h |   98 +++
 arch/x86/kvm/mmu/paging_tmpl.h  |    3 +-
 arch/x86/kvm/mmu/tdp_iter.c     |  198 +++++
 arch/x86/kvm/mmu/tdp_iter.h     |   55 ++
 arch/x86/kvm/mmu/tdp_mmu.c      | 1315 +++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu/tdp_mmu.h      |   52 ++
 include/linux/kvm_host.h        |    2 +
 virt/kvm/kvm_main.c             |    7 +-
 11 files changed, 2022 insertions(+), 165 deletions(-)
 create mode 100644 arch/x86/kvm/mmu/tdp_iter.c
 create mode 100644 arch/x86/kvm/mmu/tdp_iter.h
 create mode 100644 arch/x86/kvm/mmu/tdp_mmu.c
 create mode 100644 arch/x86/kvm/mmu/tdp_mmu.h

-- 
2.28.0.709.gb0816b6eb0-goog


* Re: [PATCH 21/22] kvm: mmu: Support MMIO in the TDP MMU
From: kernel test robot @ 2020-10-08  1:32 UTC
  To: kbuild


CC: kbuild-all(a)lists.01.org
In-Reply-To: <20200925212302.3979661-22-bgardon@google.com>
References: <20200925212302.3979661-22-bgardon@google.com>
TO: Ben Gardon <bgardon@google.com>

Hi Ben,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.9-rc8 next-20201007]
[cannot apply to kvm/linux-next linux/master vhost/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ben-Gardon/Introduce-the-TDP-MMU/20200926-052649
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 6d28cf7dfede6cfca5119a0d415a6a447c68f3a0
:::::: branch date: 12 days ago
:::::: commit date: 12 days ago
config: x86_64-randconfig-m001-20201008 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add the following tags as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
arch/x86/kvm/mmu/mmu.c:3995 get_mmio_spte() error: uninitialized symbol 'root'.

vim +/root +3995 arch/x86/kvm/mmu/mmu.c

ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3972  
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3973  /* return true if reserved bit is detected on spte. */
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3974  static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3975  {
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3976  	u64 sptes[PT64_ROOT_MAX_LEVEL];
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3977  	struct rsvd_bits_validate *rsvd_check;
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3978  	int root;
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3979  	int leaf;
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3980  	int level;
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3981  	bool reserved = false;
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3982  
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3983  	if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) {
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3984  		*sptep = 0ull;
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3985  		return reserved;
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3986  	}
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3987  
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3988  	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3989  		leaf = kvm_tdp_mmu_get_walk(vcpu, addr, sptes);
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3990  	else
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3991  		leaf = get_walk(vcpu, addr, sptes);
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3992  
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3993  	rsvd_check = &vcpu->arch.mmu->shadow_zero_check;
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3994  
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25 @3995  	for (level = root; level >= leaf; level--) {
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3996  		if (!is_shadow_present_pte(sptes[level - 1]))
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  3997  			break;
b5c3c1b3c6e95cc arch/x86/kvm/mmu/mmu.c Sean Christopherson 2020-01-09  3998  		/*
b5c3c1b3c6e95cc arch/x86/kvm/mmu/mmu.c Sean Christopherson 2020-01-09  3999  		 * Use a bitwise-OR instead of a logical-OR to aggregate the
b5c3c1b3c6e95cc arch/x86/kvm/mmu/mmu.c Sean Christopherson 2020-01-09  4000  		 * reserved bit and EPT's invalid memtype/XWR checks to avoid
b5c3c1b3c6e95cc arch/x86/kvm/mmu/mmu.c Sean Christopherson 2020-01-09  4001  		 * adding a Jcc in the loop.
b5c3c1b3c6e95cc arch/x86/kvm/mmu/mmu.c Sean Christopherson 2020-01-09  4002  		 */
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  4003  		reserved |= __is_bad_mt_xwr(rsvd_check, sptes[level - 1]) |
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  4004  			    __is_rsvd_bits_set(rsvd_check, sptes[level - 1],
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  4005  					       level);
47ab8751695f71d arch/x86/kvm/mmu.c     Xiao Guangrong      2015-08-05  4006  	}
47ab8751695f71d arch/x86/kvm/mmu.c     Xiao Guangrong      2015-08-05  4007  
47ab8751695f71d arch/x86/kvm/mmu.c     Xiao Guangrong      2015-08-05  4008  	if (reserved) {
47ab8751695f71d arch/x86/kvm/mmu.c     Xiao Guangrong      2015-08-05  4009  		pr_err("%s: detect reserved bits on spte, addr 0x%llx, dump hierarchy:\n",
47ab8751695f71d arch/x86/kvm/mmu.c     Xiao Guangrong      2015-08-05  4010  		       __func__, addr);
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  4011  		for (level = root; level >= leaf; level--)
47ab8751695f71d arch/x86/kvm/mmu.c     Xiao Guangrong      2015-08-05  4012  			pr_err("------ spte 0x%llx level %d.\n",
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  4013  			       sptes[level - 1], level);
47ab8751695f71d arch/x86/kvm/mmu.c     Xiao Guangrong      2015-08-05  4014  	}
ddce6208217c1aa arch/x86/kvm/mmu/mmu.c Sean Christopherson 2019-12-06  4015  
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  4016  	*sptep = sptes[leaf - 1];
ea7e7c74afc3303 arch/x86/kvm/mmu/mmu.c Ben Gardon          2020-09-25  4017  
47ab8751695f71d arch/x86/kvm/mmu.c     Xiao Guangrong      2015-08-05  4018  	return reserved;
ce88decffd17bf9 arch/x86/kvm/mmu.c     Xiao Guangrong      2011-07-12  4019  }
ce88decffd17bf9 arch/x86/kvm/mmu.c     Xiao Guangrong      2011-07-12  4020  
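
The warning arises because 'root' is declared at line 3978 but never
assigned before the loop at line 3995 reads it. One possible shape for a
fix, shown here only as a hedged sketch and not necessarily what the
series adopted, is to have the walker report the level it started from
through an out-parameter. All toy_ names, the bit definitions, and the
flat table below are hypothetical stand-ins, not KVM APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_LEVEL 5
#define TOY_PRESENT   (1ull << 0)
#define TOY_RSVD      (1ull << 51)

/*
 * Hypothetical walker: copies one entry per level into sptes[level - 1],
 * stores the starting level in *root_level, and returns the lowest level
 * it reached.
 */
static int toy_get_walk(const uint64_t *table, int root, int leaf,
			uint64_t *sptes, int *root_level)
{
	assert(leaf >= 1 && root >= leaf && root <= TOY_MAX_LEVEL);
	*root_level = root;
	for (int level = root; level >= leaf; level--)
		sptes[level - 1] = table[level - 1];
	return leaf;
}

/* Returns true if a reserved bit is set on any present entry in the walk. */
static bool toy_check_reserved(const uint64_t *table, int walk_root,
			       int walk_leaf, uint64_t *sptep)
{
	uint64_t sptes[TOY_MAX_LEVEL] = { 0 };
	int root, leaf, level;
	bool reserved = false;

	/* 'root' is always written by the walker before it is read below. */
	leaf = toy_get_walk(table, walk_root, walk_leaf, sptes, &root);

	for (level = root; level >= leaf; level--) {
		if (!(sptes[level - 1] & TOY_PRESENT))
			break;
		reserved |= (sptes[level - 1] & TOY_RSVD) != 0;
	}

	*sptep = sptes[leaf - 1];
	return reserved;
}
```

Because the loop bounds both come from the same call, neither can be read
before it is written, which is the property the static checker is asking
for.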

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 41414 bytes --]

Thread overview: 112+ messages
2020-09-25 21:22 [PATCH 00/22] Introduce the TDP MMU Ben Gardon
2020-09-25 21:22 ` [PATCH 01/22] kvm: mmu: Separate making SPTEs from set_spte Ben Gardon
2020-09-30  4:55   ` Sean Christopherson
2020-09-30 23:03     ` Ben Gardon
2020-09-25 21:22 ` [PATCH 02/22] kvm: mmu: Introduce tdp_iter Ben Gardon
2020-09-26  0:04   ` Paolo Bonzini
2020-09-30  5:06     ` Sean Christopherson
2020-09-26  0:54   ` Paolo Bonzini
2020-09-30  5:08   ` Sean Christopherson
2020-09-30  5:24   ` Sean Christopherson
2020-09-30  6:24     ` Paolo Bonzini
2020-09-30 23:20   ` Eric van Tassell
2020-09-30 23:34     ` Paolo Bonzini
2020-10-01  0:07       ` Sean Christopherson
2020-09-25 21:22 ` [PATCH 03/22] kvm: mmu: Init / Uninit the TDP MMU Ben Gardon
2020-09-26  0:06   ` Paolo Bonzini
2020-09-30  5:34   ` Sean Christopherson
2020-09-30 18:36     ` Ben Gardon
2020-09-30 16:57   ` Sean Christopherson
2020-09-30 17:39     ` Paolo Bonzini
2020-09-30 18:42       ` Ben Gardon
2020-09-25 21:22 ` [PATCH 04/22] kvm: mmu: Allocate and free TDP MMU roots Ben Gardon
2020-09-30  6:06   ` Sean Christopherson
2020-09-30  6:26     ` Paolo Bonzini
2020-09-30 15:38       ` Sean Christopherson
2020-10-12 22:59     ` Ben Gardon
2020-10-12 23:59       ` Sean Christopherson
2020-09-25 21:22 ` [PATCH 05/22] kvm: mmu: Add functions to handle changed TDP SPTEs Ben Gardon
2020-09-26  0:39   ` Paolo Bonzini
2020-09-28 17:23     ` Paolo Bonzini
2020-09-25 21:22 ` [PATCH 06/22] kvm: mmu: Make address space ID a property of memslots Ben Gardon
2020-09-30  6:10   ` Sean Christopherson
2020-09-30 23:11     ` Ben Gardon
2020-09-25 21:22 ` [PATCH 07/22] kvm: mmu: Support zapping SPTEs in the TDP MMU Ben Gardon
2020-09-26  0:14   ` Paolo Bonzini
2020-09-30  6:15   ` Sean Christopherson
2020-09-30  6:28     ` Paolo Bonzini
2020-09-25 21:22 ` [PATCH 08/22] kvm: mmu: Separate making non-leaf sptes from link_shadow_page Ben Gardon
2020-09-25 21:22 ` [PATCH 09/22] kvm: mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg Ben Gardon
2020-09-30 16:19   ` Sean Christopherson
2020-09-25 21:22 ` [PATCH 10/22] kvm: mmu: Add TDP MMU PF handler Ben Gardon
2020-09-26  0:24   ` Paolo Bonzini
2020-09-30 16:37   ` Sean Christopherson
2020-09-30 16:55     ` Paolo Bonzini
2020-09-30 17:37     ` Paolo Bonzini
2020-10-06 22:35       ` Ben Gardon
2020-10-06 22:33     ` Ben Gardon
2020-10-07 20:55       ` Sean Christopherson
2020-09-25 21:22 ` [PATCH 11/22] kvm: mmu: Factor out allocating a new tdp_mmu_page Ben Gardon
2020-09-26  0:22   ` Paolo Bonzini
2020-09-30 18:53     ` Ben Gardon
2020-09-25 21:22 ` [PATCH 12/22] kvm: mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU Ben Gardon
2020-09-25 21:22 ` [PATCH 13/22] kvm: mmu: Support invalidate range MMU notifier for " Ben Gardon
2020-09-30 17:03   ` Sean Christopherson
2020-09-30 23:15     ` Ben Gardon
2020-09-30 23:24       ` Sean Christopherson
2020-09-30 23:27         ` Ben Gardon
2020-09-25 21:22 ` [PATCH 14/22] kvm: mmu: Add access tracking for tdp_mmu Ben Gardon
2020-09-26  0:32   ` Paolo Bonzini
2020-09-30 17:48   ` Sean Christopherson
2020-10-06 23:38     ` Ben Gardon
2020-09-25 21:22 ` [PATCH 15/22] kvm: mmu: Support changed pte notifier in tdp MMU Ben Gardon
2020-09-26  0:33   ` Paolo Bonzini
2020-09-28 15:11   ` Paolo Bonzini
2020-10-07 16:53     ` Ben Gardon
2020-10-07 17:18       ` Paolo Bonzini
2020-10-07 17:30         ` Ben Gardon
2020-10-07 17:54           ` Paolo Bonzini
2020-10-09 10:59   ` Dan Carpenter
2020-10-09 10:59     ` Dan Carpenter
2020-09-25 21:22 ` [PATCH 16/22] kvm: mmu: Add dirty logging handler for changed sptes Ben Gardon
2020-09-26  0:45   ` Paolo Bonzini
2020-09-25 21:22 ` [PATCH 17/22] kvm: mmu: Support dirty logging for the TDP MMU Ben Gardon
2020-09-26  1:04   ` Paolo Bonzini
2020-10-08 18:27     ` Ben Gardon
2020-09-29 15:07   ` Paolo Bonzini
2020-09-30 18:04   ` Sean Christopherson
2020-09-30 18:08     ` Paolo Bonzini
2020-09-25 21:22 ` [PATCH 18/22] kvm: mmu: Support disabling dirty logging for the tdp MMU Ben Gardon
2020-09-26  1:09   ` Paolo Bonzini
2020-10-07 16:30     ` Ben Gardon
2020-10-07 17:21       ` Paolo Bonzini
2020-10-07 17:28         ` Ben Gardon
2020-10-07 17:53           ` Paolo Bonzini
2020-09-25 21:22 ` [PATCH 19/22] kvm: mmu: Support write protection for nesting in " Ben Gardon
2020-09-30 18:06   ` Sean Christopherson
2020-09-25 21:23 ` [PATCH 20/22] kvm: mmu: NX largepage recovery for TDP MMU Ben Gardon
2020-09-26  1:14   ` Paolo Bonzini
2020-09-30 22:23     ` Ben Gardon
2020-09-29 18:24   ` Paolo Bonzini
2020-09-30 18:15   ` Sean Christopherson
2020-09-30 19:56     ` Paolo Bonzini
2020-09-30 22:33       ` Ben Gardon
2020-09-30 22:27     ` Ben Gardon
2020-10-09 11:03   ` Dan Carpenter
2020-10-09 11:03     ` Dan Carpenter
2020-09-25 21:23 ` [PATCH 21/22] kvm: mmu: Support MMIO in the " Ben Gardon
2020-09-30 18:19   ` Sean Christopherson
2020-10-09 11:43   ` Dan Carpenter
2020-10-09 11:43     ` Dan Carpenter
2020-09-25 21:23 ` [PATCH 22/22] kvm: mmu: Don't clear write flooding count for direct roots Ben Gardon
2020-09-26  1:25   ` Paolo Bonzini
2020-10-05 22:48     ` Ben Gardon
2020-10-05 23:44       ` Sean Christopherson
2020-10-06 16:19         ` Ben Gardon
2020-09-26  1:14 ` [PATCH 00/22] Introduce the TDP MMU Paolo Bonzini
2020-09-28 17:31 ` Paolo Bonzini
2020-09-29 17:40   ` Ben Gardon
2020-09-29 18:10     ` Paolo Bonzini
2020-09-30  6:19 ` Sean Christopherson
2020-09-30  6:30   ` Paolo Bonzini
2020-10-08  1:32 [PATCH 21/22] kvm: mmu: Support MMIO in " kernel test robot
