linux-kernel.vger.kernel.org archive mirror
* [RFC 0/9] KVM:x86/mmu:Introduce parallel memory virtualization to boost performance
@ 2020-08-05 14:12 Yulei Zhang
  2020-08-06  0:22 ` Wanpeng Li
  2020-08-06 17:03 ` Ben Gardon
  0 siblings, 2 replies; 4+ messages in thread
From: Yulei Zhang @ 2020-08-05 14:12 UTC (permalink / raw)
  To: pbonzini
  Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, vkuznets,
	xiaoguangrong.eric, kernellwp, lihaiwei.kernel, Yulei Zhang

From: Yulei Zhang <yuleixzhang@tencent.com>

Currently, KVM memory virtualization relies on mmu_lock to synchronize
memory mapping updates, which forces the vCPUs to establish their mappings
serially and slows down execution, especially right after migration when
substantial memory mapping setup is needed; it gets worse as the number
of vCPUs and the amount of guest memory grow.
  
The idea we present in this patch set is to mitigate the issue with a
pre-constructed memory mapping table. We pin the guest memory up front and
build a global memory mapping table that tracks the guest memslot changes,
then install it as the vCPUs' CR3 (EPT root), so that once the guest starts
up all the vCPUs can access and update memory concurrently, which is where
the performance improvement is expected.
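
As a rough illustration only (not the actual patch code), the pre-population
pass conceptually walks every memslot, pins each guest page and records the
GFN->PFN translation in the shared table; direct_build_ept_map() below is a
hypothetical stand-in for the population function added in
arch/x86/kvm/mmu/mmu.c:

static int pre_populate_direct_ept(struct kvm *kvm)
{
        struct kvm_memslots *slots = kvm_memslots(kvm);
        struct kvm_memory_slot *memslot;
        gfn_t gfn;

        kvm_for_each_memslot(memslot, slots) {
                for (gfn = memslot->base_gfn;
                     gfn < memslot->base_gfn + memslot->npages; gfn++) {
                        /* gfn_to_pfn() takes a reference, i.e. pins the page */
                        kvm_pfn_t pfn = gfn_to_pfn(kvm, gfn);

                        if (is_error_pfn(pfn))
                                return -EFAULT;
                        /* hypothetical helper: record GFN->PFN in the global
                         * table that every vCPU's root will point at */
                        direct_build_ept_map(kvm, gfn, pfn);
                }
        }
        return 0;
}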

After testing the initial patches with a memory-dirtying workload, we have
seen positive results even with huge pages enabled. For example, for a
guest with 32 vCPUs and 64G of memory, we see more than 50% improvement
in 2M/1G huge page mode.


Yulei Zhang (9):
  Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
    support
  Introduce page table population function for direct build EPT feature
  Introduce page table remove function for direct build EPT feature
  Add release function for direct build ept when guest VM exit
  Modify the page fault path to meet the direct build EPT requirement
  Apply the direct build EPT according to the memory slots change
  Add migration support when using direct build EPT
  Introduce kvm module parameter global_tdp to turn on the direct build
    EPT mode
  Handle certain mmu exposed functions properly while turn on direct
    build EPT mode
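
For the global_tdp knob introduced in patch 8, the expectation is an
ordinary KVM module parameter chosen at load time (e.g. "modprobe kvm
global_tdp=1"); the snippet below is only a sketch of how such a knob is
typically wired up, with the placement and permission bits being assumptions
rather than the patch's actual code:

static bool __read_mostly global_tdp;
module_param(global_tdp, bool, 0444);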

 arch/mips/kvm/mips.c            |  13 +
 arch/powerpc/kvm/powerpc.c      |  13 +
 arch/s390/kvm/kvm-s390.c        |  13 +
 arch/x86/include/asm/kvm_host.h |  13 +-
 arch/x86/kvm/mmu/mmu.c          | 537 ++++++++++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.c          |   2 +-
 arch/x86/kvm/vmx/vmx.c          |  17 +-
 arch/x86/kvm/x86.c              |  55 ++--
 include/linux/kvm_host.h        |   7 +-
 virt/kvm/kvm_main.c             |  43 ++-
 10 files changed, 648 insertions(+), 65 deletions(-)

-- 
2.17.1



* Re: [RFC 0/9] KVM:x86/mmu:Introduce parallel memory virtualization to boost performance
  2020-08-05 14:12 [RFC 0/9] KVM:x86/mmu:Introduce parallel memory virtualization to boost performance Yulei Zhang
@ 2020-08-06  0:22 ` Wanpeng Li
  2020-08-06 17:03 ` Ben Gardon
  1 sibling, 0 replies; 4+ messages in thread
From: Wanpeng Li @ 2020-08-06  0:22 UTC (permalink / raw)
  To: Yulei Zhang
  Cc: Paolo Bonzini, kvm, LKML, Sean Christopherson, Jim Mattson,
	Vitaly Kuznetsov, Xiao Guangrong, Haiwei Li, Yulei Zhang,
	Junaid Shahid, Ben Gardon

Also Cc'ing Junaid Shahid and Ben Gardon.

On Wed, 5 Aug 2020 at 22:11, Yulei Zhang <yulei.kernel@gmail.com> wrote:
>
> From: Yulei Zhang <yuleixzhang@tencent.com>
>
> Currently, KVM memory virtualization relies on mmu_lock to synchronize
> memory mapping updates, which forces the vCPUs to establish their mappings
> serially and slows down execution, especially right after migration when
> substantial memory mapping setup is needed; it gets worse as the number
> of vCPUs and the amount of guest memory grow.
>
> The idea we present in this patch set is to mitigate the issue with a
> pre-constructed memory mapping table. We pin the guest memory up front and
> build a global memory mapping table that tracks the guest memslot changes,
> then install it as the vCPUs' CR3 (EPT root), so that once the guest starts
> up all the vCPUs can access and update memory concurrently, which is where
> the performance improvement is expected.
>
> After testing the initial patches with a memory-dirtying workload, we have
> seen positive results even with huge pages enabled. For example, for a
> guest with 32 vCPUs and 64G of memory, we see more than 50% improvement
> in 2M/1G huge page mode.
>
>
> Yulei Zhang (9):
>   Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
>     support
>   Introduce page table population function for direct build EPT feature
>   Introduce page table remove function for direct build EPT feature
>   Add release function for direct build ept when guest VM exit
>   Modify the page fault path to meet the direct build EPT requirement
>   Apply the direct build EPT according to the memory slots change
>   Add migration support when using direct build EPT
>   Introduce kvm module parameter global_tdp to turn on the direct build
>     EPT mode
>   Handle certain mmu exposed functions properly while turn on direct
>     build EPT mode
>
>  arch/mips/kvm/mips.c            |  13 +
>  arch/powerpc/kvm/powerpc.c      |  13 +
>  arch/s390/kvm/kvm-s390.c        |  13 +
>  arch/x86/include/asm/kvm_host.h |  13 +-
>  arch/x86/kvm/mmu/mmu.c          | 537 ++++++++++++++++++++++++++++++--
>  arch/x86/kvm/svm/svm.c          |   2 +-
>  arch/x86/kvm/vmx/vmx.c          |  17 +-
>  arch/x86/kvm/x86.c              |  55 ++--
>  include/linux/kvm_host.h        |   7 +-
>  virt/kvm/kvm_main.c             |  43 ++-
>  10 files changed, 648 insertions(+), 65 deletions(-)
>
> --
> 2.17.1
>


* Re: [RFC 0/9] KVM:x86/mmu:Introduce parallel memory virtualization to boost performance
  2020-08-05 14:12 [RFC 0/9] KVM:x86/mmu:Introduce parallel memory virtualization to boost performance Yulei Zhang
  2020-08-06  0:22 ` Wanpeng Li
@ 2020-08-06 17:03 ` Ben Gardon
  2020-08-07  9:03   ` yulei zhang
  1 sibling, 1 reply; 4+ messages in thread
From: Ben Gardon @ 2020-08-06 17:03 UTC (permalink / raw)
  To: Yulei Zhang
  Cc: Paolo Bonzini, kvm, linux-kernel, Sean Christopherson,
	Jim Mattson, Vitaly Kuznetsov, xiaoguangrong.eric, kernellwp,
	lihaiwei.kernel, Yulei Zhang, Junaid Shahid

On Wed, Aug 5, 2020 at 9:53 AM Yulei Zhang <yulei.kernel@gmail.com> wrote:
>
> From: Yulei Zhang <yuleixzhang@tencent.com>
>
> Currently, KVM memory virtualization relies on mmu_lock to synchronize
> memory mapping updates, which forces the vCPUs to establish their mappings
> serially and slows down execution, especially right after migration when
> substantial memory mapping setup is needed; it gets worse as the number
> of vCPUs and the amount of guest memory grow.
>
> The idea we present in this patch set is to mitigate the issue with a
> pre-constructed memory mapping table. We pin the guest memory up front and
> build a global memory mapping table that tracks the guest memslot changes,
> then install it as the vCPUs' CR3 (EPT root), so that once the guest starts
> up all the vCPUs can access and update memory concurrently, which is where
> the performance improvement is expected.

Is a re-implementation of the various MMU functions in this series
necessary to pre-populate the EPT/NPT? I realize the approach you took
is probably the fastest way to pre-populate an EPT, but it seems like
similar pre-population could be achieved with some changes to the PF
handler's prefault scheme or, from user space by adding a dummy vCPU
to touch memory before loading the actual guest image.
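
(For illustration only: the user-space variant amounts to running a
throwaway vCPU through a loop like the one below before the real guest image
is loaded, so every page takes an EPT violation once and gets mapped. It is
written as C for readability; in practice the dummy vCPU would run a tiny
pre-built blob, and the KVM ioctl plumbing to create and run it is omitted.)

static void touch_all_guest_pages(volatile unsigned char *base,
                                  unsigned long mem_size)
{
        unsigned long off;

        /* each write access faults the page in and populates the mapping */
        for (off = 0; off < mem_size; off += 4096)
                base[off] = base[off];
}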

I think this series is taking a similar approach to the direct MMU RFC
I sent out a little less than a year ago. (I will send another version
of that series in the next month.) I'm not sure this level of
complexity is worth it if you're only interested in EPT pre-population.
Is pre-population your goal? You mention "parallel memory
virtualization," does that refer to parallel page fault handling you
intend to implement in a future series?

There are a number of features I see you've chosen to leave out of this
series; that might work for your use case, but I think they're necessary in
general. These include handling vCPUs with different roles (SMM, VMX
non-root mode, etc.), MMU notifiers (which I realize matter less for
pinned memory), demand paging through UFFD, and fast EPT
invalidation/teardown, among others.

>
> After testing the initial patches with a memory-dirtying workload, we have
> seen positive results even with huge pages enabled. For example, for a
> guest with 32 vCPUs and 64G of memory, we see more than 50% improvement
> in 2M/1G huge page mode.
>
>
> Yulei Zhang (9):
>   Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
>     support
>   Introduce page table population function for direct build EPT feature
>   Introduce page table remove function for direct build EPT feature
>   Add release function for direct build ept when guest VM exit
>   Modify the page fault path to meet the direct build EPT requirement
>   Apply the direct build EPT according to the memory slots change
>   Add migration support when using direct build EPT
>   Introduce kvm module parameter global_tdp to turn on the direct build
>     EPT mode
>   Handle certain mmu exposed functions properly while turn on direct
>     build EPT mode
>
>  arch/mips/kvm/mips.c            |  13 +
>  arch/powerpc/kvm/powerpc.c      |  13 +
>  arch/s390/kvm/kvm-s390.c        |  13 +
>  arch/x86/include/asm/kvm_host.h |  13 +-
>  arch/x86/kvm/mmu/mmu.c          | 537 ++++++++++++++++++++++++++++++--
>  arch/x86/kvm/svm/svm.c          |   2 +-
>  arch/x86/kvm/vmx/vmx.c          |  17 +-
>  arch/x86/kvm/x86.c              |  55 ++--
>  include/linux/kvm_host.h        |   7 +-
>  virt/kvm/kvm_main.c             |  43 ++-
>  10 files changed, 648 insertions(+), 65 deletions(-)
>
> --
> 2.17.1
>


* Re: [RFC 0/9] KVM:x86/mmu:Introduce parallel memory virtualization to boost performance
  2020-08-06 17:03 ` Ben Gardon
@ 2020-08-07  9:03   ` yulei zhang
  0 siblings, 0 replies; 4+ messages in thread
From: yulei zhang @ 2020-08-07  9:03 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Paolo Bonzini, kvm, linux-kernel, Sean Christopherson,
	Jim Mattson, Vitaly Kuznetsov, xiaoguangrong.eric, kernellwp,
	lihaiwei.kernel, Yulei Zhang, Junaid Shahid

On Fri, Aug 7, 2020 at 1:04 AM Ben Gardon <bgardon@google.com> wrote:
>
> On Wed, Aug 5, 2020 at 9:53 AM Yulei Zhang <yulei.kernel@gmail.com> wrote:
> >
> > From: Yulei Zhang <yuleixzhang@tencent.com>
> >
> > Currently, KVM memory virtualization relies on mmu_lock to synchronize
> > memory mapping updates, which forces the vCPUs to establish their mappings
> > serially and slows down execution, especially right after migration when
> > substantial memory mapping setup is needed; it gets worse as the number
> > of vCPUs and the amount of guest memory grow.
> >
> > The idea we present in this patch set is to mitigate the issue with a
> > pre-constructed memory mapping table. We pin the guest memory up front and
> > build a global memory mapping table that tracks the guest memslot changes,
> > then install it as the vCPUs' CR3 (EPT root), so that once the guest starts
> > up all the vCPUs can access and update memory concurrently, which is where
> > the performance improvement is expected.
>
> Is a re-implementation of the various MMU functions in this series
> necessary to pre-populate the EPT/NPT? I realize the approach you took
> is probably the fastest way to pre-populate an EPT, but it seems like
> similar pre-population could be achieved with some changes to the PF
> handler's prefault scheme or, from user space by adding a dummy vCPU
> to touch memory before loading the actual guest image.
>
> I think this series is taking a similar approach to the direct MMU RFC
> I sent out a little less than a year ago. (I will send another version
> of that series in the next month.) I'm not sure this level of
> complexity is worth it if you're only interested in EPT pre-population.
> Is pre-population your goal? You mention "parallel memory
> virtualization," does that refer to parallel page fault handling you
> intend to implement in a future series?
>
> There are a number of features I see you've chosen to leave out of this
> series; that might work for your use case, but I think they're necessary in
> general. These include handling vCPUs with different roles (SMM, VMX
> non-root mode, etc.), MMU notifiers (which I realize matter less for
> pinned memory), demand paging through UFFD, and fast EPT
> invalidation/teardown, among others.
>
Thanks for the feedback. The target scenario for this feature is one
without memory overcommitment, so we can quickly pin the memory and
set up the GPA->HPA mapping table, and after that we do not expect page
faults while the vCPUs access memory. We call it "parallel memory
virtualization" because, with a pre-populated EPT, the vCPUs are able to
access and update memory in parallel.
Yes, so far we disable SMM etc. We look forward to gathering input from
experts like you and refining the implementation.
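
(To sketch what the fast pinning amounts to: in the no-overcommit case the
pages backing each memslot can be long-term pinned once, roughly along the
lines below. pin_user_pages() is used purely for illustration, the series
does its own pinning as part of building the table, and allocation of the
pages array is omitted.)

static long pin_memslot_pages(struct kvm_memory_slot *slot,
                              struct page **pages)
{
        long pinned;

        mmap_read_lock(current->mm);
        pinned = pin_user_pages(slot->userspace_addr, slot->npages,
                                FOLL_WRITE | FOLL_LONGTERM, pages, NULL);
        mmap_read_unlock(current->mm);

        return pinned;
}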

> >
> > After testing the initial patches with a memory-dirtying workload, we have
> > seen positive results even with huge pages enabled. For example, for a
> > guest with 32 vCPUs and 64G of memory, we see more than 50% improvement
> > in 2M/1G huge page mode.
> >
> >
> > Yulei Zhang (9):
> >   Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
> >     support
> >   Introduce page table population function for direct build EPT feature
> >   Introduce page table remove function for direct build EPT feature
> >   Add release function for direct build ept when guest VM exit
> >   Modify the page fault path to meet the direct build EPT requirement
> >   Apply the direct build EPT according to the memory slots change
> >   Add migration support when using direct build EPT
> >   Introduce kvm module parameter global_tdp to turn on the direct build
> >     EPT mode
> >   Handle certain mmu exposed functions properly while turn on direct
> >     build EPT mode
> >
> >  arch/mips/kvm/mips.c            |  13 +
> >  arch/powerpc/kvm/powerpc.c      |  13 +
> >  arch/s390/kvm/kvm-s390.c        |  13 +
> >  arch/x86/include/asm/kvm_host.h |  13 +-
> >  arch/x86/kvm/mmu/mmu.c          | 537 ++++++++++++++++++++++++++++++--
> >  arch/x86/kvm/svm/svm.c          |   2 +-
> >  arch/x86/kvm/vmx/vmx.c          |  17 +-
> >  arch/x86/kvm/x86.c              |  55 ++--
> >  include/linux/kvm_host.h        |   7 +-
> >  virt/kvm/kvm_main.c             |  43 ++-
> >  10 files changed, 648 insertions(+), 65 deletions(-)
> >
> > --
> > 2.17.1
> >

