linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/1]  Restore change_pte optimization
From: jglisse @ 2019-02-21  1:22 UTC
  To: Paolo Bonzini, Radim Krčmář, kvm
  Cc: linux-mm, linux-kernel, Jérôme Glisse,
	Andrea Arcangeli, Peter Xu, Andrew Morton

From: Jérôme Glisse <jglisse@redhat.com>

This patch is on top of my patchset to add context information to
mmu notifier [1]; you can find a branch with everything in [2]. It has
been tested with qemu/KVM by building a kernel within the guest and by
running a benchmark whose results are given below.

The change_pte() callback is impaired by the range invalidation
callbacks within KVM, as those fully invalidate the secondary mmu.
This means that there is a window between the range_start callback and
the change_pte callback where the secondary mmu entry for the address
is empty. The guest can fault on that address during that window.

That window can last for some time if the kernel code which is
doing the invalidation is interrupted, or if there are other mmu
listeners for the process that might sleep within their range_start
callback.

With this patch KVM will ignore the range_start and range_end
callbacks and will rely solely on the change_pte callback to update the
secondary mmu. This means that the secondary mmu never has an empty
entry for the address between range_start and range_end and hence
the guest will not have a chance to fault.

This optimization is not valid for all mmu notifier cases. Thanks to
the patchset that adds context information to the mmu notifier [1],
we can now identify within KVM when it is safe to rely on this
optimization.

Roughly it is safe when:
    - going from read only to read and write (same or different pfn)
    - going from read and write to read only same pfn
    - going from read only to read only different pfn

Longer explanation in [1] and [3].
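
The three cases above can be written down as a small predicate. This
is only a sketch to make the list concrete, not code from the patch:
struct pte_view and change_pte_only_is_safe() are hypothetical names,
and transitions that are not in the list are assumed to need the full
range invalidation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a pte; this is not a kernel type. */
struct pte_view {
	unsigned long pfn;	/* page frame number the pte points at */
	bool writable;		/* false means read only */
};

/*
 * Return true when the transition before -> after matches one of the
 * three listed cases. Anything else conservatively returns false,
 * meaning the range invalidation callbacks are still needed.
 */
static bool change_pte_only_is_safe(struct pte_view before,
				    struct pte_view after)
{
	bool same_pfn = before.pfn == after.pfn;

	if (!before.writable && after.writable)
		return true;		/* RO -> RW, same or different pfn */
	if (before.writable && !after.writable)
		return same_pfn;	/* RW -> RO, same pfn only */
	if (!before.writable && !after.writable)
		return !same_pfn;	/* RO -> RO, different pfn only */
	return false;			/* RW -> RW is not in the list */
}

/* A few transitions checked against the list. */
static void change_pte_examples(void)
{
	struct pte_view ro_a = { .pfn = 100, .writable = false };
	struct pte_view rw_a = { .pfn = 100, .writable = true };
	struct pte_view ro_b = { .pfn = 200, .writable = false };
	struct pte_view rw_b = { .pfn = 200, .writable = true };

	assert(change_pte_only_is_safe(ro_a, rw_a));	/* RO -> RW, same pfn */
	assert(change_pte_only_is_safe(ro_a, rw_b));	/* RO -> RW, new pfn */
	assert(change_pte_only_is_safe(rw_a, ro_a));	/* RW -> RO, same pfn */
	assert(!change_pte_only_is_safe(rw_a, ro_b));	/* RW -> RO, new pfn */
	assert(change_pte_only_is_safe(ro_a, ro_b));	/* RO -> RO, new pfn */
}
```

In the patch itself this decision is not made by KVM; it is carried by
the range argument and queried with mmu_notifier_range_use_change_pte().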

Running ksm02 from ltp gives the following results:

before  mean  {real: 675.460632, user: 857.771423, sys: 215.929657, npages: 4773.066895}
before  stdev {real:  37.035435, user:   4.395942, sys:   3.976172, npages:  675.352783}
after   mean  {real: 672.515503, user: 855.817322, sys: 200.902710, npages: 4899.000000}
after   stdev {real:  37.340954, user:   4.051633, sys:   3.894153, npages:  742.413452}

Roughly 7% less time spent in the kernel (sys mean goes from 215.93
to 200.90), so we are saving a few cycles. This is with KSM enabled on
the host and ksm sleep set to 0. I do not know how this translates to
real workloads.
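
For reference, the relative change can be recomputed from the means in
the table. This is a small standalone sketch; relative_reduction() and
print_ksm02_reductions() are hypothetical helper names and only the
four mean values are taken from the results above.

```c
#include <assert.h>
#include <stdio.h>

/* Relative reduction of `after` with respect to `before`, as a fraction. */
static double relative_reduction(double before, double after)
{
	assert(before > 0.0);	/* a zero baseline has no relative change */
	return (before - after) / before;
}

/* Print the reduction for the "real" and "sys" means from the table. */
static void print_ksm02_reductions(void)
{
	printf("real: %.2f%%\n",
	       100.0 * relative_reduction(675.460632, 672.515503));
	printf("sys:  %.2f%%\n",
	       100.0 * relative_reduction(215.929657, 200.902710));
}
```

The sys reduction comes out just under 7%, while the change in real
time is well below one standard deviation of the runs.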


Note that with the context information further optimizations are now
possible within KVM. For instance you can find out if a range is
updated to read only (i.e. no pfn change, just a protection change)
and update the secondary mmu accordingly.

You can also identify munmap()/mremap() syscalls and only free up the
resources you have allocated for the range (like secondary page
tables or other data structures for the range) when it is a munmap or
a mremap. Today my understanding is that kvm_unmap_hva_range() will
always free up resources, assuming it is a munmap of some sort. So
for mundane invalidations (like migration, reclaim, mprotect, fork,
...) KVM is freeing up potentially megabytes of structures that it
will have to re-allocate shortly thereafter (see [4] for a WIP example).
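
As a rough illustration of that idea, the decision could look like the
sketch below. It is not taken from [4]: enum inval_reason and
should_free_range_resources() are hypothetical names that only mirror
the cases named in the paragraph above, and an unknown reason is
assumed to fall back to freeing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reasons for an invalidation; not the kernel's own enum. */
enum inval_reason {
	INVAL_MUNMAP,
	INVAL_MREMAP,
	INVAL_MIGRATION,
	INVAL_RECLAIM,
	INVAL_MPROTECT,
	INVAL_FORK,
};

/*
 * Decide whether the per-range resources (secondary page tables and
 * similar structures) should be freed for this invalidation.
 */
static bool should_free_range_resources(enum inval_reason reason)
{
	switch (reason) {
	case INVAL_MUNMAP:
	case INVAL_MREMAP:
		return true;	/* the range is going away or moving */
	case INVAL_MIGRATION:
	case INVAL_RECLAIM:
	case INVAL_MPROTECT:
	case INVAL_FORK:
		return false;	/* the range will likely be faulted in again */
	}
	return true;		/* unknown reason: assume the worst and free */
}

/* Only the unmap-like reasons free the structures. */
static void free_decision_examples(void)
{
	assert(should_free_range_resources(INVAL_MUNMAP));
	assert(should_free_range_resources(INVAL_MREMAP));
	assert(!should_free_range_resources(INVAL_MPROTECT));
	assert(!should_free_range_resources(INVAL_FORK));
}
```

Keeping the structures for the non-unmap cases trades some memory for
not having to rebuild them on the next guest fault.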

Cheers,
Jérôme

[1] https://lkml.org/lkml/2019/2/19/752
[2] https://cgit.freedesktop.org/~glisse/linux/log/?h=mmu-notifier-v05
[3] https://lkml.org/lkml/2019/2/19/754
[4] https://cgit.freedesktop.org/~glisse/linux/log/?h=wip-kvm-mmu-notifier-opti

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>

Jérôme Glisse (1):
  kvm/mmu_notifier: re-enable the change_pte() optimization.

 virt/kvm/kvm_main.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

-- 
2.17.2



* [PATCH v2 1/1] kvm/mmu_notifier: re-enable the change_pte() optimization.
From: jglisse @ 2019-02-21  1:22 UTC
  To: Paolo Bonzini, Radim Krčmář, kvm
  Cc: linux-mm, linux-kernel, Jérôme Glisse,
	Andrea Arcangeli, Peter Xu, Andrew Morton

From: Jérôme Glisse <jglisse@redhat.com>

Since the changes to the mmu notifier, the change_pte() optimization
was lost for kvm. This re-enables it whenever a pte is going from read
and write to read only with the same pfn, or from read only to read and
write with a different pfn.

It is safe to update the secondary MMUs, because the primary MMU
pte invalidation must have already happened with a ptep_clear_flush()
before set_pte_at_notify() is invoked (and thus before the change_pte()
callback).

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 virt/kvm/kvm_main.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 629760c0fb95..0f979f02bf1c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -369,6 +369,14 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	int need_tlb_flush = 0, idx;
 	int ret;
 
+	/*
+	 * Nothing to do when using change_pte() which will be called for each
+	 * individual pte update at the right time. See mmu_notifier.h for more
+	 * information.
+	 */
+	if (mmu_notifier_range_use_change_pte(range))
+		return 0;
+
 	idx = srcu_read_lock(&kvm->srcu);
 	spin_lock(&kvm->mmu_lock);
 	/*
@@ -399,6 +407,14 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 
+	/*
+	 * Nothing to do when using change_pte() which will be called for each
+	 * individual pte update at the right time. See mmu_notifier.h for more
+	 * information.
+	 */
+	if (mmu_notifier_range_use_change_pte(range))
+		return;
+
 	spin_lock(&kvm->mmu_lock);
 	/*
 	 * This sequence increase will notify the kvm page fault that
-- 
2.17.2

