From: Raghavendra Rao Ananta <rananta@google.com>
To: Oliver Upton <oupton@google.com>, Marc Zyngier <maz@kernel.org>,
	Ricardo Koller <ricarkol@google.com>,
	Reiji Watanabe <reijiw@google.com>,
	James Morse <james.morse@arm.com>,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Will Deacon <will@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Jing Zhang <jingzhangos@google.com>,
	Colton Lewis <coltonlewis@google.com>,
	Raghavendra Rao Anata <rananta@google.com>,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: [PATCH v2 0/7] KVM: arm64: Add support for FEAT_TLBIRANGE
Date: Mon,  6 Feb 2023 17:23:33 +0000	[thread overview]
Message-ID: <20230206172340.2639971-1-rananta@google.com> (raw)

In certain code paths, KVM/ARM currently invalidates the entire VM's
page-tables instead of just the necessary range. For example, when
collapsing a table PTE into a block PTE, instead of iterating over each
PTE and flushing it, KVM uses the 'vmalls12e1is' TLBI operation to flush
all the entries. This is inefficient because the guest has to refill the
TLBs even for addresses that aren't covered by the table entry. The
performance impact scales poorly when many addresses in the VM go
through this remapping.
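
For reference, the existing behaviour that the series targets boils down
to the following. This is only a simplified sketch of what the existing
__kvm_tlb_flush_vmid() helper does; guest-context switching is omitted:

  /* Sketch only: breaking a table entry today throws away every
   * cached translation for the VMID. */
  static void flush_whole_vm_sketch(void)
  {
          dsb(ishst);            /* make the page-table update visible first */
          __tlbi(vmalls12e1is);  /* drop ALL stage 1+2 entries for the VMID  */
          dsb(ish);
          isb();
  }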

For architectures that implement FEAT_TLBIRANGE, KVM can replace such
inefficient paths by performing the invalidations only on the range of
addresses that are in scope. This series does exactly that for the
stage-2 map, unmap, and write-protect paths.
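
For contrast, the general shape of a range-based flush is roughly the
following. This is a hedged sketch, not the series' code: 'start', 'end'
and 'stride' are illustrative parameters, guest-context switching is
omitted, and the RIPAS2E1IS operand encoding is left out:

  /* Sketch: invalidate only the IPAs in [start, end), not the whole VMID. */
  static void flush_ipa_range_sketch(phys_addr_t start, phys_addr_t end,
                                     unsigned long stride)
  {
          phys_addr_t addr;

          dsb(ishst);
          /*
           * Without FEAT_TLBIRANGE this is one TLBI per stride; with it, the
           * loop collapses into a handful of TLBI RIPAS2E1IS operations, each
           * covering __TLBI_RANGE_PAGES(num, scale) pages.
           */
          for (addr = start; addr < end; addr += stride)
                  __tlbi(ipas2e1is, addr >> 12);  /* IPA shifted as the op expects */
          dsb(ish);
          isb();
  }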

Patch-1 refactors arm64's core __flush_tlb_range() so that it can be
reused by other callers.

Patch-2 adds a generic range-based TLBI mechanism for KVM.

Patch-3 adds support to flush a range of IPAs for KVM.

Patch-4 implements kvm_arch_flush_remote_tlbs_range() for arm64.
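
To make Patch-4 concrete, the arm64 hook could look roughly like below.
This is only a sketch built on the common API from [1]; the parameter
types, return convention and hypercall arguments here are assumptions,
not the literal patch:

  int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
  {
          phys_addr_t start = start_gfn << PAGE_SHIFT;
          phys_addr_t end   = (start_gfn + pages) << PAGE_SHIFT;

          /* Hand the IPA range to the hyp-side range flush (Patch-3). */
          kvm_call_hyp(__kvm_tlb_flush_range_vmid_ipa, &kvm->arch.mmu,
                       start, end, 0);
          return 0;
  }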

Patch-5 flushes only the memslot that is being write-protected, instead
of the entire VM.
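
In terms of code, this is conceptually a one-liner in the existing arm64
write-protect helper (a sketch; kvm_flush_remote_tlbs_memslot() comes
from [1]):

  /* kvm_mmu_wp_memory_region(), after stage2_wp_range(): */
  -       kvm_flush_remote_tlbs(kvm);                   /* whole VM       */
  +       kvm_flush_remote_tlbs_memslot(kvm, memslot);  /* just this slot */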

Patch-6 changes stage2_try_break_pte() to use the range-based TLBI
instructions when breaking a table entry. The map path is the immediate
consumer of this, when KVM remaps a table entry into a block (see the
sketch below).
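
A sketch of what that looks like in stage2_try_break_pte() (illustrative
only; kvm_pgtable_stage2_flush_range() is the helper named in the
changelog below, and its exact signature in the series may differ):

  /* TLB invalidation for the PTE being evicted: */
  if (kvm_pte_table(ctx->old, ctx->level))
          /* Was: kvm_call_hyp(__kvm_tlb_flush_vmid, mmu) -- the whole VM. */
          kvm_pgtable_stage2_flush_range(mmu, ctx->addr,
                                         ctx->addr + kvm_granule_size(ctx->level));
  else if (kvm_pte_valid(ctx->old))
          kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);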

Patch-7 introduces a fast stage-2 unmap path: under the right
conditions, instead of traversing and unmapping every PTE individually,
KVM disconnects the table entry at a higher level (say, level-1 for a 4K
page size) and unmaps its entries via free_removed_table(). This allows
KVM to use a range-based TLBI to flush the entire range governed at that
level.
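
The gist of the fast path, as a sketch (range_covers_granule() is a
hypothetical check standing in for the series' actual conditions; the
other helpers are the existing pgtable.c ones):

  /* In the unmap walker, when visiting a table entry: */
  if (kvm_pte_table(ctx->old, ctx->level) &&
      range_covers_granule(ctx->addr, end, ctx->level)) {   /* hypothetical */
          if (!stage2_try_break_pte(ctx, mmu))              /* unhook the table;  */
                  return -EAGAIN;                           /* range TLBI inside  */
          mm_ops->free_removed_table(kvm_pte_follow(ctx->old, mm_ops),
                                     ctx->level);           /* free the subtree   */
          return 0;
  }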

The series is based on upstream v6.2-rc6, with David Matlack's common
API for TLB invalidations[1] applied on top.

The performance evaluation was done on hardware that supports
FEAT_TLBIRANGE, in a VHE configuration, using a modified
kvm_page_table_test. The modified version updates the guest code in the
ADJUST_MAPPINGS case to access not only the current page but also up to
512 pages backwards for every new page it iterates through. This is done
to measure the effect of TLB misses after KVM has handled a fault.
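
The guest-side change to the selftest is conceptually the loop below (a
sketch; the variable names are illustrative, not the actual
kvm_page_table_test code):

  /* ADJUST_MAPPINGS stage, per page of test memory (sketch): */
  uint64_t i, j, lookback;

  for (i = 0; i < nr_pages; i++) {
          *(volatile uint64_t *)(base + i * page_size) = i;   /* fault it in   */
          lookback = (i < 512) ? i : 512;
          for (j = 1; j <= lookback; j++)                     /* then re-touch */
                  READ_ONCE(*(uint64_t *)(base + (i - j) * page_size));
  }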

The numbers below capture the impact on the map and unmap paths
described above.

$ kvm_page_table_test -m 2 -v 128 -s anonymous_hugetlb_2mb -b $i

+--------+------------------------------+------------------------------+
| mem_sz |    ADJUST_MAPPINGS (s)       |      Unmap VM (s)            |
|  (GB)  | Baseline | Baseline + series | Baseline | Baseline + series |
+--------+----------+-------------------+----------+-------------------+
|   1    |   4.15   |   4.26            | 0.50     | 0.007             |
|   2    |   6.09   |   6.08            | 0.50     | 0.009             |
|   4    |  12.65   |  11.46            | 0.50     | 0.01              |
|   8    |  25.35   |  24.75            | 0.52     | 0.02              |
|  16    |  52.17   |  48.23            | 0.53     | 0.03              |
|  32    | 100.09   |  84.53            | 0.57     | 0.06              |
|  64    | 176.46   | 166.96            | 0.75     | 0.11              |
| 128    | 340.22   | 302.82            | 0.81     | 0.20              |
+--------+----------+-------------------+----------+-------------------+

$ kvm_page_table_test -m 2 -b 128G -s anonymous_hugetlb_2mb -v $i

+--------+------------------------------+
| vCPUs  |    ADJUST_MAPPINGS (s)       |
|        | Baseline | Baseline + series |
+--------+----------+-------------------+
|   1    | 153.91   | 148.75            |
|   2    | 188.17   | 176.11            |
|   4    | 193.15   | 175.77            |
|   8    | 195.60   | 184.92            |
|  16    | 183.49   | 170.22            |
|  32    | 159.37   | 152.70            |
|  64    | 190.15   | 180.45            |
| 128    | 340.22   | 302.82            |   
+--------+----------+-------------------+

For the ADJUST_MAPPINGS cases, which map the 4K table entries back to
2M hugepages, the series shows an average improvement of ~7%. For
unmapping 2M hugepages, we see at least a 4x improvement.

$ kvm_page_table_test -m 2 -b $i

+--------+------------------------------+
| mem_sz |      Unmap VM (s)            |
|  (GB)  | Baseline | Baseline + series |
+--------+----------+-------------------+
|   1    |  1.03    |  0.58             |
|   2    |  1.57    |  0.72             |
|   4    |  2.65    |  0.98             |
|   8    |  4.77    |  1.54             |
|  16    |  9.06    |  2.57             |
|  32    | 17.60    |  4.41             |
|  64    | 34.72    |  8.92             |
| 128    | 68.92    | 17.70             |   
+--------+----------+-------------------+

The 4x improvement for unmapping also holds true when the guest is
backed by PAGE_SIZE (4K) pages.

v2:
- Rebased the series on top of David Matlack's series for common
  TLB invalidation API[1].
- Implement kvm_arch_flush_remote_tlbs_range() for arm64, by extending
  the support introduced by [1].
- Use kvm_flush_remote_tlbs_memslot() introduced by [1] to flush
  only the current memslot after write-protect.
- Modified the __kvm_tlb_flush_range() macro to accept 'level' as an
  argument and calculate the 'stride' from it instead of always using
  PAGE_SIZE (see the stride sketch below the changelog).
- Split the patch that introduces the range-based TLBI to KVM and the
  implementation of IPA-based invalidation into separate patches.
- Dropped the patch that tries to optimize the MMU notifier paths.
- Rename the function kvm_table_pte_flush() to
  kvm_pgtable_stage2_flush_range(), and accept the range of addresses to
  flush. [Oliver]
- Drop the 'tlb_level' argument for stage2_try_break_pte() and directly
  pass '0' as 'tlb_level' to kvm_pgtable_stage2_flush_range(). [Oliver]
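
As referenced in the changelog item above, the 'level' argument gives
the stride roughly as follows (a sketch using the existing
kvm_granule_size() helper):

  /* With 4K pages: level 3 -> 4K, level 2 -> 2M, level 1 -> 1G. */
  u64 stride = kvm_granule_size(level);  /* BIT(kvm_granule_shift(level)) */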

v1: https://lore.kernel.org/all/20230109215347.3119271-1-rananta@google.com/

Thank you.
Raghavendra

[1]: https://lore.kernel.org/linux-arm-kernel/20230126184025.2294823-1-dmatlack@google.com/

Raghavendra Rao Ananta (7):
  arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range
  KVM: arm64: Add FEAT_TLBIRANGE support
  KVM: arm64: Implement  __kvm_tlb_flush_range_vmid_ipa()
  KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
  KVM: arm64: Flush only the memslot after write-protect
  KVM: arm64: Break the table entries using TLBI range instructions
  KVM: arm64: Create a fast stage-2 unmap path

 arch/arm64/include/asm/kvm_asm.h   |  21 ++++++
 arch/arm64/include/asm/kvm_host.h  |   3 +
 arch/arm64/include/asm/tlbflush.h  | 107 +++++++++++++++--------------
 arch/arm64/kvm/hyp/nvhe/hyp-main.c |  12 ++++
 arch/arm64/kvm/hyp/nvhe/tlb.c      |  28 ++++++++
 arch/arm64/kvm/hyp/pgtable.c       |  67 +++++++++++++++++-
 arch/arm64/kvm/hyp/vhe/tlb.c       |  24 +++++++
 arch/arm64/kvm/mmu.c               |  17 ++++-
 8 files changed, 222 insertions(+), 57 deletions(-)

-- 
2.39.1.519.gcb327c4b5f-goog

