Subject: [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
From: Ryan Roberts
Date: 2022-12-06 13:59 UTC
To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
Cc: kvmarm, kvmarm, linux-arm-kernel

(Apologies: I'm resending this series because on my first attempt the cover
letter went to everyone but the patches themselves went only to myself.)

This is my first upstream feature submission so please go easy ;-)

Add support for FEAT_LPA2 to KVM for both hypervisor stage 1 (for the
nvhe/protected modes) and the VM stage 2 translation tables. FEAT_LPA2 enables
52-bit PAs and VAs for the 4KB and 16KB granules (note this is already supported
for the 64KB granule via the FEAT_LPA and FEAT_LVA extensions).

The series does not include support for FEAT_LPA2 in the kernel stage 1. This
support is provided separately by Ard Biesheuvel's series at [1]. Although my
series does not rely on Ard's work and I'm posting the patches based on top of
v6.1-rc6, I have tested with and without Ard's changes and provide the results
below. The testing highlighted some pre-existing bugs in mainline and I have
submitted fixes for these separately at [2], [3], [4]. You can find the tree
without Ard's changes at [5] and a tree with Ard's changes at [6].

The series is broken up into 3 phases: update the TLBI routines, support 52-bit
output addresses, and support 52-bit input addresses.

Update TLBI Routines: The update to the range-based TLBI instructions is
technically not needed for the KVM support, because as far as I can see KVM only
uses the non-range-based TLBI instructions. But I've done both parts because I
thought it was sensible to do all the updates together - the range-based changes
will be needed by Ard's series, I think. See the commit message for details.

Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
HW advertises support for LPA2 independently for stage 1 and stage 2, and
therefore it's possible to have it for one and not the other. I've assumed there
is a valid use case for this: if stage 1 is not supported but stage 2 is, KVM
can still use LPA2 at stage 2 to create a 52-bit IPA space (which could then be
consumed by a 64KB-page guest kernel with the help of FEAT_LPA). Because of this
independence, and because the kvm pgtable library is used for both stage 1 and
stage 2 tables, the library now has to remember the in-use format on a
per-page-table basis. To do this, I had to rework some functions to take a
`struct kvm_pgtable *` parameter, and as a result there is a noisy patch to add
this parameter.
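
To illustrate the kind of format dependence involved, here is a minimal,
self-contained sketch (not code from the series - the struct and helper names
are made up) of how a PTE encoding helper might pick the output-address layout
from a per-page-table flag. The field positions follow the architecture: with
LPA2, PA[51:50] live in PTE bits [9:8] (the old shareability field) and
PA[49:48] in bits [49:48], whereas without LPA2 a 52-bit PA is only possible
with the 64KB granule, which carries PA[51:48] in PTE bits [15:12].

#include <stdbool.h>
#include <stdint.h>

#define GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* Hypothetical stand-in for the "format in use" state in struct kvm_pgtable. */
struct pgt_sketch {
	bool lpa2_ena;
};

static uint64_t pte_from_phys_sketch(const struct pgt_sketch *pgt, uint64_t pa)
{
	/* PA[47:12] (granule-aligned) sits in the same PTE bits in all formats. */
	uint64_t pte = pa & GENMASK_ULL(47, 12);

	if (pgt->lpa2_ena) {
		/* LPA2: PA[49:48] stay in place, PA[51:50] move to bits [9:8]. */
		pte |= pa & GENMASK_ULL(49, 48);
		pte |= (pa & GENMASK_ULL(51, 50)) >> 42;
	} else {
		/* FEAT_LPA (64KB granule only): PA[51:48] move to bits [15:12]. */
		pte |= (pa & GENMASK_ULL(51, 48)) >> 36;
	}

	return pte;
}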

Support 52-bit Input Addresses: The main difficulty here is that at stage 1 for
4KB pages, a 52-bit IA requires an extra level of lookup, and that level is
called '-1'. (Although stage 2 can use concatenated page tables at the initial
level, and therefore still only needs 4 levels, the kvm pgtable library deals
with both stage 1 and stage 2 tables.) So there is another noisy patch to
convert all level variables to signed.
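
As a rough illustration of where level '-1' comes from (a standalone sketch
under my own naming, not the series' code), the start level falls out of how
many lookup levels are needed to resolve the input address, with levels
numbered so that the leaf level is always 3:

#include <stdio.h>

/*
 * Each table level resolves (page_shift - 3) bits of the IA and the page
 * offset covers page_shift bits, so a 52-bit IA with 4KB pages needs 5
 * levels and the walk starts at level -1 (hence signed levels - s8 in the
 * series).
 */
static signed char start_level(int ia_bits, int page_shift)
{
	int bits_per_level = page_shift - 3;
	int levels = (ia_bits - page_shift + bits_per_level - 1) / bits_per_level;

	return (signed char)(4 - levels);
}

int main(void)
{
	printf("4KB,  48-bit IA: start level %d\n", start_level(48, 12)); /*  0 */
	printf("4KB,  52-bit IA: start level %d\n", start_level(52, 12)); /* -1 */
	printf("16KB, 52-bit IA: start level %d\n", start_level(52, 14)); /*  0 */
	printf("64KB, 52-bit IA: start level %d\n", start_level(52, 16)); /*  1 */
	return 0;
}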

This is all tested on the FVP, using a test harness I put together, which does a
host + guest boot test for 180 configurations, built from all the (valid)
combinations of various FVP, host kernel and guest kernel parameters:

 - hw_pa:		[48, lpa, lpa2]
 - hw_va:		[48, 52]
 - kvm_mode:		[vhe, nvhe, protected]
 - host_page_size:	[4KB, 16KB, 64KB]
 - host_pa:		[48, 52]
 - host_va:		[48, 52]
 - host_load_addr:	[low, high]
 - guest_page_size:	[64KB]
 - guest_pa:		[52]
 - guest_va:		[52]
 - guest_load_addr:	[low, high]

Below are the results for the tree at [5] (which doesn't contain Ard's series):
all tests pass except the 12 cases where a 4KB or 16KB host kernel attempts to
boot in high memory - these are expected to fail because, without Ard's series,
the host kernel does not support LPA2. When running the tests against the tree
at [6] (which does contain Ard's series plus a few minor fixes), all tests pass.

(I'm omitting guest_page_size, guest_va and guest_pa below since they are always
64KB/52/52.)

+-------+-------+-----------+----------------+---------+---------+----------------+-----------------+-------+
| hw_pa | hw_va | kvm_mode  | host_page_size | host_va | host_pa | host_load_addr | guest_load_addr | pass  |
+-------+-------+-----------+----------------+---------+---------+----------------+-----------------+-------+
|  48   |  48   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   48    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   52    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   48    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      high      |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      high      |      high       | True  |
|  lpa  |  52   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |      high       | True  |
|  lpa  |  52   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   48    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   52    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   48    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      high      |       low       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      high      |      high       | True  |
| lpa2  |  52   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |       4k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      16k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   48    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      high      |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      high      |      high       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |      high       | True  |
| lpa2  |  52   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |       4k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |      16k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   48    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      high      |       low       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      high      |      high       | True  |
+-------+-------+-----------+----------------+---------+---------+----------------+-----------------+-------+

[1] https://lore.kernel.org/linux-arm-kernel/20221124123932.2648991-1-ardb@kernel.org
[2] https://lore.kernel.org/kvmarm/20221027120945.29679-1-ryan.roberts@arm.com
[3] https://lore.kernel.org/kvmarm/20221103150507.32948-1-ryan.roberts@arm.com
[4] https://lore.kernel.org/kvmarm/20221205114031.3972780-1-ryan.roberts@arm.com
[5] https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/lpa2/kvm_lkml-v1
[6] https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/lpa2/ardb_arm64-4k-lpa2_plus_kvm_2022-12-01

Thanks,
Ryan


Anshuman Khandual (1):
  arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]

Ryan Roberts (11):
  arm64/mm: Update tlb invalidation routines for FEAT_LPA2
  KVM: arm64: Add new (V)TCR_EL2 field definitions for FEAT_LPA2
  KVM: arm64: Plumbing to enable multiple pgtable formats
  KVM: arm64: Maintain page-table format info in struct kvm_pgtable
  KVM: arm64: Use LPA2 page-tables for stage2 if HW supports it
  KVM: arm64: Use LPA2 page-tables for hyp stage1 if HW supports it
  KVM: arm64: Insert PS field at TCR_EL2 assembly time
  KVM: arm64: Convert translation level parameter to s8
  KVM: arm64: Rework logic to en/decode VTCR_EL2.{SL0, SL2} fields
  KVM: arm64: Support upto 5 levels of translation in kvm_pgtable
  KVM: arm64: Allow guests with >48-bit IPA size on FEAT_LPA2 systems

 arch/arm64/include/asm/kvm_arm.h        |  79 +++---
 arch/arm64/include/asm/kvm_emulate.h    |  14 +-
 arch/arm64/include/asm/kvm_pgtable.h    | 131 +++++++--
 arch/arm64/include/asm/kvm_pkvm.h       |   5 +-
 arch/arm64/include/asm/pgtable-prot.h   |   6 +
 arch/arm64/include/asm/stage2_pgtable.h |  13 +-
 arch/arm64/include/asm/sysreg.h         |   5 +
 arch/arm64/include/asm/tlb.h            |  15 +-
 arch/arm64/include/asm/tlbflush.h       |  83 ++++--
 arch/arm64/kvm/arm.c                    |   5 +
 arch/arm64/kvm/hyp/nvhe/hyp-init.S      |   4 -
 arch/arm64/kvm/hyp/nvhe/mem_protect.c   |  21 +-
 arch/arm64/kvm/hyp/nvhe/setup.c         |  28 +-
 arch/arm64/kvm/hyp/pgtable.c            | 354 +++++++++++++++---------
 arch/arm64/kvm/mmu.c                    |  15 +-
 arch/arm64/kvm/reset.c                  |  11 +-
 16 files changed, 525 insertions(+), 264 deletions(-)

--
2.25.1

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
@ 2022-12-06 13:59 ` Ryan Roberts
  0 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: Ryan Roberts, James Morse, Alexandru Elisei, Oliver Upton,
	linux-arm-kernel, kvmarm, kvmarm

(appologies, I'm resending this series as I managed to send the cover letter to
all but the following patches only to myself on first attempt).

This is my first upstream feature submission so please go easy ;-)

Add support for FEAT_LPA2 to KVM for both hypervisor stage 1 (for the
nvhe/protected modes) and the vm stage 2 translation tables. FEAT_LPA2 enables
52 bit PAs and VAs for 4KB and 16KB granules (note this is already supported for
64KB granules via the FEAT_LPA and FEAT_LVA extensions).

The series does not include support for FEAT_LPA2 in the kernel stage 1. This
support is provided separately by Ard Biesheuvel's series at [1]. Although my
series does not rely on Ard's work and I'm posting the patches based on top of
v6.1-rc6, I have tested with and without Ard's changes and provide the results
below. The testing highlighted some pre-existing bugs in mainline and I have
submitted fixes for these separately at [2], [3], [4]. You can find the tree
without Ard's changes at [5] and a tree with Ard's changes at [6].

The series is broken up into 3 phases; update TLBI routines, support 52-bit
output addresses, and support 52 bit input addresses.

Update TLBI Routines: The update to the range-based TLBI instructions is
technically not needed for the KVM support because KVM only uses the
non-range-based TBLI instructions as far as I can see. But I've done both parts
because I thought it was sensible to do all the updates together - the
range-based stuff will be needed by Ard's patch I think. See commit message for
details.

Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
HW advertises support for LPA2 independently for stage 1 and stage 2, and
therefore its possible to have it for one and not the other. I've assumed that
there is a valid case for this if stage 1 is not supported but stage 2 is, KVM
could still then use LPA2 at stage 2 to create a 52 bit IPA space (which could
then be consumed by a 64KB page guest kernel with the help of FEAT_LPA). Because
of this independence and the fact that the kvm pgtable library is used for both
stage 1 and stage 2 tables, this means the library now has to remember the
in-use format on a per-page-table basis. To do this, I had to rework some
functions to take a `struct kvm_pgtable *` parameter, and as a result, there is
a noisy patch to add this parameter.

Support 52-bit Input Addresses: The main difficulty here is that at stage 1 for
4KB pages, 52-bit IA requires a extra level of lookup, and that level is called
'-1'. (Although stage 2 can use concatenated pages at the first level, and
therefore still only uses 4 levels, the kvm pgtable library deals with both
stage 1 and stage 2 tables). So there is another noisy patch to convert all
level variables to signed.

This is all tested on the FVP, using a test harness I put together, which does a
host + guest boot test for 180 configurations, built from all the (valid)
combinations of various FVP, host kernel and guest kernel parameters:

 - hw_pa:		[48, lpa, lpa2]
 - hw_va:		[48, 52]
 - kvm_mode:		[vhe, nvhe, protected]
 - host_page_size:	[4KB, 16KB, 64KB]
 - host_pa:		[48, 52]
 - host_va:		[48, 52]
 - host_load_addr:	[low, high]
 - guest_page_size:	[64KB]
 - guest_pa:		[52]
 - guest_va:		[52]
 - guest_load_addr:	[low, high]

I provide the results for the tree at [5] (which doesn't contain Ard's series),
where all tests pass except the 12 cases where a 4KB or 16KB host kernel is
attempting to boot in high memory - these are expected to fail due to the host
kernel not supporting LPA2. When running the tests against the tree at [6]
(which does contain Ard's series plus a few minor fixes), all tests pass.

(I'm omitting guest_page_size, guest_va and guest_pa below since it is always
64KB/52/52).

+-------+-------+-----------+----------------+---------+---------+----------------+-----------------+-------+
| hw_pa | hw_va | kvm_mode  | host_page_size | host_va | host_pa | host_load_addr | guest_load_addr | pass  |
+-------+-------+-----------+----------------+---------+---------+----------------+-----------------+-------+
|  48   |  48   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   48    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   52    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   48    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      high      |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      high      |      high       | True  |
|  lpa  |  52   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |      high       | True  |
|  lpa  |  52   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   48    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   52    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   48    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      high      |       low       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      high      |      high       | True  |
| lpa2  |  52   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |       4k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      16k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   48    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      high      |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      high      |      high       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |      high       | True  |
| lpa2  |  52   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |       4k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |      16k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   48    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      high      |       low       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      high      |      high       | True  |
+-------+-------+-----------+----------------+---------+---------+----------------+-----------------+-------+

[1] https://lore.kernel.org/linux-arm-kernel/20221124123932.2648991-1-ardb@kernel.org
[2] https://lore.kernel.org/kvmarm/20221027120945.29679-1-ryan.roberts@arm.com
[3] https://lore.kernel.org/kvmarm/20221103150507.32948-1-ryan.roberts@arm.com
[4] https://lore.kernel.org/kvmarm/20221205114031.3972780-1-ryan.roberts@arm.com
[5] https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/lpa2/kvm_lkml-v1
[6] https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/lpa2/ardb_arm64-4k-lpa2_plus_kvm_2022-12-01

Thanks,
Ryan


Anshuman Khandual (1):
  arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]

Ryan Roberts (11):
  arm64/mm: Update tlb invalidation routines for FEAT_LPA2
  KVM: arm64: Add new (V)TCR_EL2 field definitions for FEAT_LPA2
  KVM: arm64: Plumbing to enable multiple pgtable formats
  KVM: arm64: Maintain page-table format info in struct kvm_pgtable
  KVM: arm64: Use LPA2 page-tables for stage2 if HW supports it
  KVM: arm64: Use LPA2 page-tables for hyp stage1 if HW supports it
  KVM: arm64: Insert PS field at TCR_EL2 assembly time
  KVM: arm64: Convert translation level parameter to s8
  KVM: arm64: Rework logic to en/decode VTCR_EL2.{SL0, SL2} fields
  KVM: arm64: Support upto 5 levels of translation in kvm_pgtable
  KVM: arm64: Allow guests with >48-bit IPA size on FEAT_LPA2 systems

 arch/arm64/include/asm/kvm_arm.h        |  79 +++---
 arch/arm64/include/asm/kvm_emulate.h    |  14 +-
 arch/arm64/include/asm/kvm_pgtable.h    | 131 +++++++--
 arch/arm64/include/asm/kvm_pkvm.h       |   5 +-
 arch/arm64/include/asm/pgtable-prot.h   |   6 +
 arch/arm64/include/asm/stage2_pgtable.h |  13 +-
 arch/arm64/include/asm/sysreg.h         |   5 +
 arch/arm64/include/asm/tlb.h            |  15 +-
 arch/arm64/include/asm/tlbflush.h       |  83 ++++--
 arch/arm64/kvm/arm.c                    |   5 +
 arch/arm64/kvm/hyp/nvhe/hyp-init.S      |   4 -
 arch/arm64/kvm/hyp/nvhe/mem_protect.c   |  21 +-
 arch/arm64/kvm/hyp/nvhe/setup.c         |  28 +-
 arch/arm64/kvm/hyp/pgtable.c            | 354 +++++++++++++++---------
 arch/arm64/kvm/mmu.c                    |  15 +-
 arch/arm64/kvm/reset.c                  |  11 +-
 16 files changed, 525 insertions(+), 264 deletions(-)

--
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
@ 2022-12-06 13:59 ` Ryan Roberts
  0 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: Ryan Roberts, James Morse, Alexandru Elisei, Oliver Upton,
	linux-arm-kernel, kvmarm, kvmarm

(appologies, I'm resending this series as I managed to send the cover letter to
all but the following patches only to myself on first attempt).

This is my first upstream feature submission so please go easy ;-)

Add support for FEAT_LPA2 to KVM for both hypervisor stage 1 (for the
nvhe/protected modes) and the vm stage 2 translation tables. FEAT_LPA2 enables
52 bit PAs and VAs for 4KB and 16KB granules (note this is already supported for
64KB granules via the FEAT_LPA and FEAT_LVA extensions).

The series does not include support for FEAT_LPA2 in the kernel stage 1. This
support is provided separately by Ard Biesheuvel's series at [1]. Although my
series does not rely on Ard's work and I'm posting the patches based on top of
v6.1-rc6, I have tested with and without Ard's changes and provide the results
below. The testing highlighted some pre-existing bugs in mainline and I have
submitted fixes for these separately at [2], [3], [4]. You can find the tree
without Ard's changes at [5] and a tree with Ard's changes at [6].

The series is broken up into 3 phases; update TLBI routines, support 52-bit
output addresses, and support 52 bit input addresses.

Update TLBI Routines: The update to the range-based TLBI instructions is
technically not needed for the KVM support because KVM only uses the
non-range-based TBLI instructions as far as I can see. But I've done both parts
because I thought it was sensible to do all the updates together - the
range-based stuff will be needed by Ard's patch I think. See commit message for
details.

Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
HW advertises support for LPA2 independently for stage 1 and stage 2, and
therefore its possible to have it for one and not the other. I've assumed that
there is a valid case for this if stage 1 is not supported but stage 2 is, KVM
could still then use LPA2 at stage 2 to create a 52 bit IPA space (which could
then be consumed by a 64KB page guest kernel with the help of FEAT_LPA). Because
of this independence and the fact that the kvm pgtable library is used for both
stage 1 and stage 2 tables, this means the library now has to remember the
in-use format on a per-page-table basis. To do this, I had to rework some
functions to take a `struct kvm_pgtable *` parameter, and as a result, there is
a noisy patch to add this parameter.

Support 52-bit Input Addresses: The main difficulty here is that at stage 1
with 4KB pages, a 52-bit IA requires an extra level of lookup, and that level
is called '-1'. (Although stage 2 can use concatenated pages at the first
level, and therefore still only uses 4 levels, the kvm pgtable library deals
with both stage 1 and stage 2 tables). So there is another noisy patch to
convert all level variables to signed.
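
To make the level arithmetic concrete, here is a hypothetical helper (not
taken from the series) showing where level -1 comes from:

  /*
   * Each level resolves (PAGE_SHIFT - 3) bits of IA and the page offset
   * covers PAGE_SHIFT bits; the leaf level is 3. For 4KB pages and
   * ia_bits = 52: DIV_ROUND_UP(52 - 12, 9) = 5 levels, so the walk must
   * start at level 4 - 5 = -1, hence the need for a signed type.
   */
  static s8 start_level(u32 ia_bits)
  {
          u32 levels = DIV_ROUND_UP(ia_bits - PAGE_SHIFT, PAGE_SHIFT - 3);

          return (s8)(4 - levels);
  }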

This is all tested on the FVP, using a test harness I put together, which does a
host + guest boot test for 180 configurations, built from all the (valid)
combinations of various FVP, host kernel and guest kernel parameters:

 - hw_pa:		[48, lpa, lpa2]
 - hw_va:		[48, 52]
 - kvm_mode:		[vhe, nvhe, protected]
 - host_page_size:	[4KB, 16KB, 64KB]
 - host_pa:		[48, 52]
 - host_va:		[48, 52]
 - host_load_addr:	[low, high]
 - guest_page_size:	[64KB]
 - guest_pa:		[52]
 - guest_va:		[52]
 - guest_load_addr:	[low, high]

I provide the results for the tree at [5] (which doesn't contain Ard's series),
where all tests pass except the 12 cases where a 4KB or 16KB host kernel is
attempting to boot in high memory - these are expected to fail due to the host
kernel not supporting LPA2. When running the tests against the tree at [6]
(which does contain Ard's series plus a few minor fixes), all tests pass.

(I'm omitting guest_page_size, guest_va and guest_pa below since they are
always 64KB/52/52).

+-------+-------+-----------+----------------+---------+---------+----------------+-----------------+-------+
| hw_pa | hw_va | kvm_mode  | host_page_size | host_va | host_pa | host_load_addr | guest_load_addr | pass  |
+-------+-------+-----------+----------------+---------+---------+----------------+-----------------+-------+
|  48   |  48   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   48    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   52    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   48    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      high      |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      high      |      high       | True  |
|  lpa  |  52   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |      high       | True  |
|  lpa  |  52   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   48    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   52    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   48    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      high      |       low       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      high      |      high       | True  |
| lpa2  |  52   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |       4k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      16k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   48    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      high      |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      high      |      high       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |      high       | True  |
| lpa2  |  52   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |       4k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |      16k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   48    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      high      |       low       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      high      |      high       | True  |
+-------+-------+-----------+----------------+---------+---------+----------------+-----------------+-------+

[1] https://lore.kernel.org/linux-arm-kernel/20221124123932.2648991-1-ardb@kernel.org
[2] https://lore.kernel.org/kvmarm/20221027120945.29679-1-ryan.roberts@arm.com
[3] https://lore.kernel.org/kvmarm/20221103150507.32948-1-ryan.roberts@arm.com
[4] https://lore.kernel.org/kvmarm/20221205114031.3972780-1-ryan.roberts@arm.com
[5] https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/lpa2/kvm_lkml-v1
[6] https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/lpa2/ardb_arm64-4k-lpa2_plus_kvm_2022-12-01

Thanks,
Ryan


Anshuman Khandual (1):
  arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]

Ryan Roberts (11):
  arm64/mm: Update tlb invalidation routines for FEAT_LPA2
  KVM: arm64: Add new (V)TCR_EL2 field definitions for FEAT_LPA2
  KVM: arm64: Plumbing to enable multiple pgtable formats
  KVM: arm64: Maintain page-table format info in struct kvm_pgtable
  KVM: arm64: Use LPA2 page-tables for stage2 if HW supports it
  KVM: arm64: Use LPA2 page-tables for hyp stage1 if HW supports it
  KVM: arm64: Insert PS field at TCR_EL2 assembly time
  KVM: arm64: Convert translation level parameter to s8
  KVM: arm64: Rework logic to en/decode VTCR_EL2.{SL0, SL2} fields
  KVM: arm64: Support upto 5 levels of translation in kvm_pgtable
  KVM: arm64: Allow guests with >48-bit IPA size on FEAT_LPA2 systems

 arch/arm64/include/asm/kvm_arm.h        |  79 +++---
 arch/arm64/include/asm/kvm_emulate.h    |  14 +-
 arch/arm64/include/asm/kvm_pgtable.h    | 131 +++++++--
 arch/arm64/include/asm/kvm_pkvm.h       |   5 +-
 arch/arm64/include/asm/pgtable-prot.h   |   6 +
 arch/arm64/include/asm/stage2_pgtable.h |  13 +-
 arch/arm64/include/asm/sysreg.h         |   5 +
 arch/arm64/include/asm/tlb.h            |  15 +-
 arch/arm64/include/asm/tlbflush.h       |  83 ++++--
 arch/arm64/kvm/arm.c                    |   5 +
 arch/arm64/kvm/hyp/nvhe/hyp-init.S      |   4 -
 arch/arm64/kvm/hyp/nvhe/mem_protect.c   |  21 +-
 arch/arm64/kvm/hyp/nvhe/setup.c         |  28 +-
 arch/arm64/kvm/hyp/pgtable.c            | 354 +++++++++++++++---------
 arch/arm64/kvm/mmu.c                    |  15 +-
 arch/arm64/kvm/reset.c                  |  11 +-
 16 files changed, 525 insertions(+), 264 deletions(-)

--
2.25.1


^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v1 01/12] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-06 13:59   ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: kvmarm, kvmarm, linux-arm-kernel

From: Anshuman Khandual <anshuman.khandual@arm.com>

PAGE_SIZE support is tested against the possible minimum and maximum values of
the respective ID_AA64MMFR0.TGRAN field, depending on whether that field is
signed or unsigned. FEAT_LPA2 support additionally needs to be validated for
the 4K and 16K page sizes via feature-specific ID_AA64MMFR0.TGRAN values. Hence,
add the FEAT_LPA2-specific ID_AA64MMFR0.TGRAN[2] values per the ARM ARM (0487G.A).
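
As a purely hypothetical illustration (not part of this patch), a consumer of
the new stage 2 definition might look roughly like this:

  static bool s2_granule_supports_lpa2(void)
  {
          u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
          u32 tgran_2 = cpuid_feature_extract_unsigned_field(mmfr0,
                                          ID_AA64MMFR0_EL1_TGRAN_2_SHIFT);

          /* Real code would also handle TGRAN_2 == DEFAULT (use the stage 1 field). */
          return tgran_2 == ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2;
  }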

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/sysreg.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 7d301700d1a9..9ad8172eea58 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -673,10 +673,12 @@
 
 /* id_aa64mmfr0 */
 #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN	0x0
+#define ID_AA64MMFR0_EL1_TGRAN4_LPA2		ID_AA64MMFR0_EL1_TGRAN4_52_BIT
 #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX	0x7
 #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MIN	0x0
 #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MAX	0x7
 #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN	0x1
+#define ID_AA64MMFR0_EL1_TGRAN16_LPA2		ID_AA64MMFR0_EL1_TGRAN16_52_BIT
 #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX	0xf
 
 #define ARM64_MIN_PARANGE_BITS		32
@@ -684,6 +686,7 @@
 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_DEFAULT	0x0
 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_NONE		0x1
 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MIN		0x2
+#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2		0x3
 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MAX		0x7
 
 #ifdef CONFIG_ARM64_PA_BITS_52
@@ -800,11 +803,13 @@
 
 #if defined(CONFIG_ARM64_4K_PAGES)
 #define ID_AA64MMFR0_EL1_TGRAN_SHIFT		ID_AA64MMFR0_EL1_TGRAN4_SHIFT
+#define ID_AA64MMFR0_EL1_TGRAN_LPA2		ID_AA64MMFR0_EL1_TGRAN4_52_BIT
 #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN	ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN
 #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX	ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX
 #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT		ID_AA64MMFR0_EL1_TGRAN4_2_SHIFT
 #elif defined(CONFIG_ARM64_16K_PAGES)
 #define ID_AA64MMFR0_EL1_TGRAN_SHIFT		ID_AA64MMFR0_EL1_TGRAN16_SHIFT
+#define ID_AA64MMFR0_EL1_TGRAN_LPA2		ID_AA64MMFR0_EL1_TGRAN16_52_BIT
 #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN	ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN
 #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX	ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX
 #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT		ID_AA64MMFR0_EL1_TGRAN16_2_SHIFT
-- 
2.25.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 02/12] arm64/mm: Update tlb invalidation routines for FEAT_LPA2
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-06 13:59   ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: kvmarm, kvmarm, linux-arm-kernel

FEAT_LPA2 impacts tlb invalidation in two ways. Firstly, the TTL field in
the non-range tlbi instructions can now validly take a 0 value for the
4KB granule (this is due to the extra level of translation). Secondly,
the BADDR field in the range tlbi instructions must be aligned to 64KB
when LPA2 is in use (TCR.DS=1). Changes are required for tlbi to
continue to operate correctly when LPA2 is in use.

We solve the first by always adding the level hint if the level is
between [0, 3] (previously anything other than 0 was hinted, which
breaks in the new level -1 case from kvm). When running on non-LPA2 HW,
0 is still safe to hint as the HW will fall back to non-hinted. We also
update kernel code to take advantage of the new hint for p4d flushing.
While we are at it, we replace the notion of 0 being the non-hinted
sentinel with a macro, TLBI_TTL_UNKNOWN. This means callers won't need
updating if/when translation depth increases in future.
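
In other words (sketch only - the actual macro change is in the diff below),
a hint is now emitted only for levels that the TTL field can encode:

  #define TLBI_TTL_UNKNOWN        (-1)

  static inline bool ttl_hint_usable(int level)
  {
          return level >= 0 && level <= 3;
  }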

The second problem is trickier. When LPA2 is in use, we need to use the
non-range tlbi instructions to forward align to a 64KB boundary first,
then we can use range-based tlbi from there on, until we have either
invalidated all pages or we have a single page remaining. If the latter,
that is done with non-range tlbi. (Previously we invalidated a single
odd page first, but we can no longer do this because it could wreck our
64KB alignment). When LPA2 is not in use, we don't need the initial
alignment step. However, the bigger impact is that we can no longer use
the previous method of iterating from smallest to largest 'scale', since
this would likely unalign the boundary again for the LPA2 case. So
instead we iterate from highest to lowest scale, which guarantees that
we remain 64KB aligned until the last op (at scale=0).
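
For reference, the pages covered by a single range operation follow the
architected formula, which is why working from the highest scale down
preserves 64KB alignment (standalone sketch, not part of the patch):

  /* Pages covered by one range TLBI: (NUM + 1) * 2^(5*SCALE + 1). */
  static unsigned long range_pages(unsigned long num, unsigned long scale)
  {
          return (num + 1) << (5 * scale + 1);
  }

  /*
   * With 4KB pages, every op at scale >= 1 covers a multiple of 64 pages
   * (256KB), so starting at scale 3 and decrementing keeps the running
   * start address 64KB aligned until the final scale-0 op.
   */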

The original commit (d1d3aa9 "arm64: tlb: Use the TLBI RANGE feature in
arm64") stated this as the reason for incrementing scale:

  However, in most scenarios, the pages = 1 when flush_tlb_range() is
  called. Start from scale = 3 or other proper value (such as scale
  =ilog2(pages)), will incur extra overhead. So increase 'scale' from 0
  to maximum, the flush order is exactly opposite to the example.

But pages=1 is already special cased by the non-range invalidation path,
which will take care of it the first time through the loop (both in the
original commit and in my change), so I don't think switching to
decrement scale should have any extra performance impact after all.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/pgtable-prot.h |  6 ++
 arch/arm64/include/asm/tlb.h          | 15 +++--
 arch/arm64/include/asm/tlbflush.h     | 83 +++++++++++++++++----------
 3 files changed, 69 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 9b165117a454..308cc02fcdf3 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -40,6 +40,12 @@ extern bool arm64_use_ng_mappings;
 #define PTE_MAYBE_NG		(arm64_use_ng_mappings ? PTE_NG : 0)
 #define PMD_MAYBE_NG		(arm64_use_ng_mappings ? PMD_SECT_NG : 0)
 
+/*
+ * For now the kernel never uses lpa2 for its stage1 tables. But kvm does and
+ * this hook allows us to update the common tlbi code to handle lpa2.
+ */
+#define lpa2_is_enabled()	false
+
 /*
  * If we have userspace only BTI we don't want to mark kernel pages
  * guarded even if the system does support BTI.
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f4594f..3a189c435973 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -22,15 +22,15 @@ static void tlb_flush(struct mmu_gather *tlb);
 #include <asm-generic/tlb.h>
 
 /*
- * get the tlbi levels in arm64.  Default value is 0 if more than one
- * of cleared_* is set or neither is set.
- * Arm64 doesn't support p4ds now.
+ * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
+ * one of cleared_* is set or neither is set - this elides the level hinting to
+ * the hardware.
  */
 static inline int tlb_get_level(struct mmu_gather *tlb)
 {
 	/* The TTL field is only valid for the leaf entry. */
 	if (tlb->freed_tables)
-		return 0;
+		return TLBI_TTL_UNKNOWN;
 
 	if (tlb->cleared_ptes && !(tlb->cleared_pmds ||
 				   tlb->cleared_puds ||
@@ -47,7 +47,12 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
 				   tlb->cleared_p4ds))
 		return 1;
 
-	return 0;
+	if (tlb->cleared_p4ds && !(tlb->cleared_ptes ||
+				   tlb->cleared_pmds ||
+				   tlb->cleared_puds))
+		return 0;
+
+	return TLBI_TTL_UNKNOWN;
 }
 
 static inline void tlb_flush(struct mmu_gather *tlb)
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 412a3b9a3c25..903d95a4bef5 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -93,19 +93,22 @@ static inline unsigned long get_trans_granule(void)
  * When ARMv8.4-TTL exists, TLBI operations take an additional hint for
  * the level at which the invalidation must take place. If the level is
  * wrong, no invalidation may take place. In the case where the level
- * cannot be easily determined, a 0 value for the level parameter will
- * perform a non-hinted invalidation.
+ * cannot be easily determined, the value TLBI_TTL_UNKNOWN will perform
+ * a non-hinted invalidation. Any provided level outside the hint range
+ * will also cause fall-back to non-hinted invalidation.
  *
  * For Stage-2 invalidation, use the level values provided to that effect
  * in asm/stage2_pgtable.h.
  */
 #define TLBI_TTL_MASK		GENMASK_ULL(47, 44)
 
+#define TLBI_TTL_UNKNOWN	(-1)
+
 #define __tlbi_level(op, addr, level) do {				\
 	u64 arg = addr;							\
 									\
 	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&		\
-	    level) {							\
+	    level >= 0 && level <= 3) {					\
 		u64 ttl = level & 3;					\
 		ttl |= get_trans_granule() << 2;			\
 		arg &= ~TLBI_TTL_MASK;					\
@@ -132,17 +135,22 @@ static inline unsigned long get_trans_granule(void)
  * The address range is determined by below formula:
  * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
  *
+ * If LPA2 is in use, BADDR holds addr[52:16]. Else BADDR holds page number.
+ * See ARM DDI 0487I.a C5.5.21.
+ *
  */
-#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
-	({							\
-		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
-		__ta &= GENMASK_ULL(36, 0);			\
-		__ta |= (unsigned long)(ttl) << 37;		\
-		__ta |= (unsigned long)(num) << 39;		\
-		__ta |= (unsigned long)(scale) << 44;		\
-		__ta |= get_trans_granule() << 46;		\
-		__ta |= (unsigned long)(asid) << 48;		\
-		__ta;						\
+#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl, lpa2_ena)		\
+	({									\
+		unsigned long __addr_shift = (lpa2_ena) ? 16 : PAGE_SHIFT;	\
+		unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0;		\
+		unsigned long __ta = (addr) >> __addr_shift;			\
+		__ta &= GENMASK_ULL(36, 0);					\
+		__ta |= __ttl << 37;						\
+		__ta |= (unsigned long)(num) << 39;				\
+		__ta |= (unsigned long)(scale) << 44;				\
+		__ta |= get_trans_granule() << 46;				\
+		__ta |= (unsigned long)(asid) << 48;				\
+		__ta;								\
 	})
 
 /* These macros are used by the TLBI RANGE feature. */
@@ -215,12 +223,16 @@ static inline unsigned long get_trans_granule(void)
  *		CPUs, ensuring that any walk-cache entries associated with the
  *		translation are also invalidated.
  *
- *	__flush_tlb_range(vma, start, end, stride, last_level)
+ *	__flush_tlb_range(vma, start, end, stride, last_level, tlb_level)
  *		Invalidate the virtual-address range '[start, end)' on all
  *		CPUs for the user address space corresponding to 'vma->mm'.
  *		The invalidation operations are issued at a granularity
  *		determined by 'stride' and only affect any walk-cache entries
- *		if 'last_level' is equal to false.
+ *		if 'last_level' is equal to false. tlb_level is the level at
+ *		which the invalidation must take place. If the level is wrong,
+ *		no invalidation may take place. In the case where the level
+ *		cannot be easily determined, the value TLBI_TTL_UNKNOWN will
+ *		perform a non-hinted invalidation.
  *
  *
  *	Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
@@ -284,8 +296,9 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     int tlb_level)
 {
 	int num = 0;
-	int scale = 0;
+	int scale = 3;
 	unsigned long asid, addr, pages;
+	bool lpa2_ena = lpa2_is_enabled();
 
 	start = round_down(start, stride);
 	end = round_up(end, stride);
@@ -309,17 +322,25 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 
 	/*
 	 * When the CPU does not support TLB range operations, flush the TLB
-	 * entries one by one at the granularity of 'stride'. If the TLB
-	 * range ops are supported, then:
+	 * entries one by one at the granularity of 'stride'. If the TLB range
+	 * ops are supported, then:
+	 *
+	 * 1. If FEAT_LPA2 is in use, the start address of a range operation
+	 *    must be 64KB aligned, so flush pages one by one until the
+	 *    alignment is reached using the non-range operations. This step is
+	 *    skipped if LPA2 is not in use.
 	 *
-	 * 1. If 'pages' is odd, flush the first page through non-range
-	 *    operations;
+	 * 2. For remaining pages: the minimum range granularity is decided by
+	 *    'scale', so multiple range TLBI operations may be required. Start
+	 *    from scale = 3, flush the corresponding number of pages
+	 *    ((num+1)*2^(5*scale+1) starting from 'addr'), then decrease it
+	 *    until one or zero pages are left. We must start from highest scale
+	 *    to ensure 64KB start alignment is maintained in the LPA2 case.
 	 *
-	 * 2. For remaining pages: the minimum range granularity is decided
-	 *    by 'scale', so multiple range TLBI operations may be required.
-	 *    Start from scale = 0, flush the corresponding number of pages
-	 *    ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
-	 *    until no pages left.
+	 * 3. If there is 1 page remaining, flush it through non-range
+	 *    operations. Range operations can only span an even number of
+	 *    pages. We save this for last to ensure 64KB start alignment is
+	 *    maintained for the LPA2 case.
 	 *
 	 * Note that certain ranges can be represented by either num = 31 and
 	 * scale or num = 0 and scale + 1. The loop below favours the latter
@@ -327,7 +348,8 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 	 */
 	while (pages > 0) {
 		if (!system_supports_tlb_range() ||
-		    pages % 2 == 1) {
+		    pages == 1 ||
+		    (lpa2_ena && start != ALIGN(start, SZ_64K))) {
 			addr = __TLBI_VADDR(start, asid);
 			if (last_level) {
 				__tlbi_level(vale1is, addr, tlb_level);
@@ -344,7 +366,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 		num = __TLBI_RANGE_NUM(pages, scale);
 		if (num >= 0) {
 			addr = __TLBI_VADDR_RANGE(start, asid, scale,
-						  num, tlb_level);
+						  num, tlb_level, lpa2_ena);
 			if (last_level) {
 				__tlbi(rvale1is, addr);
 				__tlbi_user(rvale1is, addr);
@@ -355,7 +377,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
 			pages -= __TLBI_RANGE_PAGES(num, scale);
 		}
-		scale++;
+		scale--;
 	}
 	dsb(ish);
 }
@@ -366,9 +388,10 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 	/*
 	 * We cannot use leaf-only invalidation here, since we may be invalidating
 	 * table entries as part of collapsing hugepages or moving page tables.
-	 * Set the tlb_level to 0 because we can not get enough information here.
+	 * Set the tlb_level to TLBI_TTL_UNKNOWN because we can not get enough
+	 * information here.
 	 */
-	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
+	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
 }
 
 static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
-- 
2.25.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 02/12] arm64/mm: Update tlb invalidation routines for FEAT_LPA2
@ 2022-12-06 13:59   ` Ryan Roberts
  0 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: Ryan Roberts, James Morse, Alexandru Elisei, Oliver Upton,
	linux-arm-kernel, kvmarm, kvmarm

FEAT_LPA2 impacts tlb invalidation in two ways. Firstly, the TTL field in
the non-range tlbi instructions can now validly take a 0 value for the
4KB granule (this is due to the extra level of translation). Secondly,
the BADDR field in the range tlbi instructions must be aligned to 64KB
when LPA2 is in use (TCR.DS=1). Changes are required for tlbi to
continue to operate correctly when LPA2 is in use.

We solve the first by always adding the level hint if the level is in
the range [0, 3] (previously anything other than 0 was hinted, which
breaks in the new level -1 case from kvm). When running on non-LPA2 HW,
0 is still safe to hint as the HW will fall back to non-hinted. We also
update kernel code to take advantage of the new hint for p4d flushing.
While we are at it, we replace the notion of 0 being the non-hinted
sentinel with a macro, TLBI_TTL_UNKNOWN. This means callers won't need
updating if/when the translation depth increases in future.

The second problem is trickier. When LPA2 is in use, we need to use the
non-range tlbi instructions to forward align to a 64KB boundary first,
then we can use range-based tlbi from there on, until we have either
invalidated all pages or we have a single page remaining. If the latter,
that is done with non-range tlbi. (Previously we invalidated a single
odd page first, but we can no longer do this because it could wreck our
64KB alignment). When LPA2 is not in use, we don't need the initial
alignemnt step. However, the bigger impact is that we can no longer use
the previous method of iterating from smallest to largest 'scale', since
this would likely unalign the boundary again for the LPA2 case. So
instead we iterate from highest to lowest scale, which guarantees that
we remain 64KB aligned until the last op (at scale=0).

The original commit (d1d3aa9 "arm64: tlb: Use the TLBI RANGE feature in
arm64") stated this as the reason for incrementing scale:

  However, in most scenarios, the pages = 1 when flush_tlb_range() is
  called. Start from scale = 3 or other proper value (such as scale
  =ilog2(pages)), will incur extra overhead. So increase 'scale' from 0
  to maximum, the flush order is exactly opposite to the example.

But pages=1 is already special cased by the non-range invalidation path,
which will take care of it the first time through the loop (both in the
original commit and in my change), so I don't think switching to
decrement scale should have any extra performance impact after all.
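
To make the new walk concrete, below is a stand-alone model of the loop.
It is an illustration only, not the kernel code: it assumes stride ==
PAGE_SIZE (4KB), assumes the CPU supports the range ops, models
lpa2_is_enabled() with a plain bool, and just prints the operations that
__flush_tlb_range() would issue. RANGE_PAGES/RANGE_NUM mirror the kernel's
__TLBI_RANGE_PAGES/__TLBI_RANGE_NUM, and pages is assumed to stay below
MAX_TLBI_RANGE_PAGES (the kernel falls back to flush_tlb_mm() above that).

#include <stdbool.h>
#include <stdio.h>

#define RANGE_PAGES(num, scale)	(((unsigned long)(num) + 1) << (5 * (scale) + 1))
#define RANGE_NUM(pages, scale)	((int)(((pages) >> (5 * (scale) + 1)) & 0x1f) - 1)

static void model_flush(unsigned long pfn, unsigned long pages, bool lpa2)
{
	int scale = 3;

	while (pages > 0) {
		/* one page left, or start not yet 64KB (16-page) aligned */
		if (pages == 1 || (lpa2 && (pfn & 15))) {
			printf("tlbi vale1is   pfn=%#lx\n", pfn);
			pfn += 1;
			pages -= 1;
			continue;
		}

		/* largest range op that fits at this scale, if any */
		int num = RANGE_NUM(pages, scale);
		if (num >= 0) {
			printf("tlbi rvale1is  pfn=%#lx scale=%d num=%d (%lu pages)\n",
			       pfn, scale, num, RANGE_PAGES(num, scale));
			pfn += RANGE_PAGES(num, scale);
			pages -= RANGE_PAGES(num, scale);
		}
		scale--;
	}
}

int main(void)
{
	/* unaligned start plus an odd page count, with LPA2 enabled */
	model_flush(0x1003, 131, true);
	return 0;
}

For this input the model issues 13 single-page ops up to the 64KB boundary,
then one scale=1 op covering 64 pages and one scale=0 op covering the last
54, so every range op starts 64KB aligned.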

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/pgtable-prot.h |  6 ++
 arch/arm64/include/asm/tlb.h          | 15 +++--
 arch/arm64/include/asm/tlbflush.h     | 83 +++++++++++++++++----------
 3 files changed, 69 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 9b165117a454..308cc02fcdf3 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -40,6 +40,12 @@ extern bool arm64_use_ng_mappings;
 #define PTE_MAYBE_NG		(arm64_use_ng_mappings ? PTE_NG : 0)
 #define PMD_MAYBE_NG		(arm64_use_ng_mappings ? PMD_SECT_NG : 0)
 
+/*
+ * For now the kernel never uses lpa2 for its stage1 tables. But kvm does and
+ * this hook allows us to update the common tlbi code to handle lpa2.
+ */
+#define lpa2_is_enabled()	false
+
 /*
  * If we have userspace only BTI we don't want to mark kernel pages
  * guarded even if the system does support BTI.
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f4594f..3a189c435973 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -22,15 +22,15 @@ static void tlb_flush(struct mmu_gather *tlb);
 #include <asm-generic/tlb.h>
 
 /*
- * get the tlbi levels in arm64.  Default value is 0 if more than one
- * of cleared_* is set or neither is set.
- * Arm64 doesn't support p4ds now.
+ * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
+ * one of cleared_* is set or neither is set - this elides the level hinting to
+ * the hardware.
  */
 static inline int tlb_get_level(struct mmu_gather *tlb)
 {
 	/* The TTL field is only valid for the leaf entry. */
 	if (tlb->freed_tables)
-		return 0;
+		return TLBI_TTL_UNKNOWN;
 
 	if (tlb->cleared_ptes && !(tlb->cleared_pmds ||
 				   tlb->cleared_puds ||
@@ -47,7 +47,12 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
 				   tlb->cleared_p4ds))
 		return 1;
 
-	return 0;
+	if (tlb->cleared_p4ds && !(tlb->cleared_ptes ||
+				   tlb->cleared_pmds ||
+				   tlb->cleared_puds))
+		return 0;
+
+	return TLBI_TTL_UNKNOWN;
 }
 
 static inline void tlb_flush(struct mmu_gather *tlb)
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 412a3b9a3c25..903d95a4bef5 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -93,19 +93,22 @@ static inline unsigned long get_trans_granule(void)
  * When ARMv8.4-TTL exists, TLBI operations take an additional hint for
  * the level at which the invalidation must take place. If the level is
  * wrong, no invalidation may take place. In the case where the level
- * cannot be easily determined, a 0 value for the level parameter will
- * perform a non-hinted invalidation.
+ * cannot be easily determined, the value TLBI_TTL_UNKNOWN will perform
+ * a non-hinted invalidation. Any provided level outside the hint range
+ * will also cause fall-back to non-hinted invalidation.
  *
  * For Stage-2 invalidation, use the level values provided to that effect
  * in asm/stage2_pgtable.h.
  */
 #define TLBI_TTL_MASK		GENMASK_ULL(47, 44)
 
+#define TLBI_TTL_UNKNOWN	(-1)
+
 #define __tlbi_level(op, addr, level) do {				\
 	u64 arg = addr;							\
 									\
 	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&		\
-	    level) {							\
+	    level >= 0 && level <= 3) {					\
 		u64 ttl = level & 3;					\
 		ttl |= get_trans_granule() << 2;			\
 		arg &= ~TLBI_TTL_MASK;					\
@@ -132,17 +135,22 @@ static inline unsigned long get_trans_granule(void)
  * The address range is determined by below formula:
  * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
  *
+ * If LPA2 is in use, BADDR holds addr[52:16]. Else BADDR holds page number.
+ * See ARM DDI 0487I.a C5.5.21.
+ *
  */
-#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
-	({							\
-		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
-		__ta &= GENMASK_ULL(36, 0);			\
-		__ta |= (unsigned long)(ttl) << 37;		\
-		__ta |= (unsigned long)(num) << 39;		\
-		__ta |= (unsigned long)(scale) << 44;		\
-		__ta |= get_trans_granule() << 46;		\
-		__ta |= (unsigned long)(asid) << 48;		\
-		__ta;						\
+#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl, lpa2_ena)		\
+	({									\
+		unsigned long __addr_shift = (lpa2_ena) ? 16 : PAGE_SHIFT;	\
+		unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0;		\
+		unsigned long __ta = (addr) >> __addr_shift;			\
+		__ta &= GENMASK_ULL(36, 0);					\
+		__ta |= __ttl << 37;						\
+		__ta |= (unsigned long)(num) << 39;				\
+		__ta |= (unsigned long)(scale) << 44;				\
+		__ta |= get_trans_granule() << 46;				\
+		__ta |= (unsigned long)(asid) << 48;				\
+		__ta;								\
 	})
 
 /* These macros are used by the TLBI RANGE feature. */
@@ -215,12 +223,16 @@ static inline unsigned long get_trans_granule(void)
  *		CPUs, ensuring that any walk-cache entries associated with the
  *		translation are also invalidated.
  *
- *	__flush_tlb_range(vma, start, end, stride, last_level)
+ *	__flush_tlb_range(vma, start, end, stride, last_level, tlb_level)
  *		Invalidate the virtual-address range '[start, end)' on all
  *		CPUs for the user address space corresponding to 'vma->mm'.
  *		The invalidation operations are issued at a granularity
  *		determined by 'stride' and only affect any walk-cache entries
- *		if 'last_level' is equal to false.
+ *		if 'last_level' is equal to false. tlb_level is the level at
+ *		which the invalidation must take place. If the level is wrong,
+ *		no invalidation may take place. In the case where the level
+ *		cannot be easily determined, the value TLBI_TTL_UNKNOWN will
+ *		perform a non-hinted invalidation.
  *
  *
  *	Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
@@ -284,8 +296,9 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     int tlb_level)
 {
 	int num = 0;
-	int scale = 0;
+	int scale = 3;
 	unsigned long asid, addr, pages;
+	bool lpa2_ena = lpa2_is_enabled();
 
 	start = round_down(start, stride);
 	end = round_up(end, stride);
@@ -309,17 +322,25 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 
 	/*
 	 * When the CPU does not support TLB range operations, flush the TLB
-	 * entries one by one at the granularity of 'stride'. If the TLB
-	 * range ops are supported, then:
+	 * entries one by one at the granularity of 'stride'. If the TLB range
+	 * ops are supported, then:
+	 *
+	 * 1. If FEAT_LPA2 is in use, the start address of a range operation
+	 *    must be 64KB aligned, so flush pages one by one until the
+	 *    alignment is reached using the non-range operations. This step is
+	 *    skipped if LPA2 is not in use.
 	 *
-	 * 1. If 'pages' is odd, flush the first page through non-range
-	 *    operations;
+	 * 2. For remaining pages: the minimum range granularity is decided by
+	 *    'scale', so multiple range TLBI operations may be required. Start
+	 *    from scale = 3, flush the corresponding number of pages
+	 *    ((num+1)*2^(5*scale+1) starting from 'addr'), then decrease it
+	 *    until one or zero pages are left. We must start from highest scale
+	 *    to ensure 64KB start alignment is maintained in the LPA2 case.
 	 *
-	 * 2. For remaining pages: the minimum range granularity is decided
-	 *    by 'scale', so multiple range TLBI operations may be required.
-	 *    Start from scale = 0, flush the corresponding number of pages
-	 *    ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
-	 *    until no pages left.
+	 * 3. If there is 1 page remaining, flush it through non-range
+	 *    operations. Range operations can only span an even number of
+	 *    pages. We save this for last to ensure 64KB start alignment is
+	 *    maintained for the LPA2 case.
 	 *
 	 * Note that certain ranges can be represented by either num = 31 and
 	 * scale or num = 0 and scale + 1. The loop below favours the latter
@@ -327,7 +348,8 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 	 */
 	while (pages > 0) {
 		if (!system_supports_tlb_range() ||
-		    pages % 2 == 1) {
+		    pages == 1 ||
+		    (lpa2_ena && start != ALIGN(start, SZ_64K))) {
 			addr = __TLBI_VADDR(start, asid);
 			if (last_level) {
 				__tlbi_level(vale1is, addr, tlb_level);
@@ -344,7 +366,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 		num = __TLBI_RANGE_NUM(pages, scale);
 		if (num >= 0) {
 			addr = __TLBI_VADDR_RANGE(start, asid, scale,
-						  num, tlb_level);
+						  num, tlb_level, lpa2_ena);
 			if (last_level) {
 				__tlbi(rvale1is, addr);
 				__tlbi_user(rvale1is, addr);
@@ -355,7 +377,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
 			pages -= __TLBI_RANGE_PAGES(num, scale);
 		}
-		scale++;
+		scale--;
 	}
 	dsb(ish);
 }
@@ -366,9 +388,10 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 	/*
 	 * We cannot use leaf-only invalidation here, since we may be invalidating
 	 * table entries as part of collapsing hugepages or moving page tables.
-	 * Set the tlb_level to 0 because we can not get enough information here.
+	 * Set the tlb_level to TLBI_TTL_UNKNOWN because we can not get enough
+	 * information here.
 	 */
-	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
+	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
 }
 
 static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 03/12] KVM: arm64: Add new (V)TCR_EL2 field definitions for FEAT_LPA2
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-06 13:59   ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: kvmarm, kvmarm, linux-arm-kernel

As per the Arm ARM (DDI 0487I.a), the (V)TCR_EL2.DS fields control whether
52-bit input and output addresses are supported on 4K and 16K page size
configurations when FEAT_LPA2 is known to have been implemented.
Additionally, the VTCR_EL2.SL2 field is added to enable encoding of a 5th
starting level of translation, which is required for the 4KB granule with
an IPA size of 49-52 bits if concatenated first-level page tables are not
used.

This adds these field definitions which will be used by KVM when
FEAT_LPA2 is enabled.
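
Purely as an illustrative sketch (this helper and its arguments are
invented for the example and are not part of this patch), a later change
could fold the new bits into VTCR_EL2 along these lines:

static u64 vtcr_lpa2_bits(bool lpa2_stage2, bool start_level_minus1)
{
	u64 vtcr = 0;

	if (lpa2_stage2)
		vtcr |= VTCR_EL2_DS;		/* 52-bit IPA/OA, LPA2 PTE format */
	if (start_level_minus1)
		vtcr |= VTCR_EL2_SL2_MASK;	/* extends the SL0 start-level encoding */

	return vtcr;
}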

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index a82f2493a72b..f9619a10d5d9 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -92,6 +92,7 @@
 #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
 
 /* TCR_EL2 Registers bits */
+#define TCR_EL2_DS		(1UL << 32)
 #define TCR_EL2_RES1		((1U << 31) | (1 << 23))
 #define TCR_EL2_TBI		(1 << 20)
 #define TCR_EL2_PS_SHIFT	16
@@ -106,6 +107,9 @@
 			 TCR_EL2_ORGN0_MASK | TCR_EL2_IRGN0_MASK | TCR_EL2_T0SZ_MASK)
 
 /* VTCR_EL2 Registers bits */
+#define VTCR_EL2_SL2_SHIFT	33
+#define VTCR_EL2_SL2_MASK	(1UL << VTCR_EL2_SL2_SHIFT)
+#define VTCR_EL2_DS		TCR_EL2_DS
 #define VTCR_EL2_RES1		(1U << 31)
 #define VTCR_EL2_HD		(1 << 22)
 #define VTCR_EL2_HA		(1 << 21)
-- 
2.25.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 04/12] KVM: arm64: Plumbing to enable multiple pgtable formats
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-06 13:59   ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: kvmarm, kvmarm, linux-arm-kernel

FEAT_LPA2 brings support for 52-bit input and output addresses for both
stage1 and stage2 translation when using 4KB and 16KB page sizes. The
architecture allows for the HW to support FEAT_LPA2 in one or both
stages of translation. When FEAT_LPA2 is enabled for a given stage, it
effectively changes the page table format; PTE bits change meaning and
blocks can be mapped at levels that were previously not possible.

All of this means that KVM has to support 2 page table formats and
decide which to use at runtime, after querying the HW. If FEAT_LPA2 is
advertised for stage1, KVM must choose either the classic format or the
lpa2 format according to some policy for its hyp stage1; otherwise it must
use the classic format. Independently, if FEAT_LPA2 is advertised for
stage2, KVM must decide which format to use for the vm stage2 tables according
to a policy.

As a first step towards enabling FEAT_LPA2, make struct kvm_pgtable
accessible to functions that will need to take different actions
depending on the page-table format. These functions are:

  - kvm_pte_to_phys()
  - kvm_phys_to_pte()
  - kvm_level_supports_block_mapping()
  - hyp_set_prot_attr()
  - stage2_set_prot_attr()

Do this by more consistently passing the struct kvm_pgtable around as
the first parameter of each kvm_pgtable function call. As a result of
always passing it to walker callbacks, we can remove some ad-hoc members
from walker-specific data structures because those members are
accessible through the struct kvm_pgtable (notably mmu and mm_ops).

No functional changes are intended.
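
For illustration only (the body below is invented; the signature is the
one introduced by this patch), a walker callback under the new prototype
looks like this, reaching mm_ops/mmu through pgt rather than through arg:

static int count_valid_walker(struct kvm_pgtable *pgt,
			      u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
			      enum kvm_pgtable_walk_flags flag, void * const arg)
{
	u64 *nr_valid = arg;

	/* pgt->mm_ops / pgt->mmu are available here without passing them in arg */
	if (kvm_pte_valid(*ptep))
		(*nr_valid)++;

	return 0;
}

It would be driven in the usual way, e.g. a struct kvm_pgtable_walker with
.cb = count_valid_walker, .flags = KVM_PGTABLE_WALK_LEAF and
.arg = &nr_valid passed to kvm_pgtable_walk().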

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_pgtable.h  |  23 ++--
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |   5 +-
 arch/arm64/kvm/hyp/nvhe/setup.c       |   8 +-
 arch/arm64/kvm/hyp/pgtable.c          | 181 +++++++++++++-------------
 4 files changed, 109 insertions(+), 108 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3252eb50ecfe..2247ed74871a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -47,16 +47,6 @@ static inline bool kvm_pte_valid(kvm_pte_t pte)
 	return pte & KVM_PTE_VALID;
 }
 
-static inline u64 kvm_pte_to_phys(kvm_pte_t pte)
-{
-	u64 pa = pte & KVM_PTE_ADDR_MASK;
-
-	if (PAGE_SHIFT == 16)
-		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
-
-	return pa;
-}
-
 static inline u64 kvm_granule_shift(u32 level)
 {
 	/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
@@ -184,6 +174,16 @@ struct kvm_pgtable {
 	kvm_pgtable_force_pte_cb_t		force_pte_cb;
 };
 
+static inline u64 kvm_pte_to_phys(struct kvm_pgtable *pgt, kvm_pte_t pte)
+{
+	u64 pa = pte & KVM_PTE_ADDR_MASK;
+
+	if (PAGE_SHIFT == 16)
+		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
+
+	return pa;
+}
+
 /**
  * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
  * @KVM_PGTABLE_WALK_LEAF:		Visit leaf entries, including invalid
@@ -199,7 +199,8 @@ enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
 };
 
-typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
+typedef int (*kvm_pgtable_visitor_fn_t)(struct kvm_pgtable *pgt,
+					u64 addr, u64 end, u32 level,
 					kvm_pte_t *ptep,
 					enum kvm_pgtable_walk_flags flag,
 					void * const arg);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 07f9dc9848ef..6bf54c8daffa 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -417,7 +417,8 @@ struct check_walk_data {
 	enum pkvm_page_state	(*get_page_state)(kvm_pte_t pte);
 };
 
-static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
+static int __check_page_state_visitor(struct kvm_pgtable *pgt,
+				      u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      enum kvm_pgtable_walk_flags flag,
 				      void * const arg)
@@ -425,7 +426,7 @@ static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
 	struct check_walk_data *d = arg;
 	kvm_pte_t pte = *ptep;
 
-	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
+	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pgt, pte)))
 		return -EINVAL;
 
 	return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index e8d4ea2fcfa0..60a6821ae98a 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -186,12 +186,13 @@ static void hpool_put_page(void *addr)
 	hyp_put_page(&hpool, addr);
 }
 
-static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
+static int finalize_host_mappings_walker(struct kvm_pgtable *pgt,
+					 u64 addr, u64 end, u32 level,
 					 kvm_pte_t *ptep,
 					 enum kvm_pgtable_walk_flags flag,
 					 void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
 	kvm_pte_t pte = *ptep;
@@ -212,7 +213,7 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 	if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
-	phys = kvm_pte_to_phys(pte);
+	phys = kvm_pte_to_phys(pgt, pte);
 	if (!addr_is_memory(phys))
 		return -EINVAL;
 
@@ -242,7 +243,6 @@ static int finalize_host_mappings(void)
 	struct kvm_pgtable_walker walker = {
 		.cb	= finalize_host_mappings_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pkvm_pgtable.mm_ops,
 	};
 	int i, ret;
 
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index cdf8e76b0be1..221e0dafb149 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -59,12 +59,13 @@ struct kvm_pgtable_walk_data {
 
 #define KVM_PHYS_INVALID (-1ULL)
 
-static bool kvm_phys_is_valid(u64 phys)
+static bool kvm_phys_is_valid(struct kvm_pgtable *pgt, u64 phys)
 {
 	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
 }
 
-static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
+static bool kvm_block_mapping_supported(struct kvm_pgtable *pgt,
+					u64 addr, u64 end, u64 phys, u32 level)
 {
 	u64 granule = kvm_granule_size(level);
 
@@ -74,7 +75,7 @@ static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
 	if (granule > (end - addr))
 		return false;
 
-	if (kvm_phys_is_valid(phys) && !IS_ALIGNED(phys, granule))
+	if (kvm_phys_is_valid(pgt, phys) && !IS_ALIGNED(phys, granule))
 		return false;
 
 	return IS_ALIGNED(addr, granule);
@@ -122,7 +123,7 @@ static bool kvm_pte_table(kvm_pte_t pte, u32 level)
 	return FIELD_GET(KVM_PTE_TYPE, pte) == KVM_PTE_TYPE_TABLE;
 }
 
-static kvm_pte_t kvm_phys_to_pte(u64 pa)
+static kvm_pte_t kvm_phys_to_pte(struct kvm_pgtable *pgt, u64 pa)
 {
 	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
 
@@ -132,9 +133,9 @@ static kvm_pte_t kvm_phys_to_pte(u64 pa)
 	return pte;
 }
 
-static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte, struct kvm_pgtable_mm_ops *mm_ops)
+static kvm_pte_t *kvm_pte_follow(struct kvm_pgtable *pgt, kvm_pte_t pte)
 {
-	return mm_ops->phys_to_virt(kvm_pte_to_phys(pte));
+	return pgt->mm_ops->phys_to_virt(kvm_pte_to_phys(pgt, pte));
 }
 
 static void kvm_clear_pte(kvm_pte_t *ptep)
@@ -142,10 +143,11 @@ static void kvm_clear_pte(kvm_pte_t *ptep)
 	WRITE_ONCE(*ptep, 0);
 }
 
-static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
-			      struct kvm_pgtable_mm_ops *mm_ops)
+static void kvm_set_table_pte(struct kvm_pgtable *pgt,
+			      kvm_pte_t *ptep, kvm_pte_t *childp)
 {
-	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
+	kvm_pte_t old = *ptep;
+	kvm_pte_t pte = kvm_phys_to_pte(pgt, pgt->mm_ops->virt_to_phys(childp));
 
 	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
 	pte |= KVM_PTE_VALID;
@@ -154,9 +156,10 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
 	smp_store_release(ptep, pte);
 }
 
-static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
+static kvm_pte_t kvm_init_valid_leaf_pte(struct kvm_pgtable *pgt,
+					 u64 pa, kvm_pte_t attr, u32 level)
 {
-	kvm_pte_t pte = kvm_phys_to_pte(pa);
+	kvm_pte_t pte = kvm_phys_to_pte(pgt, pa);
 	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
 							   KVM_PTE_TYPE_BLOCK;
 
@@ -177,7 +180,8 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
 				  enum kvm_pgtable_walk_flags flag)
 {
 	struct kvm_pgtable_walker *walker = data->walker;
-	return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
+	return walker->cb(data->pgt,
+			  addr, data->end, level, ptep, flag, walker->arg);
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
@@ -213,7 +217,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 		goto out;
 	}
 
-	childp = kvm_pte_follow(pte, data->pgt->mm_ops);
+	childp = kvm_pte_follow(data->pgt, pte);
 	ret = __kvm_pgtable_walk(data, childp, level + 1);
 	if (ret)
 		goto out;
@@ -292,7 +296,8 @@ struct leaf_walk_data {
 	u32		level;
 };
 
-static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int leaf_walker(struct kvm_pgtable *pgt,
+		       u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		       enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct leaf_walk_data *data = arg;
@@ -329,10 +334,10 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
 struct hyp_map_data {
 	u64				phys;
 	kvm_pte_t			attr;
-	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
+static int hyp_set_prot_attr(struct kvm_pgtable *pgt,
+			     enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
 {
 	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
 	u32 mtype = device ? MT_DEVICE_nGnRE : MT_NORMAL;
@@ -383,21 +388,22 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
 	return prot;
 }
 
-static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
+static bool hyp_map_walker_try_leaf(struct kvm_pgtable *pgt,
+				    u64 addr, u64 end, u32 level,
 				    kvm_pte_t *ptep, struct hyp_map_data *data)
 {
 	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
 
-	if (!kvm_block_mapping_supported(addr, end, phys, level))
+	if (!kvm_block_mapping_supported(pgt, addr, end, phys, level))
 		return false;
 
 	data->phys += granule;
-	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	new = kvm_init_valid_leaf_pte(pgt, phys, data->attr, level);
 	if (old == new)
 		return true;
 	if (!kvm_pte_valid(old))
-		data->mm_ops->get_page(ptep);
+		pgt->mm_ops->get_page(ptep);
 	else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
 		return false;
 
@@ -405,14 +411,15 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	return true;
 }
 
-static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int hyp_map_walker(struct kvm_pgtable *pgt,
+			  u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			  enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	kvm_pte_t *childp;
 	struct hyp_map_data *data = arg;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 
-	if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
+	if (hyp_map_walker_try_leaf(pgt, addr, end, level, ptep, data))
 		return 0;
 
 	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
@@ -422,7 +429,7 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (!childp)
 		return -ENOMEM;
 
-	kvm_set_table_pte(ptep, childp, mm_ops);
+	kvm_set_table_pte(pgt, ptep, childp);
 	mm_ops->get_page(ptep);
 	return 0;
 }
@@ -433,7 +440,6 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 	int ret;
 	struct hyp_map_data map_data = {
 		.phys	= ALIGN_DOWN(phys, PAGE_SIZE),
-		.mm_ops	= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_map_walker,
@@ -441,7 +447,7 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 		.arg	= &map_data,
 	};
 
-	ret = hyp_set_prot_attr(prot, &map_data.attr);
+	ret = hyp_set_prot_attr(pgt, prot, &map_data.attr);
 	if (ret)
 		return ret;
 
@@ -453,22 +459,22 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 
 struct hyp_unmap_data {
 	u64				unmapped;
-	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int hyp_unmap_walker(struct kvm_pgtable *pgt,
+			    u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			    enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	kvm_pte_t pte = *ptep, *childp = NULL;
 	u64 granule = kvm_granule_size(level);
 	struct hyp_unmap_data *data = arg;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 
 	if (!kvm_pte_valid(pte))
 		return -EINVAL;
 
 	if (kvm_pte_table(pte, level)) {
-		childp = kvm_pte_follow(pte, mm_ops);
+		childp = kvm_pte_follow(pgt, pte);
 
 		if (mm_ops->page_count(childp) != 1)
 			return 0;
@@ -498,9 +504,7 @@ static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 
 u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
-	struct hyp_unmap_data unmap_data = {
-		.mm_ops	= pgt->mm_ops,
-	};
+	struct hyp_unmap_data unmap_data = {};
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_unmap_walker,
 		.arg	= &unmap_data,
@@ -532,10 +536,11 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 	return 0;
 }
 
-static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int hyp_free_walker(struct kvm_pgtable *pgt,
+			   u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			   enum kvm_pgtable_walk_flags flag, void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep;
 
 	if (!kvm_pte_valid(pte))
@@ -544,7 +549,7 @@ static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	mm_ops->put_page(ptep);
 
 	if (kvm_pte_table(pte, level))
-		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
+		mm_ops->put_page(kvm_pte_follow(pgt, pte));
 
 	return 0;
 }
@@ -554,7 +559,6 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pgt->mm_ops,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
@@ -570,11 +574,8 @@ struct stage2_map_data {
 	kvm_pte_t			*anchor;
 	kvm_pte_t			*childp;
 
-	struct kvm_s2_mmu		*mmu;
 	void				*memcache;
 
-	struct kvm_pgtable_mm_ops	*mm_ops;
-
 	/* Force mappings to page granularity */
 	bool				force_pte;
 };
@@ -708,29 +709,30 @@ static bool stage2_pte_executable(kvm_pte_t pte)
 	return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
 }
 
-static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
+static bool stage2_leaf_mapping_allowed(struct kvm_pgtable *pgt,
+					u64 addr, u64 end, u32 level,
 					struct stage2_map_data *data)
 {
 	if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
 		return false;
 
-	return kvm_block_mapping_supported(addr, end, data->phys, level);
+	return kvm_block_mapping_supported(pgt, addr, end, data->phys, level);
 }
 
-static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
+static int stage2_map_walker_try_leaf(struct kvm_pgtable *pgt,
+				      u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      struct stage2_map_data *data)
 {
 	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
-	struct kvm_pgtable *pgt = data->mmu->pgt;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 
-	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
+	if (!stage2_leaf_mapping_allowed(pgt, addr, end, level, data))
 		return -E2BIG;
 
-	if (kvm_phys_is_valid(phys))
-		new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	if (kvm_phys_is_valid(pgt, phys))
+		new = kvm_init_valid_leaf_pte(pgt, phys, data->attr, level);
 	else
 		new = kvm_init_invalid_leaf_owner(data->owner_id);
 
@@ -744,36 +746,37 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 		if (!stage2_pte_needs_update(old, new))
 			return -EAGAIN;
 
-		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
+		stage2_put_pte(ptep, pgt->mmu, addr, level, mm_ops);
 	}
 
 	/* Perform CMOs before installation of the guest stage-2 PTE */
 	if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
-		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
+		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pgt, new),
 						granule);
 
 	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
-		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
+		mm_ops->icache_inval_pou(kvm_pte_follow(pgt, new), granule);
 
 	smp_store_release(ptep, new);
 	if (stage2_pte_is_counted(new))
 		mm_ops->get_page(ptep);
-	if (kvm_phys_is_valid(phys))
+	if (kvm_phys_is_valid(pgt, phys))
 		data->phys += granule;
 	return 0;
 }
 
-static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
+static int stage2_map_walk_table_pre(struct kvm_pgtable *pgt,
+				     u64 addr, u64 end, u32 level,
 				     kvm_pte_t *ptep,
 				     struct stage2_map_data *data)
 {
 	if (data->anchor)
 		return 0;
 
-	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
+	if (!stage2_leaf_mapping_allowed(pgt, addr, end, level, data))
 		return 0;
 
-	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
+	data->childp = kvm_pte_follow(pgt, *ptep);
 	kvm_clear_pte(ptep);
 
 	/*
@@ -781,15 +784,16 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 	 * entries below us which would otherwise need invalidating
 	 * individually.
 	 */
-	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
+	kvm_call_hyp(__kvm_tlb_flush_vmid, pgt->mmu);
 	data->anchor = ptep;
 	return 0;
 }
 
-static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_map_walk_leaf(struct kvm_pgtable *pgt,
+				u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 				struct stage2_map_data *data)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t *childp, pte = *ptep;
 	int ret;
 
@@ -800,7 +804,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		return 0;
 	}
 
-	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
+	ret = stage2_map_walker_try_leaf(pgt, addr, end, level, ptep, data);
 	if (ret != -E2BIG)
 		return ret;
 
@@ -820,19 +824,20 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 * will be mapped lazily.
 	 */
 	if (stage2_pte_is_counted(pte))
-		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
+		stage2_put_pte(ptep, pgt->mmu, addr, level, mm_ops);
 
-	kvm_set_table_pte(ptep, childp, mm_ops);
+	kvm_set_table_pte(pgt, ptep, childp);
 	mm_ops->get_page(ptep);
 
 	return 0;
 }
 
-static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
+static int stage2_map_walk_table_post(struct kvm_pgtable *pgt,
+				      u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      struct stage2_map_data *data)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t *childp;
 	int ret = 0;
 
@@ -843,9 +848,9 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
 		childp = data->childp;
 		data->anchor = NULL;
 		data->childp = NULL;
-		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
+		ret = stage2_map_walk_leaf(pgt, addr, end, level, ptep, data);
 	} else {
-		childp = kvm_pte_follow(*ptep, mm_ops);
+		childp = kvm_pte_follow(pgt, *ptep);
 	}
 
 	mm_ops->put_page(childp);
@@ -873,18 +878,19 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
  * the page-table, installing the block entry when it revisits the anchor
  * pointer and clearing the anchor to NULL.
  */
-static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_map_walker(struct kvm_pgtable *pgt,
+			     u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			     enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct stage2_map_data *data = arg;
 
 	switch (flag) {
 	case KVM_PGTABLE_WALK_TABLE_PRE:
-		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
+		return stage2_map_walk_table_pre(pgt, addr, end, level, ptep, data);
 	case KVM_PGTABLE_WALK_LEAF:
-		return stage2_map_walk_leaf(addr, end, level, ptep, data);
+		return stage2_map_walk_leaf(pgt, addr, end, level, ptep, data);
 	case KVM_PGTABLE_WALK_TABLE_POST:
-		return stage2_map_walk_table_post(addr, end, level, ptep, data);
+		return stage2_map_walk_table_post(pgt, addr, end, level, ptep, data);
 	}
 
 	return -EINVAL;
@@ -897,9 +903,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	int ret;
 	struct stage2_map_data map_data = {
 		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
-		.mmu		= pgt->mmu,
 		.memcache	= mc,
-		.mm_ops		= pgt->mm_ops,
 		.force_pte	= pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
 	};
 	struct kvm_pgtable_walker walker = {
@@ -928,9 +932,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	int ret;
 	struct stage2_map_data map_data = {
 		.phys		= KVM_PHYS_INVALID,
-		.mmu		= pgt->mmu,
 		.memcache	= mc,
-		.mm_ops		= pgt->mm_ops,
 		.owner_id	= owner_id,
 		.force_pte	= true,
 	};
@@ -949,11 +951,11 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	return ret;
 }
 
-static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_unmap_walker(struct kvm_pgtable *pgt,
+			       u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
-	struct kvm_pgtable *pgt = arg;
 	struct kvm_s2_mmu *mmu = pgt->mmu;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep, *childp = NULL;
@@ -968,7 +970,7 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	}
 
 	if (kvm_pte_table(pte, level)) {
-		childp = kvm_pte_follow(pte, mm_ops);
+		childp = kvm_pte_follow(pgt, pte);
 
 		if (mm_ops->page_count(childp) != 1)
 			return 0;
@@ -984,7 +986,7 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	stage2_put_pte(ptep, mmu, addr, level, mm_ops);
 
 	if (need_flush && mm_ops->dcache_clean_inval_poc)
-		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
+		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pgt, pte),
 					       kvm_granule_size(level));
 
 	if (childp)
@@ -997,7 +999,6 @@ int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_unmap_walker,
-		.arg	= pgt,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
 	};
 
@@ -1009,16 +1010,16 @@ struct stage2_attr_data {
 	kvm_pte_t			attr_clr;
 	kvm_pte_t			pte;
 	u32				level;
-	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_attr_walker(struct kvm_pgtable *pgt,
+			      u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
 	kvm_pte_t pte = *ptep;
 	struct stage2_attr_data *data = arg;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 
 	if (!kvm_pte_valid(pte))
 		return 0;
@@ -1040,7 +1041,7 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		 */
 		if (mm_ops->icache_inval_pou &&
 		    stage2_pte_executable(pte) && !stage2_pte_executable(*ptep))
-			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
+			mm_ops->icache_inval_pou(kvm_pte_follow(pgt, pte),
 						  kvm_granule_size(level));
 		WRITE_ONCE(*ptep, pte);
 	}
@@ -1058,7 +1059,6 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 	struct stage2_attr_data data = {
 		.attr_set	= attr_set & attr_mask,
 		.attr_clr	= attr_clr & attr_mask,
-		.mm_ops		= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_attr_walker,
@@ -1140,11 +1140,11 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 	return ret;
 }
 
-static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_flush_walker(struct kvm_pgtable *pgt,
+			       u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
-	struct kvm_pgtable *pgt = arg;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep;
 
@@ -1152,7 +1152,7 @@ static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		return 0;
 
 	if (mm_ops->dcache_clean_inval_poc)
-		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
+		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pgt, pte),
 					       kvm_granule_size(level));
 	return 0;
 }
@@ -1162,7 +1162,6 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_flush_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF,
-		.arg	= pgt,
 	};
 
 	if (stage2_has_fwb(pgt))
@@ -1200,11 +1199,12 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	return 0;
 }
 
-static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_free_walker(struct kvm_pgtable *pgt,
+			      u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep;
 
 	if (!stage2_pte_is_counted(pte))
@@ -1213,7 +1213,7 @@ static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	mm_ops->put_page(ptep);
 
 	if (kvm_pte_table(pte, level))
-		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
+		mm_ops->put_page(kvm_pte_follow(pgt, pte));
 
 	return 0;
 }
@@ -1225,7 +1225,6 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 		.cb	= stage2_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF |
 			  KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pgt->mm_ops,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
-- 
2.25.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 04/12] KVM: arm64: Plumbing to enable multiple pgtable formats
@ 2022-12-06 13:59   ` Ryan Roberts
  0 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: Ryan Roberts, James Morse, Alexandru Elisei, Oliver Upton,
	linux-arm-kernel, kvmarm, kvmarm

FEAT_LPA2 brings support for 52-bit input and output addresses for both
stage1 and stage2 translation when using 4KB and 16KB page sizes. The
architecture allows for the HW to support FEAT_LPA2 in one or both
stages of translation. When FEAT_LPA2 is enabled for a given stage, it
effectively changes the page table format; PTE bits change meaning and
blocks can be mapped at levels that were previously not possible.

All of this means that KVM has to support 2 page table formats and
decide which to use at runtime, after querying the HW. If FEAT_LPA2 is
advertised for stage1, KVM must choose either the classic format or the
lpa2 format according to some policy for its hyp stage1; otherwise it must
use the classic format. Independently, if FEAT_LPA2 is advertised for
stage2, KVM must decide which format to use for the vm stage2 tables according
to a policy.

As a first step towards enabling FEAT_LPA2, make struct kvm_pgtable
accessible to functions that will need to take different actions
depending on the page-table format. These functions are:

  - kvm_pte_to_phys()
  - kvm_phys_to_pte()
  - kvm_level_supports_block_mapping()
  - hyp_set_prot_attr()
  - stage2_set_prot_attr()

Do this by more consistently passing the struct kvm_pgtable around as
the first parameter of each kvm_pgtable function call. As a result of
always passing it to walker callbacks, we can remove some ad-hoc members
from walker-specific data structures because those members are
accessible through the struct kvm_pgtable (notably mmu and mm_ops).

No functional changes are intended.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_pgtable.h  |  23 ++--
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |   5 +-
 arch/arm64/kvm/hyp/nvhe/setup.c       |   8 +-
 arch/arm64/kvm/hyp/pgtable.c          | 181 +++++++++++++-------------
 4 files changed, 109 insertions(+), 108 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3252eb50ecfe..2247ed74871a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -47,16 +47,6 @@ static inline bool kvm_pte_valid(kvm_pte_t pte)
 	return pte & KVM_PTE_VALID;
 }
 
-static inline u64 kvm_pte_to_phys(kvm_pte_t pte)
-{
-	u64 pa = pte & KVM_PTE_ADDR_MASK;
-
-	if (PAGE_SHIFT == 16)
-		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
-
-	return pa;
-}
-
 static inline u64 kvm_granule_shift(u32 level)
 {
 	/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
@@ -184,6 +174,16 @@ struct kvm_pgtable {
 	kvm_pgtable_force_pte_cb_t		force_pte_cb;
 };
 
+static inline u64 kvm_pte_to_phys(struct kvm_pgtable *pgt, kvm_pte_t pte)
+{
+	u64 pa = pte & KVM_PTE_ADDR_MASK;
+
+	if (PAGE_SHIFT == 16)
+		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
+
+	return pa;
+}
+
 /**
  * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
  * @KVM_PGTABLE_WALK_LEAF:		Visit leaf entries, including invalid
@@ -199,7 +199,8 @@ enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
 };
 
-typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
+typedef int (*kvm_pgtable_visitor_fn_t)(struct kvm_pgtable *pgt,
+					u64 addr, u64 end, u32 level,
 					kvm_pte_t *ptep,
 					enum kvm_pgtable_walk_flags flag,
 					void * const arg);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 07f9dc9848ef..6bf54c8daffa 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -417,7 +417,8 @@ struct check_walk_data {
 	enum pkvm_page_state	(*get_page_state)(kvm_pte_t pte);
 };
 
-static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
+static int __check_page_state_visitor(struct kvm_pgtable *pgt,
+				      u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      enum kvm_pgtable_walk_flags flag,
 				      void * const arg)
@@ -425,7 +426,7 @@ static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
 	struct check_walk_data *d = arg;
 	kvm_pte_t pte = *ptep;
 
-	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
+	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pgt, pte)))
 		return -EINVAL;
 
 	return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index e8d4ea2fcfa0..60a6821ae98a 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -186,12 +186,13 @@ static void hpool_put_page(void *addr)
 	hyp_put_page(&hpool, addr);
 }
 
-static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
+static int finalize_host_mappings_walker(struct kvm_pgtable *pgt,
+					 u64 addr, u64 end, u32 level,
 					 kvm_pte_t *ptep,
 					 enum kvm_pgtable_walk_flags flag,
 					 void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
 	kvm_pte_t pte = *ptep;
@@ -212,7 +213,7 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 	if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
-	phys = kvm_pte_to_phys(pte);
+	phys = kvm_pte_to_phys(pgt, pte);
 	if (!addr_is_memory(phys))
 		return -EINVAL;
 
@@ -242,7 +243,6 @@ static int finalize_host_mappings(void)
 	struct kvm_pgtable_walker walker = {
 		.cb	= finalize_host_mappings_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pkvm_pgtable.mm_ops,
 	};
 	int i, ret;
 
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index cdf8e76b0be1..221e0dafb149 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -59,12 +59,13 @@ struct kvm_pgtable_walk_data {
 
 #define KVM_PHYS_INVALID (-1ULL)
 
-static bool kvm_phys_is_valid(u64 phys)
+static bool kvm_phys_is_valid(struct kvm_pgtable *pgt, u64 phys)
 {
 	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
 }
 
-static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
+static bool kvm_block_mapping_supported(struct kvm_pgtable *pgt,
+					u64 addr, u64 end, u64 phys, u32 level)
 {
 	u64 granule = kvm_granule_size(level);
 
@@ -74,7 +75,7 @@ static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
 	if (granule > (end - addr))
 		return false;
 
-	if (kvm_phys_is_valid(phys) && !IS_ALIGNED(phys, granule))
+	if (kvm_phys_is_valid(pgt, phys) && !IS_ALIGNED(phys, granule))
 		return false;
 
 	return IS_ALIGNED(addr, granule);
@@ -122,7 +123,7 @@ static bool kvm_pte_table(kvm_pte_t pte, u32 level)
 	return FIELD_GET(KVM_PTE_TYPE, pte) == KVM_PTE_TYPE_TABLE;
 }
 
-static kvm_pte_t kvm_phys_to_pte(u64 pa)
+static kvm_pte_t kvm_phys_to_pte(struct kvm_pgtable *pgt, u64 pa)
 {
 	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
 
@@ -132,9 +133,9 @@ static kvm_pte_t kvm_phys_to_pte(u64 pa)
 	return pte;
 }
 
-static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte, struct kvm_pgtable_mm_ops *mm_ops)
+static kvm_pte_t *kvm_pte_follow(struct kvm_pgtable *pgt, kvm_pte_t pte)
 {
-	return mm_ops->phys_to_virt(kvm_pte_to_phys(pte));
+	return pgt->mm_ops->phys_to_virt(kvm_pte_to_phys(pgt, pte));
 }
 
 static void kvm_clear_pte(kvm_pte_t *ptep)
@@ -142,10 +143,11 @@ static void kvm_clear_pte(kvm_pte_t *ptep)
 	WRITE_ONCE(*ptep, 0);
 }
 
-static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
-			      struct kvm_pgtable_mm_ops *mm_ops)
+static void kvm_set_table_pte(struct kvm_pgtable *pgt,
+			      kvm_pte_t *ptep, kvm_pte_t *childp)
 {
-	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
+	kvm_pte_t old = *ptep;
+	kvm_pte_t pte = kvm_phys_to_pte(pgt, pgt->mm_ops->virt_to_phys(childp));
 
 	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
 	pte |= KVM_PTE_VALID;
@@ -154,9 +156,10 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
 	smp_store_release(ptep, pte);
 }
 
-static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
+static kvm_pte_t kvm_init_valid_leaf_pte(struct kvm_pgtable *pgt,
+					 u64 pa, kvm_pte_t attr, u32 level)
 {
-	kvm_pte_t pte = kvm_phys_to_pte(pa);
+	kvm_pte_t pte = kvm_phys_to_pte(pgt, pa);
 	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
 							   KVM_PTE_TYPE_BLOCK;
 
@@ -177,7 +180,8 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
 				  enum kvm_pgtable_walk_flags flag)
 {
 	struct kvm_pgtable_walker *walker = data->walker;
-	return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
+	return walker->cb(data->pgt,
+			  addr, data->end, level, ptep, flag, walker->arg);
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
@@ -213,7 +217,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 		goto out;
 	}
 
-	childp = kvm_pte_follow(pte, data->pgt->mm_ops);
+	childp = kvm_pte_follow(data->pgt, pte);
 	ret = __kvm_pgtable_walk(data, childp, level + 1);
 	if (ret)
 		goto out;
@@ -292,7 +296,8 @@ struct leaf_walk_data {
 	u32		level;
 };
 
-static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int leaf_walker(struct kvm_pgtable *pgt,
+		       u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		       enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct leaf_walk_data *data = arg;
@@ -329,10 +334,10 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
 struct hyp_map_data {
 	u64				phys;
 	kvm_pte_t			attr;
-	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
+static int hyp_set_prot_attr(struct kvm_pgtable *pgt,
+			     enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
 {
 	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
 	u32 mtype = device ? MT_DEVICE_nGnRE : MT_NORMAL;
@@ -383,21 +388,22 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
 	return prot;
 }
 
-static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
+static bool hyp_map_walker_try_leaf(struct kvm_pgtable *pgt,
+				    u64 addr, u64 end, u32 level,
 				    kvm_pte_t *ptep, struct hyp_map_data *data)
 {
 	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
 
-	if (!kvm_block_mapping_supported(addr, end, phys, level))
+	if (!kvm_block_mapping_supported(pgt, addr, end, phys, level))
 		return false;
 
 	data->phys += granule;
-	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	new = kvm_init_valid_leaf_pte(pgt, phys, data->attr, level);
 	if (old == new)
 		return true;
 	if (!kvm_pte_valid(old))
-		data->mm_ops->get_page(ptep);
+		pgt->mm_ops->get_page(ptep);
 	else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
 		return false;
 
@@ -405,14 +411,15 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	return true;
 }
 
-static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int hyp_map_walker(struct kvm_pgtable *pgt,
+			  u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			  enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	kvm_pte_t *childp;
 	struct hyp_map_data *data = arg;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 
-	if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
+	if (hyp_map_walker_try_leaf(pgt, addr, end, level, ptep, data))
 		return 0;
 
 	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
@@ -422,7 +429,7 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (!childp)
 		return -ENOMEM;
 
-	kvm_set_table_pte(ptep, childp, mm_ops);
+	kvm_set_table_pte(pgt, ptep, childp);
 	mm_ops->get_page(ptep);
 	return 0;
 }
@@ -433,7 +440,6 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 	int ret;
 	struct hyp_map_data map_data = {
 		.phys	= ALIGN_DOWN(phys, PAGE_SIZE),
-		.mm_ops	= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_map_walker,
@@ -441,7 +447,7 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 		.arg	= &map_data,
 	};
 
-	ret = hyp_set_prot_attr(prot, &map_data.attr);
+	ret = hyp_set_prot_attr(pgt, prot, &map_data.attr);
 	if (ret)
 		return ret;
 
@@ -453,22 +459,22 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 
 struct hyp_unmap_data {
 	u64				unmapped;
-	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int hyp_unmap_walker(struct kvm_pgtable *pgt,
+			    u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			    enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	kvm_pte_t pte = *ptep, *childp = NULL;
 	u64 granule = kvm_granule_size(level);
 	struct hyp_unmap_data *data = arg;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 
 	if (!kvm_pte_valid(pte))
 		return -EINVAL;
 
 	if (kvm_pte_table(pte, level)) {
-		childp = kvm_pte_follow(pte, mm_ops);
+		childp = kvm_pte_follow(pgt, pte);
 
 		if (mm_ops->page_count(childp) != 1)
 			return 0;
@@ -498,9 +504,7 @@ static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 
 u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
-	struct hyp_unmap_data unmap_data = {
-		.mm_ops	= pgt->mm_ops,
-	};
+	struct hyp_unmap_data unmap_data = {};
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_unmap_walker,
 		.arg	= &unmap_data,
@@ -532,10 +536,11 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 	return 0;
 }
 
-static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int hyp_free_walker(struct kvm_pgtable *pgt,
+			   u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			   enum kvm_pgtable_walk_flags flag, void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep;
 
 	if (!kvm_pte_valid(pte))
@@ -544,7 +549,7 @@ static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	mm_ops->put_page(ptep);
 
 	if (kvm_pte_table(pte, level))
-		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
+		mm_ops->put_page(kvm_pte_follow(pgt, pte));
 
 	return 0;
 }
@@ -554,7 +559,6 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pgt->mm_ops,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
@@ -570,11 +574,8 @@ struct stage2_map_data {
 	kvm_pte_t			*anchor;
 	kvm_pte_t			*childp;
 
-	struct kvm_s2_mmu		*mmu;
 	void				*memcache;
 
-	struct kvm_pgtable_mm_ops	*mm_ops;
-
 	/* Force mappings to page granularity */
 	bool				force_pte;
 };
@@ -708,29 +709,30 @@ static bool stage2_pte_executable(kvm_pte_t pte)
 	return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
 }
 
-static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
+static bool stage2_leaf_mapping_allowed(struct kvm_pgtable *pgt,
+					u64 addr, u64 end, u32 level,
 					struct stage2_map_data *data)
 {
 	if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
 		return false;
 
-	return kvm_block_mapping_supported(addr, end, data->phys, level);
+	return kvm_block_mapping_supported(pgt, addr, end, data->phys, level);
 }
 
-static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
+static int stage2_map_walker_try_leaf(struct kvm_pgtable *pgt,
+				      u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      struct stage2_map_data *data)
 {
 	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
-	struct kvm_pgtable *pgt = data->mmu->pgt;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 
-	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
+	if (!stage2_leaf_mapping_allowed(pgt, addr, end, level, data))
 		return -E2BIG;
 
-	if (kvm_phys_is_valid(phys))
-		new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	if (kvm_phys_is_valid(pgt, phys))
+		new = kvm_init_valid_leaf_pte(pgt, phys, data->attr, level);
 	else
 		new = kvm_init_invalid_leaf_owner(data->owner_id);
 
@@ -744,36 +746,37 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 		if (!stage2_pte_needs_update(old, new))
 			return -EAGAIN;
 
-		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
+		stage2_put_pte(ptep, pgt->mmu, addr, level, mm_ops);
 	}
 
 	/* Perform CMOs before installation of the guest stage-2 PTE */
 	if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
-		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
+		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pgt, new),
 						granule);
 
 	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
-		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
+		mm_ops->icache_inval_pou(kvm_pte_follow(pgt, new), granule);
 
 	smp_store_release(ptep, new);
 	if (stage2_pte_is_counted(new))
 		mm_ops->get_page(ptep);
-	if (kvm_phys_is_valid(phys))
+	if (kvm_phys_is_valid(pgt, phys))
 		data->phys += granule;
 	return 0;
 }
 
-static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
+static int stage2_map_walk_table_pre(struct kvm_pgtable *pgt,
+				     u64 addr, u64 end, u32 level,
 				     kvm_pte_t *ptep,
 				     struct stage2_map_data *data)
 {
 	if (data->anchor)
 		return 0;
 
-	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
+	if (!stage2_leaf_mapping_allowed(pgt, addr, end, level, data))
 		return 0;
 
-	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
+	data->childp = kvm_pte_follow(pgt, *ptep);
 	kvm_clear_pte(ptep);
 
 	/*
@@ -781,15 +784,16 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 	 * entries below us which would otherwise need invalidating
 	 * individually.
 	 */
-	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
+	kvm_call_hyp(__kvm_tlb_flush_vmid, pgt->mmu);
 	data->anchor = ptep;
 	return 0;
 }
 
-static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_map_walk_leaf(struct kvm_pgtable *pgt,
+				u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 				struct stage2_map_data *data)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t *childp, pte = *ptep;
 	int ret;
 
@@ -800,7 +804,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		return 0;
 	}
 
-	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
+	ret = stage2_map_walker_try_leaf(pgt, addr, end, level, ptep, data);
 	if (ret != -E2BIG)
 		return ret;
 
@@ -820,19 +824,20 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 * will be mapped lazily.
 	 */
 	if (stage2_pte_is_counted(pte))
-		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
+		stage2_put_pte(ptep, pgt->mmu, addr, level, mm_ops);
 
-	kvm_set_table_pte(ptep, childp, mm_ops);
+	kvm_set_table_pte(pgt, ptep, childp);
 	mm_ops->get_page(ptep);
 
 	return 0;
 }
 
-static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
+static int stage2_map_walk_table_post(struct kvm_pgtable *pgt,
+				      u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      struct stage2_map_data *data)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t *childp;
 	int ret = 0;
 
@@ -843,9 +848,9 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
 		childp = data->childp;
 		data->anchor = NULL;
 		data->childp = NULL;
-		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
+		ret = stage2_map_walk_leaf(pgt, addr, end, level, ptep, data);
 	} else {
-		childp = kvm_pte_follow(*ptep, mm_ops);
+		childp = kvm_pte_follow(pgt, *ptep);
 	}
 
 	mm_ops->put_page(childp);
@@ -873,18 +878,19 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
  * the page-table, installing the block entry when it revisits the anchor
  * pointer and clearing the anchor to NULL.
  */
-static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_map_walker(struct kvm_pgtable *pgt,
+			     u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			     enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct stage2_map_data *data = arg;
 
 	switch (flag) {
 	case KVM_PGTABLE_WALK_TABLE_PRE:
-		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
+		return stage2_map_walk_table_pre(pgt, addr, end, level, ptep, data);
 	case KVM_PGTABLE_WALK_LEAF:
-		return stage2_map_walk_leaf(addr, end, level, ptep, data);
+		return stage2_map_walk_leaf(pgt, addr, end, level, ptep, data);
 	case KVM_PGTABLE_WALK_TABLE_POST:
-		return stage2_map_walk_table_post(addr, end, level, ptep, data);
+		return stage2_map_walk_table_post(pgt, addr, end, level, ptep, data);
 	}
 
 	return -EINVAL;
@@ -897,9 +903,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	int ret;
 	struct stage2_map_data map_data = {
 		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
-		.mmu		= pgt->mmu,
 		.memcache	= mc,
-		.mm_ops		= pgt->mm_ops,
 		.force_pte	= pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
 	};
 	struct kvm_pgtable_walker walker = {
@@ -928,9 +932,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	int ret;
 	struct stage2_map_data map_data = {
 		.phys		= KVM_PHYS_INVALID,
-		.mmu		= pgt->mmu,
 		.memcache	= mc,
-		.mm_ops		= pgt->mm_ops,
 		.owner_id	= owner_id,
 		.force_pte	= true,
 	};
@@ -949,11 +951,11 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	return ret;
 }
 
-static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_unmap_walker(struct kvm_pgtable *pgt,
+			       u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
-	struct kvm_pgtable *pgt = arg;
 	struct kvm_s2_mmu *mmu = pgt->mmu;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep, *childp = NULL;
@@ -968,7 +970,7 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	}
 
 	if (kvm_pte_table(pte, level)) {
-		childp = kvm_pte_follow(pte, mm_ops);
+		childp = kvm_pte_follow(pgt, pte);
 
 		if (mm_ops->page_count(childp) != 1)
 			return 0;
@@ -984,7 +986,7 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	stage2_put_pte(ptep, mmu, addr, level, mm_ops);
 
 	if (need_flush && mm_ops->dcache_clean_inval_poc)
-		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
+		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pgt, pte),
 					       kvm_granule_size(level));
 
 	if (childp)
@@ -997,7 +999,6 @@ int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_unmap_walker,
-		.arg	= pgt,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
 	};
 
@@ -1009,16 +1010,16 @@ struct stage2_attr_data {
 	kvm_pte_t			attr_clr;
 	kvm_pte_t			pte;
 	u32				level;
-	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_attr_walker(struct kvm_pgtable *pgt,
+			      u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
 	kvm_pte_t pte = *ptep;
 	struct stage2_attr_data *data = arg;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 
 	if (!kvm_pte_valid(pte))
 		return 0;
@@ -1040,7 +1041,7 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		 */
 		if (mm_ops->icache_inval_pou &&
 		    stage2_pte_executable(pte) && !stage2_pte_executable(*ptep))
-			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
+			mm_ops->icache_inval_pou(kvm_pte_follow(pgt, pte),
 						  kvm_granule_size(level));
 		WRITE_ONCE(*ptep, pte);
 	}
@@ -1058,7 +1059,6 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 	struct stage2_attr_data data = {
 		.attr_set	= attr_set & attr_mask,
 		.attr_clr	= attr_clr & attr_mask,
-		.mm_ops		= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_attr_walker,
@@ -1140,11 +1140,11 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 	return ret;
 }
 
-static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_flush_walker(struct kvm_pgtable *pgt,
+			       u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
-	struct kvm_pgtable *pgt = arg;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep;
 
@@ -1152,7 +1152,7 @@ static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		return 0;
 
 	if (mm_ops->dcache_clean_inval_poc)
-		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
+		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pgt, pte),
 					       kvm_granule_size(level));
 	return 0;
 }
@@ -1162,7 +1162,6 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_flush_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF,
-		.arg	= pgt,
 	};
 
 	if (stage2_has_fwb(pgt))
@@ -1200,11 +1199,12 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	return 0;
 }
 
-static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_free_walker(struct kvm_pgtable *pgt,
+			      u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep;
 
 	if (!stage2_pte_is_counted(pte))
@@ -1213,7 +1213,7 @@ static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	mm_ops->put_page(ptep);
 
 	if (kvm_pte_table(pte, level))
-		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
+		mm_ops->put_page(kvm_pte_follow(pgt, pte));
 
 	return 0;
 }
@@ -1225,7 +1225,6 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 		.cb	= stage2_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF |
 			  KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pgt->mm_ops,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 04/12] KVM: arm64: Plumbing to enable multiple pgtable formats
@ 2022-12-06 13:59   ` Ryan Roberts
  0 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: Ryan Roberts, James Morse, Alexandru Elisei, Oliver Upton,
	linux-arm-kernel, kvmarm, kvmarm

FEAT_LPA2 brings support for 52-bit input and output addresses for both
stage1 and stage2 translation when using 4KB and 16KB page sizes. The
architecture allows for the HW to support FEAT_LPA2 in one or both
stages of translation. When FEAT_LPA2 is enabled for a given stage, it
effectively changes the page table format; PTE bits change meaning and
blocks can be mapped at levels that were previously not possible.

All of this means that KVM has to support 2 page table formats and
decide which to use at runtime, after querying the HW. If FEAT_LPA2 is
advertised for stage1, KVM must choose either the classic or the lpa2
format for its hyp stage1 according to some policy; otherwise it must
use the classic format. Independently, if FEAT_LPA2 is advertised for
stage2, KVM must decide which format to use for the vm stage2 tables
according to a policy.

As a first step towards enabling FEAT_LPA2, make struct kvm_pgtable
accessible to functions that will need to take different actions
depending on the page-table format. These functions are:

  - kvm_pte_to_phys()
  - kvm_phys_to_pte()
  - kvm_level_supports_block_mapping()
  - hyp_set_prot_attr()
  - stage2_set_prot_attr()

Do this by consistently passing the struct kvm_pgtable around as
the first parameter of each kvm_pgtable function call. As a result of
always passing it to walker callbacks, we can remove some ad-hoc members
from walker-specific data structures because those members are
accessible through the struct kvm_pgtable (notably mmu and mm_ops).

No functional changes are intended.
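
For illustration, a walker written against the updated prototype receives
the struct kvm_pgtable as its first argument and can pull mm_ops straight
from it instead of carrying a copy in its private argument struct. A
minimal sketch only (example_walker is a hypothetical name; the helpers
it calls are the ones reworked by this patch):

static int example_walker(struct kvm_pgtable *pgt,
			  u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
			  enum kvm_pgtable_walk_flags flag, void * const arg)
{
	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops; /* no longer via arg */
	kvm_pte_t pte = *ptep;

	if (!kvm_pte_valid(pte))
		return 0;

	/* format-aware helpers now take the pgt as their first parameter */
	if (kvm_pte_table(pte, level))
		mm_ops->put_page(kvm_pte_follow(pgt, pte));

	return 0;
}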

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_pgtable.h  |  23 ++--
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |   5 +-
 arch/arm64/kvm/hyp/nvhe/setup.c       |   8 +-
 arch/arm64/kvm/hyp/pgtable.c          | 181 +++++++++++++-------------
 4 files changed, 109 insertions(+), 108 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3252eb50ecfe..2247ed74871a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -47,16 +47,6 @@ static inline bool kvm_pte_valid(kvm_pte_t pte)
 	return pte & KVM_PTE_VALID;
 }
 
-static inline u64 kvm_pte_to_phys(kvm_pte_t pte)
-{
-	u64 pa = pte & KVM_PTE_ADDR_MASK;
-
-	if (PAGE_SHIFT == 16)
-		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
-
-	return pa;
-}
-
 static inline u64 kvm_granule_shift(u32 level)
 {
 	/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
@@ -184,6 +174,16 @@ struct kvm_pgtable {
 	kvm_pgtable_force_pte_cb_t		force_pte_cb;
 };
 
+static inline u64 kvm_pte_to_phys(struct kvm_pgtable *pgt, kvm_pte_t pte)
+{
+	u64 pa = pte & KVM_PTE_ADDR_MASK;
+
+	if (PAGE_SHIFT == 16)
+		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
+
+	return pa;
+}
+
 /**
  * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
  * @KVM_PGTABLE_WALK_LEAF:		Visit leaf entries, including invalid
@@ -199,7 +199,8 @@ enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
 };
 
-typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
+typedef int (*kvm_pgtable_visitor_fn_t)(struct kvm_pgtable *pgt,
+					u64 addr, u64 end, u32 level,
 					kvm_pte_t *ptep,
 					enum kvm_pgtable_walk_flags flag,
 					void * const arg);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 07f9dc9848ef..6bf54c8daffa 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -417,7 +417,8 @@ struct check_walk_data {
 	enum pkvm_page_state	(*get_page_state)(kvm_pte_t pte);
 };
 
-static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
+static int __check_page_state_visitor(struct kvm_pgtable *pgt,
+				      u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      enum kvm_pgtable_walk_flags flag,
 				      void * const arg)
@@ -425,7 +426,7 @@ static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
 	struct check_walk_data *d = arg;
 	kvm_pte_t pte = *ptep;
 
-	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
+	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pgt, pte)))
 		return -EINVAL;
 
 	return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index e8d4ea2fcfa0..60a6821ae98a 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -186,12 +186,13 @@ static void hpool_put_page(void *addr)
 	hyp_put_page(&hpool, addr);
 }
 
-static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
+static int finalize_host_mappings_walker(struct kvm_pgtable *pgt,
+					 u64 addr, u64 end, u32 level,
 					 kvm_pte_t *ptep,
 					 enum kvm_pgtable_walk_flags flag,
 					 void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
 	kvm_pte_t pte = *ptep;
@@ -212,7 +213,7 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 	if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
-	phys = kvm_pte_to_phys(pte);
+	phys = kvm_pte_to_phys(pgt, pte);
 	if (!addr_is_memory(phys))
 		return -EINVAL;
 
@@ -242,7 +243,6 @@ static int finalize_host_mappings(void)
 	struct kvm_pgtable_walker walker = {
 		.cb	= finalize_host_mappings_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pkvm_pgtable.mm_ops,
 	};
 	int i, ret;
 
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index cdf8e76b0be1..221e0dafb149 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -59,12 +59,13 @@ struct kvm_pgtable_walk_data {
 
 #define KVM_PHYS_INVALID (-1ULL)
 
-static bool kvm_phys_is_valid(u64 phys)
+static bool kvm_phys_is_valid(struct kvm_pgtable *pgt, u64 phys)
 {
 	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
 }
 
-static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
+static bool kvm_block_mapping_supported(struct kvm_pgtable *pgt,
+					u64 addr, u64 end, u64 phys, u32 level)
 {
 	u64 granule = kvm_granule_size(level);
 
@@ -74,7 +75,7 @@ static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
 	if (granule > (end - addr))
 		return false;
 
-	if (kvm_phys_is_valid(phys) && !IS_ALIGNED(phys, granule))
+	if (kvm_phys_is_valid(pgt, phys) && !IS_ALIGNED(phys, granule))
 		return false;
 
 	return IS_ALIGNED(addr, granule);
@@ -122,7 +123,7 @@ static bool kvm_pte_table(kvm_pte_t pte, u32 level)
 	return FIELD_GET(KVM_PTE_TYPE, pte) == KVM_PTE_TYPE_TABLE;
 }
 
-static kvm_pte_t kvm_phys_to_pte(u64 pa)
+static kvm_pte_t kvm_phys_to_pte(struct kvm_pgtable *pgt, u64 pa)
 {
 	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
 
@@ -132,9 +133,9 @@ static kvm_pte_t kvm_phys_to_pte(u64 pa)
 	return pte;
 }
 
-static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte, struct kvm_pgtable_mm_ops *mm_ops)
+static kvm_pte_t *kvm_pte_follow(struct kvm_pgtable *pgt, kvm_pte_t pte)
 {
-	return mm_ops->phys_to_virt(kvm_pte_to_phys(pte));
+	return pgt->mm_ops->phys_to_virt(kvm_pte_to_phys(pgt, pte));
 }
 
 static void kvm_clear_pte(kvm_pte_t *ptep)
@@ -142,10 +143,11 @@ static void kvm_clear_pte(kvm_pte_t *ptep)
 	WRITE_ONCE(*ptep, 0);
 }
 
-static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
-			      struct kvm_pgtable_mm_ops *mm_ops)
+static void kvm_set_table_pte(struct kvm_pgtable *pgt,
+			      kvm_pte_t *ptep, kvm_pte_t *childp)
 {
-	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
+	kvm_pte_t old = *ptep;
+	kvm_pte_t pte = kvm_phys_to_pte(pgt, pgt->mm_ops->virt_to_phys(childp));
 
 	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
 	pte |= KVM_PTE_VALID;
@@ -154,9 +156,10 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
 	smp_store_release(ptep, pte);
 }
 
-static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
+static kvm_pte_t kvm_init_valid_leaf_pte(struct kvm_pgtable *pgt,
+					 u64 pa, kvm_pte_t attr, u32 level)
 {
-	kvm_pte_t pte = kvm_phys_to_pte(pa);
+	kvm_pte_t pte = kvm_phys_to_pte(pgt, pa);
 	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
 							   KVM_PTE_TYPE_BLOCK;
 
@@ -177,7 +180,8 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
 				  enum kvm_pgtable_walk_flags flag)
 {
 	struct kvm_pgtable_walker *walker = data->walker;
-	return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
+	return walker->cb(data->pgt,
+			  addr, data->end, level, ptep, flag, walker->arg);
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
@@ -213,7 +217,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 		goto out;
 	}
 
-	childp = kvm_pte_follow(pte, data->pgt->mm_ops);
+	childp = kvm_pte_follow(data->pgt, pte);
 	ret = __kvm_pgtable_walk(data, childp, level + 1);
 	if (ret)
 		goto out;
@@ -292,7 +296,8 @@ struct leaf_walk_data {
 	u32		level;
 };
 
-static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int leaf_walker(struct kvm_pgtable *pgt,
+		       u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		       enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct leaf_walk_data *data = arg;
@@ -329,10 +334,10 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
 struct hyp_map_data {
 	u64				phys;
 	kvm_pte_t			attr;
-	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
+static int hyp_set_prot_attr(struct kvm_pgtable *pgt,
+			     enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
 {
 	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
 	u32 mtype = device ? MT_DEVICE_nGnRE : MT_NORMAL;
@@ -383,21 +388,22 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
 	return prot;
 }
 
-static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
+static bool hyp_map_walker_try_leaf(struct kvm_pgtable *pgt,
+				    u64 addr, u64 end, u32 level,
 				    kvm_pte_t *ptep, struct hyp_map_data *data)
 {
 	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
 
-	if (!kvm_block_mapping_supported(addr, end, phys, level))
+	if (!kvm_block_mapping_supported(pgt, addr, end, phys, level))
 		return false;
 
 	data->phys += granule;
-	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	new = kvm_init_valid_leaf_pte(pgt, phys, data->attr, level);
 	if (old == new)
 		return true;
 	if (!kvm_pte_valid(old))
-		data->mm_ops->get_page(ptep);
+		pgt->mm_ops->get_page(ptep);
 	else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
 		return false;
 
@@ -405,14 +411,15 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	return true;
 }
 
-static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int hyp_map_walker(struct kvm_pgtable *pgt,
+			  u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			  enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	kvm_pte_t *childp;
 	struct hyp_map_data *data = arg;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 
-	if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
+	if (hyp_map_walker_try_leaf(pgt, addr, end, level, ptep, data))
 		return 0;
 
 	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
@@ -422,7 +429,7 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (!childp)
 		return -ENOMEM;
 
-	kvm_set_table_pte(ptep, childp, mm_ops);
+	kvm_set_table_pte(pgt, ptep, childp);
 	mm_ops->get_page(ptep);
 	return 0;
 }
@@ -433,7 +440,6 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 	int ret;
 	struct hyp_map_data map_data = {
 		.phys	= ALIGN_DOWN(phys, PAGE_SIZE),
-		.mm_ops	= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_map_walker,
@@ -441,7 +447,7 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 		.arg	= &map_data,
 	};
 
-	ret = hyp_set_prot_attr(prot, &map_data.attr);
+	ret = hyp_set_prot_attr(pgt, prot, &map_data.attr);
 	if (ret)
 		return ret;
 
@@ -453,22 +459,22 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 
 struct hyp_unmap_data {
 	u64				unmapped;
-	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int hyp_unmap_walker(struct kvm_pgtable *pgt,
+			    u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			    enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	kvm_pte_t pte = *ptep, *childp = NULL;
 	u64 granule = kvm_granule_size(level);
 	struct hyp_unmap_data *data = arg;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 
 	if (!kvm_pte_valid(pte))
 		return -EINVAL;
 
 	if (kvm_pte_table(pte, level)) {
-		childp = kvm_pte_follow(pte, mm_ops);
+		childp = kvm_pte_follow(pgt, pte);
 
 		if (mm_ops->page_count(childp) != 1)
 			return 0;
@@ -498,9 +504,7 @@ static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 
 u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
-	struct hyp_unmap_data unmap_data = {
-		.mm_ops	= pgt->mm_ops,
-	};
+	struct hyp_unmap_data unmap_data = {};
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_unmap_walker,
 		.arg	= &unmap_data,
@@ -532,10 +536,11 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 	return 0;
 }
 
-static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int hyp_free_walker(struct kvm_pgtable *pgt,
+			   u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			   enum kvm_pgtable_walk_flags flag, void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep;
 
 	if (!kvm_pte_valid(pte))
@@ -544,7 +549,7 @@ static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	mm_ops->put_page(ptep);
 
 	if (kvm_pte_table(pte, level))
-		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
+		mm_ops->put_page(kvm_pte_follow(pgt, pte));
 
 	return 0;
 }
@@ -554,7 +559,6 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pgt->mm_ops,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
@@ -570,11 +574,8 @@ struct stage2_map_data {
 	kvm_pte_t			*anchor;
 	kvm_pte_t			*childp;
 
-	struct kvm_s2_mmu		*mmu;
 	void				*memcache;
 
-	struct kvm_pgtable_mm_ops	*mm_ops;
-
 	/* Force mappings to page granularity */
 	bool				force_pte;
 };
@@ -708,29 +709,30 @@ static bool stage2_pte_executable(kvm_pte_t pte)
 	return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
 }
 
-static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
+static bool stage2_leaf_mapping_allowed(struct kvm_pgtable *pgt,
+					u64 addr, u64 end, u32 level,
 					struct stage2_map_data *data)
 {
 	if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
 		return false;
 
-	return kvm_block_mapping_supported(addr, end, data->phys, level);
+	return kvm_block_mapping_supported(pgt, addr, end, data->phys, level);
 }
 
-static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
+static int stage2_map_walker_try_leaf(struct kvm_pgtable *pgt,
+				      u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      struct stage2_map_data *data)
 {
 	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
-	struct kvm_pgtable *pgt = data->mmu->pgt;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 
-	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
+	if (!stage2_leaf_mapping_allowed(pgt, addr, end, level, data))
 		return -E2BIG;
 
-	if (kvm_phys_is_valid(phys))
-		new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	if (kvm_phys_is_valid(pgt, phys))
+		new = kvm_init_valid_leaf_pte(pgt, phys, data->attr, level);
 	else
 		new = kvm_init_invalid_leaf_owner(data->owner_id);
 
@@ -744,36 +746,37 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 		if (!stage2_pte_needs_update(old, new))
 			return -EAGAIN;
 
-		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
+		stage2_put_pte(ptep, pgt->mmu, addr, level, mm_ops);
 	}
 
 	/* Perform CMOs before installation of the guest stage-2 PTE */
 	if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
-		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
+		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pgt, new),
 						granule);
 
 	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
-		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
+		mm_ops->icache_inval_pou(kvm_pte_follow(pgt, new), granule);
 
 	smp_store_release(ptep, new);
 	if (stage2_pte_is_counted(new))
 		mm_ops->get_page(ptep);
-	if (kvm_phys_is_valid(phys))
+	if (kvm_phys_is_valid(pgt, phys))
 		data->phys += granule;
 	return 0;
 }
 
-static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
+static int stage2_map_walk_table_pre(struct kvm_pgtable *pgt,
+				     u64 addr, u64 end, u32 level,
 				     kvm_pte_t *ptep,
 				     struct stage2_map_data *data)
 {
 	if (data->anchor)
 		return 0;
 
-	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
+	if (!stage2_leaf_mapping_allowed(pgt, addr, end, level, data))
 		return 0;
 
-	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
+	data->childp = kvm_pte_follow(pgt, *ptep);
 	kvm_clear_pte(ptep);
 
 	/*
@@ -781,15 +784,16 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 	 * entries below us which would otherwise need invalidating
 	 * individually.
 	 */
-	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
+	kvm_call_hyp(__kvm_tlb_flush_vmid, pgt->mmu);
 	data->anchor = ptep;
 	return 0;
 }
 
-static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_map_walk_leaf(struct kvm_pgtable *pgt,
+				u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 				struct stage2_map_data *data)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t *childp, pte = *ptep;
 	int ret;
 
@@ -800,7 +804,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		return 0;
 	}
 
-	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
+	ret = stage2_map_walker_try_leaf(pgt, addr, end, level, ptep, data);
 	if (ret != -E2BIG)
 		return ret;
 
@@ -820,19 +824,20 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 * will be mapped lazily.
 	 */
 	if (stage2_pte_is_counted(pte))
-		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
+		stage2_put_pte(ptep, pgt->mmu, addr, level, mm_ops);
 
-	kvm_set_table_pte(ptep, childp, mm_ops);
+	kvm_set_table_pte(pgt, ptep, childp);
 	mm_ops->get_page(ptep);
 
 	return 0;
 }
 
-static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
+static int stage2_map_walk_table_post(struct kvm_pgtable *pgt,
+				      u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      struct stage2_map_data *data)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t *childp;
 	int ret = 0;
 
@@ -843,9 +848,9 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
 		childp = data->childp;
 		data->anchor = NULL;
 		data->childp = NULL;
-		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
+		ret = stage2_map_walk_leaf(pgt, addr, end, level, ptep, data);
 	} else {
-		childp = kvm_pte_follow(*ptep, mm_ops);
+		childp = kvm_pte_follow(pgt, *ptep);
 	}
 
 	mm_ops->put_page(childp);
@@ -873,18 +878,19 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
  * the page-table, installing the block entry when it revisits the anchor
  * pointer and clearing the anchor to NULL.
  */
-static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_map_walker(struct kvm_pgtable *pgt,
+			     u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			     enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct stage2_map_data *data = arg;
 
 	switch (flag) {
 	case KVM_PGTABLE_WALK_TABLE_PRE:
-		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
+		return stage2_map_walk_table_pre(pgt, addr, end, level, ptep, data);
 	case KVM_PGTABLE_WALK_LEAF:
-		return stage2_map_walk_leaf(addr, end, level, ptep, data);
+		return stage2_map_walk_leaf(pgt, addr, end, level, ptep, data);
 	case KVM_PGTABLE_WALK_TABLE_POST:
-		return stage2_map_walk_table_post(addr, end, level, ptep, data);
+		return stage2_map_walk_table_post(pgt, addr, end, level, ptep, data);
 	}
 
 	return -EINVAL;
@@ -897,9 +903,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	int ret;
 	struct stage2_map_data map_data = {
 		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
-		.mmu		= pgt->mmu,
 		.memcache	= mc,
-		.mm_ops		= pgt->mm_ops,
 		.force_pte	= pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
 	};
 	struct kvm_pgtable_walker walker = {
@@ -928,9 +932,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	int ret;
 	struct stage2_map_data map_data = {
 		.phys		= KVM_PHYS_INVALID,
-		.mmu		= pgt->mmu,
 		.memcache	= mc,
-		.mm_ops		= pgt->mm_ops,
 		.owner_id	= owner_id,
 		.force_pte	= true,
 	};
@@ -949,11 +951,11 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	return ret;
 }
 
-static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_unmap_walker(struct kvm_pgtable *pgt,
+			       u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
-	struct kvm_pgtable *pgt = arg;
 	struct kvm_s2_mmu *mmu = pgt->mmu;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep, *childp = NULL;
@@ -968,7 +970,7 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	}
 
 	if (kvm_pte_table(pte, level)) {
-		childp = kvm_pte_follow(pte, mm_ops);
+		childp = kvm_pte_follow(pgt, pte);
 
 		if (mm_ops->page_count(childp) != 1)
 			return 0;
@@ -984,7 +986,7 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	stage2_put_pte(ptep, mmu, addr, level, mm_ops);
 
 	if (need_flush && mm_ops->dcache_clean_inval_poc)
-		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
+		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pgt, pte),
 					       kvm_granule_size(level));
 
 	if (childp)
@@ -997,7 +999,6 @@ int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_unmap_walker,
-		.arg	= pgt,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
 	};
 
@@ -1009,16 +1010,16 @@ struct stage2_attr_data {
 	kvm_pte_t			attr_clr;
 	kvm_pte_t			pte;
 	u32				level;
-	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_attr_walker(struct kvm_pgtable *pgt,
+			      u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
 	kvm_pte_t pte = *ptep;
 	struct stage2_attr_data *data = arg;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 
 	if (!kvm_pte_valid(pte))
 		return 0;
@@ -1040,7 +1041,7 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		 */
 		if (mm_ops->icache_inval_pou &&
 		    stage2_pte_executable(pte) && !stage2_pte_executable(*ptep))
-			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
+			mm_ops->icache_inval_pou(kvm_pte_follow(pgt, pte),
 						  kvm_granule_size(level));
 		WRITE_ONCE(*ptep, pte);
 	}
@@ -1058,7 +1059,6 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 	struct stage2_attr_data data = {
 		.attr_set	= attr_set & attr_mask,
 		.attr_clr	= attr_clr & attr_mask,
-		.mm_ops		= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_attr_walker,
@@ -1140,11 +1140,11 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 	return ret;
 }
 
-static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_flush_walker(struct kvm_pgtable *pgt,
+			       u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
-	struct kvm_pgtable *pgt = arg;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep;
 
@@ -1152,7 +1152,7 @@ static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		return 0;
 
 	if (mm_ops->dcache_clean_inval_poc)
-		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
+		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pgt, pte),
 					       kvm_granule_size(level));
 	return 0;
 }
@@ -1162,7 +1162,6 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_flush_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF,
-		.arg	= pgt,
 	};
 
 	if (stage2_has_fwb(pgt))
@@ -1200,11 +1199,12 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	return 0;
 }
 
-static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_free_walker(struct kvm_pgtable *pgt,
+			      u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
 	kvm_pte_t pte = *ptep;
 
 	if (!stage2_pte_is_counted(pte))
@@ -1213,7 +1213,7 @@ static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	mm_ops->put_page(ptep);
 
 	if (kvm_pte_table(pte, level))
-		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
+		mm_ops->put_page(kvm_pte_follow(pgt, pte));
 
 	return 0;
 }
@@ -1225,7 +1225,6 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 		.cb	= stage2_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF |
 			  KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pgt->mm_ops,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 05/12] KVM: arm64: Maintain page-table format info in struct kvm_pgtable
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-06 13:59   ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: kvmarm, kvmarm, linux-arm-kernel

As the next step on the journey to supporting FEAT_LPA2 in KVM, add a
flag to struct kvm_pgtable, which functions can then use to select the
appropriate behavior for either the `classic` or `lpa2` page-table
formats. For now, all page-tables remain in the `classic` format.

No functional changes are intended.
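
As a rough illustration of how the flag is intended to be consumed,
format-dependent code can simply branch on the new field. This is a
hypothetical helper, not part of the patch, and KVM_PTE_ADDR_MASK_LPA2
only appears later in the series:

/* Hypothetical: pick the PTE address mask for this table's format. */
static u64 example_pte_addr_mask(struct kvm_pgtable *pgt)
{
	return pgt->lpa2_ena ? KVM_PTE_ADDR_MASK_LPA2 : KVM_PTE_ADDR_MASK;
}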

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 2 ++
 arch/arm64/kvm/hyp/pgtable.c         | 2 ++
 arch/arm64/kvm/mmu.c                 | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 2247ed74871a..744e224d964b 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -157,6 +157,7 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
  * @start_level:	Level at which the page-table walk starts.
  * @pgd:		Pointer to the first top-level entry of the page-table.
  * @mm_ops:		Memory management callbacks.
+ * @lpa2_ena:		Format used for page-table; false->classic, true->lpa2.
  * @mmu:		Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
  * @flags:		Stage-2 page-table flags.
  * @force_pte_cb:	Function that returns true if page level mappings must
@@ -167,6 +168,7 @@ struct kvm_pgtable {
 	u32					start_level;
 	kvm_pte_t				*pgd;
 	struct kvm_pgtable_mm_ops		*mm_ops;
+	bool					lpa2_ena;
 
 	/* Stage-2 only */
 	struct kvm_s2_mmu			*mmu;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 221e0dafb149..c7799cd50af8 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -530,6 +530,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 	pgt->ia_bits		= va_bits;
 	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
 	pgt->mm_ops		= mm_ops;
+	pgt->lpa2_ena		= false;
 	pgt->mmu		= NULL;
 	pgt->force_pte_cb	= NULL;
 
@@ -1190,6 +1191,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	pgt->ia_bits		= ia_bits;
 	pgt->start_level	= start_level;
 	pgt->mm_ops		= mm_ops;
+	pgt->lpa2_ena		= false;
 	pgt->mmu		= mmu;
 	pgt->flags		= flags;
 	pgt->force_pte_cb	= force_pte_cb;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1ef0704420d9..e3fe3e194fd1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -645,6 +645,7 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 		.start_level	= (KVM_PGTABLE_MAX_LEVELS -
 				   CONFIG_PGTABLE_LEVELS),
 		.mm_ops		= &kvm_user_mm_ops,
+		.lpa2_ena	= lpa2_is_enabled(),
 	};
 	kvm_pte_t pte = 0;	/* Keep GCC quiet... */
 	u32 level = ~0;
-- 
2.25.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 06/12] KVM: arm64: Use LPA2 page-tables for stage2 if HW supports it
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-06 13:59   ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: kvmarm, kvmarm, linux-arm-kernel

Implement a simple policy whereby if the HW supports FEAT_LPA2 for the
page size we are using, always use LPA2-style page-tables for stage 2,
regardless of the VMM-requested IPA size or HW-implemented PA size. When
LPA2 is in use, we can now support up to 52-bit IPA and PA sizes.

We use the preparatory work that tracks the page-table format in struct
kvm_pgtable and passes the pgt pointer to all kvm_pgtable functions that
need to modify their behavior based on the format.

Note that FEAT_LPA2 brings support for bigger block mappings (512GB with
4KB, 64GB with 16KB). We explicitly don't enable these in the library
because stage2_apply_range() works on batch sizes of the largest used
block mapping, and increasing the size of the batch would lead to soft
lockups. See commit 5994bc9e05c2 ("KVM: arm64: Limit
stage2_apply_range() batch size to largest block").

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_pgtable.h  | 42 ++++++++++++++++++++-----
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 12 +++----
 arch/arm64/kvm/hyp/pgtable.c          | 45 ++++++++++++++++++++++-----
 3 files changed, 78 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 744e224d964b..a7fd547dcc71 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -25,12 +25,32 @@
 #define KVM_PGTABLE_MIN_BLOCK_LEVEL	2U
 #endif
 
-static inline u64 kvm_get_parange(u64 mmfr0)
+static inline bool kvm_supports_stage2_lpa2(u64 mmfr0)
 {
+	unsigned int tgran;
+
+	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
+						ID_AA64MMFR0_EL1_TGRAN_2_SHIFT);
+	return (tgran == ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2 &&
+		PAGE_SIZE != SZ_64K);
+}
+
+static inline u64 kvm_get_parange_max(bool lpa2_ena)
+{
+	if (lpa2_ena ||
+	   (IS_ENABLED(CONFIG_ARM64_PA_BITS_52) && PAGE_SIZE == SZ_64K))
+		return ID_AA64MMFR0_EL1_PARANGE_52;
+	else
+		return ID_AA64MMFR0_EL1_PARANGE_48;
+}
+
+static inline u64 kvm_get_parange(u64 mmfr0, bool lpa2_ena)
+{
+	u64 parange_max = kvm_get_parange_max(lpa2_ena);
 	u64 parange = cpuid_feature_extract_unsigned_field(mmfr0,
 				ID_AA64MMFR0_EL1_PARANGE_SHIFT);
-	if (parange > ID_AA64MMFR0_EL1_PARANGE_MAX)
-		parange = ID_AA64MMFR0_EL1_PARANGE_MAX;
+	if (parange > parange_max)
+		parange = parange_max;
 
 	return parange;
 }
@@ -41,6 +61,8 @@ typedef u64 kvm_pte_t;
 
 #define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
 #define KVM_PTE_ADDR_51_48		GENMASK(15, 12)
+#define KVM_PTE_ADDR_MASK_LPA2		GENMASK(49, PAGE_SHIFT)
+#define KVM_PTE_ADDR_51_50_LPA2		GENMASK(9, 8)
 
 static inline bool kvm_pte_valid(kvm_pte_t pte)
 {
@@ -178,10 +200,16 @@ struct kvm_pgtable {
 
 static inline u64 kvm_pte_to_phys(struct kvm_pgtable *pgt, kvm_pte_t pte)
 {
-	u64 pa = pte & KVM_PTE_ADDR_MASK;
+	u64 pa;
 
-	if (PAGE_SHIFT == 16)
-		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
+	if (pgt->lpa2_ena) {
+		pa = pte & KVM_PTE_ADDR_MASK_LPA2;
+		pa |= FIELD_GET(KVM_PTE_ADDR_51_50_LPA2, pte) << 50;
+	} else {
+		pa = pte & KVM_PTE_ADDR_MASK;
+		if (PAGE_SHIFT == 16)
+			pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
+	}
 
 	return pa;
 }
@@ -287,7 +315,7 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
  * kvm_get_vtcr() - Helper to construct VTCR_EL2
  * @mmfr0:	Sanitized value of SYS_ID_AA64MMFR0_EL1 register.
  * @mmfr1:	Sanitized value of SYS_ID_AA64MMFR1_EL1 register.
- * @phys_shfit:	Value to set in VTCR_EL2.T0SZ.
+ * @phys_shift:	Value to set in VTCR_EL2.T0SZ, or 0 to infer from parange.
  *
  * The VTCR value is common across all the physical CPUs on the system.
  * We use system wide sanitised values to fill in different fields,
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 6bf54c8daffa..43e729694deb 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -105,14 +105,12 @@ static int prepare_s2_pool(void *pgt_pool_base)
 
 static void prepare_host_vtcr(void)
 {
-	u32 parange, phys_shift;
-
-	/* The host stage 2 is id-mapped, so use parange for T0SZ */
-	parange = kvm_get_parange(id_aa64mmfr0_el1_sys_val);
-	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
-
+	/*
+	 * The host stage 2 is id-mapped; passing phys_shift=0 forces parange to
+	 * be used for T0SZ.
+	 */
 	host_kvm.arch.vtcr = kvm_get_vtcr(id_aa64mmfr0_el1_sys_val,
-					  id_aa64mmfr1_el1_sys_val, phys_shift);
+					  id_aa64mmfr1_el1_sys_val, 0);
 }
 
 static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index c7799cd50af8..8ed7353f07bc 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -61,7 +61,10 @@ struct kvm_pgtable_walk_data {
 
 static bool kvm_phys_is_valid(struct kvm_pgtable *pgt, u64 phys)
 {
-	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
+	u64 parange_max = kvm_get_parange_max(pgt->lpa2_ena);
+	u8 shift = id_aa64mmfr0_parange_to_phys_shift(parange_max);
+
+	return phys < BIT(shift);
 }
 
 static bool kvm_block_mapping_supported(struct kvm_pgtable *pgt,
@@ -125,10 +128,16 @@ static bool kvm_pte_table(kvm_pte_t pte, u32 level)
 
 static kvm_pte_t kvm_phys_to_pte(struct kvm_pgtable *pgt, u64 pa)
 {
-	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
+	kvm_pte_t pte;
 
-	if (PAGE_SHIFT == 16)
-		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
+	if (pgt->lpa2_ena) {
+		pte = pa & KVM_PTE_ADDR_MASK_LPA2;
+		pte |= FIELD_PREP(KVM_PTE_ADDR_51_50_LPA2, pa >> 50);
+	} else {
+		pte = pa & KVM_PTE_ADDR_MASK;
+		if (PAGE_SHIFT == 16)
+			pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
+	}
 
 	return pte;
 }
@@ -585,8 +594,24 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 {
 	u64 vtcr = VTCR_EL2_FLAGS;
 	u8 lvls;
+	u64 parange;
+	bool lpa2_ena = false;
+
+	/*
+	 * If stage 2 reports that it supports FEAT_LPA2 for our page size, then
+	 * we always use the LPA2 format regardless of IA and OA size.
+	 */
+	lpa2_ena = kvm_supports_stage2_lpa2(mmfr0);
+
+	parange = kvm_get_parange(mmfr0, lpa2_ena);
 
-	vtcr |= kvm_get_parange(mmfr0) << VTCR_EL2_PS_SHIFT;
+	/*
+	 * Infer IPA size to be equal to PA size if phys_shift is 0.
+	 */
+	if (phys_shift == 0)
+		phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
+
+	vtcr |= parange << VTCR_EL2_PS_SHIFT;
 	vtcr |= VTCR_EL2_T0SZ(phys_shift);
 	/*
 	 * Use a minimum 2 level page table to prevent splitting
@@ -604,6 +629,9 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 	 */
 	vtcr |= VTCR_EL2_HA;
 
+	if (lpa2_ena)
+		vtcr |= VTCR_EL2_DS;
+
 	/* Set the vmid bits */
 	vtcr |= (get_vmid_bits(mmfr1) == 16) ?
 		VTCR_EL2_VS_16BIT :
@@ -641,7 +669,9 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
 	if (prot & KVM_PGTABLE_PROT_W)
 		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
 
-	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
+	if (!pgt->lpa2_ena)
+		attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
+
 	attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
 	attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
 	*ptep = attr;
@@ -1182,6 +1212,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	u32 ia_bits = VTCR_EL2_IPA(vtcr);
 	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
 	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
+	bool lpa2_ena = (vtcr & VTCR_EL2_DS) != 0;
 
 	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
 	pgt->pgd = mm_ops->zalloc_pages_exact(pgd_sz);
@@ -1191,7 +1222,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	pgt->ia_bits		= ia_bits;
 	pgt->start_level	= start_level;
 	pgt->mm_ops		= mm_ops;
-	pgt->lpa2_ena		= false;
+	pgt->lpa2_ena		= lpa2_ena;
 	pgt->mmu		= mmu;
 	pgt->flags		= flags;
 	pgt->force_pte_cb	= force_pte_cb;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 07/12] KVM: arm64: Use LPA2 page-tables for hyp stage1 if HW supports it
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-06 13:59   ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: kvmarm, kvmarm, linux-arm-kernel

Implement a simple policy whereby if the HW supports FEAT_LPA2 for the
page size we are using, always use LPA2-style page-tables for hyp stage
1, regardless of the IPA or PA size requirements. When in use, we can now
support up to 52-bit IPA and PA sizes.

For the protected kvm case, the host creates the initial page-tables
using either the lpa2 or `classic` format, as determined by what's
reported in mmfr0 and also sets the TCR_EL2.DS bit in the params
structure. The hypervisor then looks at this DS bit to determine the
format that it should use to re-create the page-tables.
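
A rough sketch of that handoff (illustration only, not the kernel
implementation; TCR_DS below is a stand-in bit, not the real TCR_EL2_DS
definition):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TCR_DS          (1ULL << 59)    /* stand-in for TCR_EL2_DS */

struct init_params {
        uint64_t tcr_el2;
};

/* Host side: fold the stage-1 LPA2 decision into the TCR image. */
void host_prepare(struct init_params *params, bool hyp_lpa2)
{
        if (hyp_lpa2)
                params->tcr_el2 |= TCR_DS;
}

/* Hyp side: recover the decision instead of re-reading ID registers. */
bool hyp_lpa2_enabled(const struct init_params *params)
{
        return (params->tcr_el2 & TCR_DS) != 0;
}

int main(void)
{
        struct init_params p = { .tcr_el2 = 0 };

        host_prepare(&p, true);
        printf("hyp sees lpa2: %d\n", hyp_lpa2_enabled(&p));
        return 0;
}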

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 18 +++++++++++++++++-
 arch/arm64/kvm/arm.c                 |  2 ++
 arch/arm64/kvm/hyp/nvhe/setup.c      | 18 +++++++++++++-----
 arch/arm64/kvm/hyp/pgtable.c         |  7 ++++---
 arch/arm64/kvm/mmu.c                 |  3 ++-
 5 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index a7fd547dcc71..d6f4dcdd00fd 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -25,6 +25,21 @@
 #define KVM_PGTABLE_MIN_BLOCK_LEVEL	2U
 #endif
 
+static inline bool kvm_supports_hyp_lpa2(void)
+{
+#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
+	u64 mmfr0;
+	unsigned int tgran;
+
+	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
+						ID_AA64MMFR0_EL1_TGRAN_SHIFT);
+	return (tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2);
+#else
+	return false;
+#endif
+}
+
 static inline bool kvm_supports_stage2_lpa2(u64 mmfr0)
 {
 	unsigned int tgran;
@@ -253,11 +268,12 @@ struct kvm_pgtable_walker {
  * @pgt:	Uninitialised page-table structure to initialise.
  * @va_bits:	Maximum virtual address bits.
  * @mm_ops:	Memory management callbacks.
+ * @lpa2_ena:	Whether to use the lpa2 page-table format.
  *
  * Return: 0 on success, negative error code on failure.
  */
 int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
-			 struct kvm_pgtable_mm_ops *mm_ops);
+			 struct kvm_pgtable_mm_ops *mm_ops, bool lpa2_ena);
 
 /**
  * kvm_pgtable_hyp_destroy() - Destroy an unused hypervisor stage-1 page-table.
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 803055da3ee3..a234c6252c3c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1537,6 +1537,8 @@ static void cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
 	tcr = (read_sysreg(tcr_el1) & TCR_EL2_MASK) | TCR_EL2_RES1;
 	tcr &= ~TCR_T0SZ_MASK;
 	tcr |= TCR_T0SZ(hyp_va_bits);
+	if (kvm_supports_hyp_lpa2())
+		tcr |= TCR_EL2_DS;
 	params->tcr_el2 = tcr;
 
 	params->pgd_pa = kvm_mmu_get_httbr();
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 60a6821ae98a..b44e87b9d168 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -56,7 +56,7 @@ static int divide_memory_pool(void *virt, unsigned long size)
 
 static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 				 unsigned long *per_cpu_base,
-				 u32 hyp_va_bits)
+				 u32 hyp_va_bits, bool lpa2_ena)
 {
 	void *start, *end, *virt = hyp_phys_to_virt(phys);
 	unsigned long pgt_size = hyp_s1_pgtable_pages() << PAGE_SHIFT;
@@ -66,7 +66,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 	/* Recreate the hyp page-table using the early page allocator */
 	hyp_early_alloc_init(hyp_pgt_base, pgt_size);
 	ret = kvm_pgtable_hyp_init(&pkvm_pgtable, hyp_va_bits,
-				   &hyp_early_alloc_mm_ops);
+				   &hyp_early_alloc_mm_ops, lpa2_ena);
 	if (ret)
 		return ret;
 
@@ -304,10 +304,11 @@ void __noreturn __pkvm_init_finalise(void)
 int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
 		unsigned long *per_cpu_base, u32 hyp_va_bits)
 {
-	struct kvm_nvhe_init_params *params;
+	struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
 	void *virt = hyp_phys_to_virt(phys);
 	void (*fn)(phys_addr_t params_pa, void *finalize_fn_va);
 	int ret;
+	bool lpa2_ena;
 
 	BUG_ON(kvm_check_pvm_sysreg_table());
 
@@ -321,14 +322,21 @@ int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
 	if (ret)
 		return ret;
 
-	ret = recreate_hyp_mappings(phys, size, per_cpu_base, hyp_va_bits);
+	/*
+	 * The host has already done the hard work to figure out if LPA2 is
+	 * supported at stage 1 and passed the info in the DS bit of the
+	 * TCR. Extract and pass on so that the page-tables are constructed with
+	 * the correct format.
+	 */
+	lpa2_ena = (params->tcr_el2 & TCR_EL2_DS) != 0;
+	ret = recreate_hyp_mappings(phys, size, per_cpu_base,
+				    hyp_va_bits, lpa2_ena);
 	if (ret)
 		return ret;
 
 	update_nvhe_init_params();
 
 	/* Jump in the idmap page to switch to the new page-tables */
-	params = this_cpu_ptr(&kvm_init_params);
 	fn = (typeof(fn))__hyp_pa(__pkvm_init_switch_pgd);
 	fn(__hyp_pa(params), __pkvm_init_finalise);
 
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 8ed7353f07bc..cde852f91db8 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -369,7 +369,8 @@ static int hyp_set_prot_attr(struct kvm_pgtable *pgt,
 	}
 
 	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
-	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
+	if (!pgt->lpa2_ena)
+		attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
 	attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
 	attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
 	*ptep = attr;
@@ -528,7 +529,7 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 }
 
 int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
-			 struct kvm_pgtable_mm_ops *mm_ops)
+			 struct kvm_pgtable_mm_ops *mm_ops, bool lpa2_ena)
 {
 	u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
 
@@ -539,7 +540,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 	pgt->ia_bits		= va_bits;
 	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
 	pgt->mm_ops		= mm_ops;
-	pgt->lpa2_ena		= false;
+	pgt->lpa2_ena		= lpa2_ena;
 	pgt->mmu		= NULL;
 	pgt->force_pte_cb	= NULL;
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e3fe3e194fd1..13e48539f022 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1684,7 +1684,8 @@ int kvm_mmu_init(u32 *hyp_va_bits)
 		goto out;
 	}
 
-	err = kvm_pgtable_hyp_init(hyp_pgtable, *hyp_va_bits, &kvm_hyp_mm_ops);
+	err = kvm_pgtable_hyp_init(hyp_pgtable, *hyp_va_bits,
+				   &kvm_hyp_mm_ops, kvm_supports_hyp_lpa2());
 	if (err)
 		goto out_free_pgtable;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 08/12] KVM: arm64: Insert PS field at TCR_EL2 assembly time
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-06 13:59   ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: kvmarm, kvmarm, linux-arm-kernel

With the addition of LPA2 support in the hypervisor, the PA size
supported by the HW must be capped with a runtime decision, rather than
simply using a compile-time decision based on PA_BITS. For example, on a
system that advertises 52-bit PA but does not support FEAT_LPA2, a 4KB
or 16KB kernel compiled with LPA2 support must still limit the PA size
to 48 bits.

Therefore, move the insertion of the PS field into TCR_EL2 out of
__kvm_hyp_init assembly code and instead do it in cpu_prepare_hyp_mode()
where the rest of TCR_EL2 is assembled. This allows us to figure out PS
with kvm_get_parange(), which has the appropriate logic to ensure the
above requirement (the PS field of VTCR_EL2 is already populated this
way).
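
A worked example of the capping rule, as a standalone sketch rather than
the kernel helper (the PARange encodings 0b0101 = 48 bits and 0b0110 =
52 bits follow the Arm ARM):

#include <stdint.h>
#include <stdio.h>

#define PARANGE_48      0x5     /* ID_AA64MMFR0_EL1.PARange encoding, 48 bits */
#define PARANGE_52      0x6     /* ...and 52 bits */

static uint64_t parange_capped(uint64_t hw_parange, int lpa2_ena)
{
        /* Without LPA2 (and ignoring the 64K/FEAT_LPA case), cap at 48 bits */
        uint64_t max = lpa2_ena ? PARANGE_52 : PARANGE_48;

        return hw_parange > max ? max : hw_parange;
}

int main(void)
{
        /* HW advertises 52-bit PA but this granule has no LPA2: PS is capped */
        printf("no lpa2: PS = 0x%llx\n",
               (unsigned long long)parange_capped(PARANGE_52, 0));
        /* With LPA2 the 52-bit encoding is allowed through unchanged */
        printf("lpa2:    PS = 0x%llx\n",
               (unsigned long long)parange_capped(PARANGE_52, 1));
        return 0;
}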

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/kvm/arm.c               | 5 ++++-
 arch/arm64/kvm/hyp/nvhe/hyp-init.S | 4 ----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a234c6252c3c..ac30d849a308 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1522,6 +1522,8 @@ static void cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
 {
 	struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
 	unsigned long tcr;
+	bool lpa2_ena = kvm_supports_hyp_lpa2();
+	u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 
 	/*
 	 * Calculate the raw per-cpu offset without a translation from the
@@ -1537,7 +1539,8 @@ static void cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
 	tcr = (read_sysreg(tcr_el1) & TCR_EL2_MASK) | TCR_EL2_RES1;
 	tcr &= ~TCR_T0SZ_MASK;
 	tcr |= TCR_T0SZ(hyp_va_bits);
-	if (kvm_supports_hyp_lpa2())
+	tcr |= kvm_get_parange(mmfr0, lpa2_ena) << TCR_EL2_PS_SHIFT;
+	if (lpa2_ena)
 		tcr |= TCR_EL2_DS;
 	params->tcr_el2 = tcr;
 
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index c953fb4b9a13..3cc6dd2ff253 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -108,11 +108,7 @@ alternative_if ARM64_HAS_CNP
 alternative_else_nop_endif
 	msr	ttbr0_el2, x2
 
-	/*
-	 * Set the PS bits in TCR_EL2.
-	 */
 	ldr	x0, [x0, #NVHE_INIT_TCR_EL2]
-	tcr_compute_pa_size x0, #TCR_EL2_PS_SHIFT, x1, x2
 	msr	tcr_el2, x0
 
 	isb
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 09/12] KVM: arm64: Convert translation level parameter to s8
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-06 13:59   ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: kvmarm, kvmarm, linux-arm-kernel

With the introduction of FEAT_LPA2, the Arm ARM adds a new level of
translation, level -1, so levels can now be in the range [-1;3]. 3 is
always the last level and the first level is determined based on the
number of VA bits in use.

Convert level variables to use a signed type in preparation for
supporting this new level -1.

Since the last level is always anchored at 3, and the first level varies
to suit the number of VA/IPA bits, take the opportunity to replace
KVM_PGTABLE_MAX_LEVELS with the 2 macros KVM_PGTABLE_FIRST_LEVEL and
KVM_PGTABLE_LAST_LEVEL. This removes the assumption from the code that
levels run from 0 to KVM_PGTABLE_MAX_LEVELS - 1, which will soon no
longer be true.

No behavioral changes intended.
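
As an illustration of the resulting level arithmetic (a standalone
sketch, not kernel code): with the last level anchored at 3, the start
level for an N-level walk is 3 - (N - 1), which goes negative once a
fifth level is needed:

#include <stdio.h>

#define LAST_LEVEL      3

static signed char first_level(int levels)
{
        return (signed char)(LAST_LEVEL - (levels - 1));
}

int main(void)
{
        int levels;

        for (levels = 2; levels <= 5; levels++)
                printf("%d levels -> start at level %d\n",
                       levels, first_level(levels));
        return 0;
}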

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h  |  2 +-
 arch/arm64/include/asm/kvm_pgtable.h  | 21 +++---
 arch/arm64/include/asm/kvm_pkvm.h     |  5 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  6 +-
 arch/arm64/kvm/hyp/nvhe/setup.c       |  4 +-
 arch/arm64/kvm/hyp/pgtable.c          | 94 ++++++++++++++-------------
 arch/arm64/kvm/mmu.c                  | 11 ++--
 7 files changed, 75 insertions(+), 68 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 9bdba47f7e14..270f49e7f29a 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -341,7 +341,7 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
 	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
 }
 
-static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
+static __always_inline s8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
 {
 	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
 }
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index d6f4dcdd00fd..a282a3d5ddbc 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -11,7 +11,8 @@
 #include <linux/kvm_host.h>
 #include <linux/types.h>
 
-#define KVM_PGTABLE_MAX_LEVELS		4U
+#define KVM_PGTABLE_FIRST_LEVEL		0
+#define KVM_PGTABLE_LAST_LEVEL		3
 
 /*
  * The largest supported block sizes for KVM (no 52-bit PA support):
@@ -20,9 +21,9 @@
  *  - 64K (level 2):	512MB
  */
 #ifdef CONFIG_ARM64_4K_PAGES
-#define KVM_PGTABLE_MIN_BLOCK_LEVEL	1U
+#define KVM_PGTABLE_MIN_BLOCK_LEVEL	1
 #else
-#define KVM_PGTABLE_MIN_BLOCK_LEVEL	2U
+#define KVM_PGTABLE_MIN_BLOCK_LEVEL	2
 #endif
 
 static inline bool kvm_supports_hyp_lpa2(void)
@@ -84,18 +85,18 @@ static inline bool kvm_pte_valid(kvm_pte_t pte)
 	return pte & KVM_PTE_VALID;
 }
 
-static inline u64 kvm_granule_shift(u32 level)
+static inline u64 kvm_granule_shift(s8 level)
 {
-	/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
+	/* Assumes KVM_PGTABLE_LAST_LEVEL is 3 */
 	return ARM64_HW_PGTABLE_LEVEL_SHIFT(level);
 }
 
-static inline u64 kvm_granule_size(u32 level)
+static inline u64 kvm_granule_size(s8 level)
 {
 	return BIT(kvm_granule_shift(level));
 }
 
-static inline bool kvm_level_supports_block_mapping(u32 level)
+static inline bool kvm_level_supports_block_mapping(s8 level)
 {
 	return level >= KVM_PGTABLE_MIN_BLOCK_LEVEL;
 }
@@ -202,7 +203,7 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
  */
 struct kvm_pgtable {
 	u32					ia_bits;
-	u32					start_level;
+	s8					start_level;
 	kvm_pte_t				*pgd;
 	struct kvm_pgtable_mm_ops		*mm_ops;
 	bool					lpa2_ena;
@@ -245,7 +246,7 @@ enum kvm_pgtable_walk_flags {
 };
 
 typedef int (*kvm_pgtable_visitor_fn_t)(struct kvm_pgtable *pgt,
-					u64 addr, u64 end, u32 level,
+					u64 addr, u64 end, s8 level,
 					kvm_pte_t *ptep,
 					enum kvm_pgtable_walk_flags flag,
 					void * const arg);
@@ -581,7 +582,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
  * Return: 0 on success, negative error code on failure.
  */
 int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
-			 kvm_pte_t *ptep, u32 *level);
+			 kvm_pte_t *ptep, s8 *level);
 
 /**
  * kvm_pgtable_stage2_pte_prot() - Retrieve the protection attributes of a
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 9f4ad2a8df59..addcf63cf8d5 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -16,10 +16,11 @@ extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
 
 static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 {
-	unsigned long total = 0, i;
+	unsigned long total = 0;
+	int i;
 
 	/* Provision the worst case scenario */
-	for (i = 0; i < KVM_PGTABLE_MAX_LEVELS; i++) {
+	for (i = KVM_PGTABLE_FIRST_LEVEL; i <= KVM_PGTABLE_LAST_LEVEL; i++) {
 		nr_pages = DIV_ROUND_UP(nr_pages, PTRS_PER_PTE);
 		total += nr_pages;
 	}
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 43e729694deb..96a5567a9db3 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -281,7 +281,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 {
 	struct kvm_mem_range cur;
 	kvm_pte_t pte;
-	u32 level;
+	s8 level;
 	int ret;
 
 	hyp_assert_lock_held(&host_kvm.lock);
@@ -300,7 +300,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 		cur.start = ALIGN_DOWN(addr, granule);
 		cur.end = cur.start + granule;
 		level++;
-	} while ((level < KVM_PGTABLE_MAX_LEVELS) &&
+	} while ((level <= KVM_PGTABLE_LAST_LEVEL) &&
 			!(kvm_level_supports_block_mapping(level) &&
 			  range_included(&cur, range)));
 
@@ -416,7 +416,7 @@ struct check_walk_data {
 };
 
 static int __check_page_state_visitor(struct kvm_pgtable *pgt,
-				      u64 addr, u64 end, u32 level,
+				      u64 addr, u64 end, s8 level,
 				      kvm_pte_t *ptep,
 				      enum kvm_pgtable_walk_flags flag,
 				      void * const arg)
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index b44e87b9d168..0355c53b3530 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -187,7 +187,7 @@ static void hpool_put_page(void *addr)
 }
 
 static int finalize_host_mappings_walker(struct kvm_pgtable *pgt,
-					 u64 addr, u64 end, u32 level,
+					 u64 addr, u64 end, s8 level,
 					 kvm_pte_t *ptep,
 					 enum kvm_pgtable_walk_flags flag,
 					 void * const arg)
@@ -210,7 +210,7 @@ static int finalize_host_mappings_walker(struct kvm_pgtable *pgt,
 	if (flag != KVM_PGTABLE_WALK_LEAF)
 		return 0;
 
-	if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
+	if (level != KVM_PGTABLE_LAST_LEVEL)
 		return -EINVAL;
 
 	phys = kvm_pte_to_phys(pgt, pte);
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index cde852f91db8..274f839bd0d7 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -68,7 +68,7 @@ static bool kvm_phys_is_valid(struct kvm_pgtable *pgt, u64 phys)
 }
 
 static bool kvm_block_mapping_supported(struct kvm_pgtable *pgt,
-					u64 addr, u64 end, u64 phys, u32 level)
+					u64 addr, u64 end, u64 phys, s8 level)
 {
 	u64 granule = kvm_granule_size(level);
 
@@ -84,7 +84,7 @@ static bool kvm_block_mapping_supported(struct kvm_pgtable *pgt,
 	return IS_ALIGNED(addr, granule);
 }
 
-static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
+static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, s8 level)
 {
 	u64 shift = kvm_granule_shift(level);
 	u64 mask = BIT(PAGE_SHIFT - 3) - 1;
@@ -105,7 +105,7 @@ static u32 kvm_pgd_page_idx(struct kvm_pgtable_walk_data *data)
 	return __kvm_pgd_page_idx(data->pgt, data->addr);
 }
 
-static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
+static u32 kvm_pgd_pages(u32 ia_bits, s8 start_level)
 {
 	struct kvm_pgtable pgt = {
 		.ia_bits	= ia_bits,
@@ -115,9 +115,9 @@ static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
 	return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
 }
 
-static bool kvm_pte_table(kvm_pte_t pte, u32 level)
+static bool kvm_pte_table(kvm_pte_t pte, s8 level)
 {
-	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
+	if (level == KVM_PGTABLE_LAST_LEVEL)
 		return false;
 
 	if (!kvm_pte_valid(pte))
@@ -166,11 +166,11 @@ static void kvm_set_table_pte(struct kvm_pgtable *pgt,
 }
 
 static kvm_pte_t kvm_init_valid_leaf_pte(struct kvm_pgtable *pgt,
-					 u64 pa, kvm_pte_t attr, u32 level)
+					 u64 pa, kvm_pte_t attr, s8 level)
 {
 	kvm_pte_t pte = kvm_phys_to_pte(pgt, pa);
-	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
-							   KVM_PTE_TYPE_BLOCK;
+	u64 type = (level == KVM_PGTABLE_LAST_LEVEL) ? KVM_PTE_TYPE_PAGE :
+						       KVM_PTE_TYPE_BLOCK;
 
 	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
 	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
@@ -185,7 +185,7 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
 }
 
 static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
-				  u32 level, kvm_pte_t *ptep,
+				  s8 level, kvm_pte_t *ptep,
 				  enum kvm_pgtable_walk_flags flag)
 {
 	struct kvm_pgtable_walker *walker = data->walker;
@@ -194,10 +194,10 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
-			      kvm_pte_t *pgtable, u32 level);
+			      kvm_pte_t *pgtable, s8 level);
 
 static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
-				      kvm_pte_t *ptep, u32 level)
+				      kvm_pte_t *ptep, s8 level)
 {
 	int ret = 0;
 	u64 addr = data->addr;
@@ -241,12 +241,12 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
-			      kvm_pte_t *pgtable, u32 level)
+			      kvm_pte_t *pgtable, s8 level)
 {
 	u32 idx;
 	int ret = 0;
 
-	if (WARN_ON_ONCE(level >= KVM_PGTABLE_MAX_LEVELS))
+	if (WARN_ON_ONCE(level > KVM_PGTABLE_LAST_LEVEL))
 		return -EINVAL;
 
 	for (idx = kvm_pgtable_idx(data, level); idx < PTRS_PER_PTE; ++idx) {
@@ -302,11 +302,11 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 
 struct leaf_walk_data {
 	kvm_pte_t	pte;
-	u32		level;
+	s8		level;
 };
 
 static int leaf_walker(struct kvm_pgtable *pgt,
-		       u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+		       u64 addr, u64 end, s8 level, kvm_pte_t *ptep,
 		       enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct leaf_walk_data *data = arg;
@@ -318,7 +318,7 @@ static int leaf_walker(struct kvm_pgtable *pgt,
 }
 
 int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
-			 kvm_pte_t *ptep, u32 *level)
+			 kvm_pte_t *ptep, s8 *level)
 {
 	struct leaf_walk_data data;
 	struct kvm_pgtable_walker walker = {
@@ -399,7 +399,7 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
 }
 
 static bool hyp_map_walker_try_leaf(struct kvm_pgtable *pgt,
-				    u64 addr, u64 end, u32 level,
+				    u64 addr, u64 end, s8 level,
 				    kvm_pte_t *ptep, struct hyp_map_data *data)
 {
 	kvm_pte_t new, old = *ptep;
@@ -422,7 +422,7 @@ static bool hyp_map_walker_try_leaf(struct kvm_pgtable *pgt,
 }
 
 static int hyp_map_walker(struct kvm_pgtable *pgt,
-			  u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			  u64 addr, u64 end, s8 level, kvm_pte_t *ptep,
 			  enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	kvm_pte_t *childp;
@@ -432,7 +432,7 @@ static int hyp_map_walker(struct kvm_pgtable *pgt,
 	if (hyp_map_walker_try_leaf(pgt, addr, end, level, ptep, data))
 		return 0;
 
-	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
+	if (WARN_ON(level == KVM_PGTABLE_LAST_LEVEL))
 		return -EINVAL;
 
 	childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
@@ -472,7 +472,7 @@ struct hyp_unmap_data {
 };
 
 static int hyp_unmap_walker(struct kvm_pgtable *pgt,
-			    u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			    u64 addr, u64 end, s8 level, kvm_pte_t *ptep,
 			    enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	kvm_pte_t pte = *ptep, *childp = NULL;
@@ -531,14 +531,18 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 			 struct kvm_pgtable_mm_ops *mm_ops, bool lpa2_ena)
 {
-	u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
+	s8 start_level = KVM_PGTABLE_LAST_LEVEL + 1 -
+			 ARM64_HW_PGTABLE_LEVELS(va_bits);
+	if (start_level < KVM_PGTABLE_FIRST_LEVEL ||
+	    start_level > KVM_PGTABLE_LAST_LEVEL)
+		return -EINVAL;
 
 	pgt->pgd = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
 	if (!pgt->pgd)
 		return -ENOMEM;
 
 	pgt->ia_bits		= va_bits;
-	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
+	pgt->start_level	= start_level;
 	pgt->mm_ops		= mm_ops;
 	pgt->lpa2_ena		= lpa2_ena;
 	pgt->mmu		= NULL;
@@ -548,7 +552,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 }
 
 static int hyp_free_walker(struct kvm_pgtable *pgt,
-			   u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			   u64 addr, u64 end, s8 level, kvm_pte_t *ptep,
 			   enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
@@ -594,7 +598,7 @@ struct stage2_map_data {
 u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 {
 	u64 vtcr = VTCR_EL2_FLAGS;
-	u8 lvls;
+	s8 levels;
 	u64 parange;
 	bool lpa2_ena = false;
 
@@ -618,10 +622,10 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 	 * Use a minimum 2 level page table to prevent splitting
 	 * host PMD huge pages at stage2.
 	 */
-	lvls = stage2_pgtable_levels(phys_shift);
-	if (lvls < 2)
-		lvls = 2;
-	vtcr |= VTCR_EL2_LVLS_TO_SL0(lvls);
+	levels = stage2_pgtable_levels(phys_shift);
+	if (levels < 2)
+		levels = 2;
+	vtcr |= VTCR_EL2_LVLS_TO_SL0(levels);
 
 	/*
 	 * Enable the Hardware Access Flag management, unconditionally
@@ -716,7 +720,7 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
 }
 
 static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
-			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
+			   s8 level, struct kvm_pgtable_mm_ops *mm_ops)
 {
 	/*
 	 * Clear the existing PTE, and perform break-before-make with
@@ -742,17 +746,17 @@ static bool stage2_pte_executable(kvm_pte_t pte)
 }
 
 static bool stage2_leaf_mapping_allowed(struct kvm_pgtable *pgt,
-					u64 addr, u64 end, u32 level,
+					u64 addr, u64 end, s8 level,
 					struct stage2_map_data *data)
 {
-	if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
+	if (data->force_pte && level < KVM_PGTABLE_LAST_LEVEL)
 		return false;
 
 	return kvm_block_mapping_supported(pgt, addr, end, data->phys, level);
 }
 
 static int stage2_map_walker_try_leaf(struct kvm_pgtable *pgt,
-				      u64 addr, u64 end, u32 level,
+				      u64 addr, u64 end, s8 level,
 				      kvm_pte_t *ptep,
 				      struct stage2_map_data *data)
 {
@@ -798,7 +802,7 @@ static int stage2_map_walker_try_leaf(struct kvm_pgtable *pgt,
 }
 
 static int stage2_map_walk_table_pre(struct kvm_pgtable *pgt,
-				     u64 addr, u64 end, u32 level,
+				     u64 addr, u64 end, s8 level,
 				     kvm_pte_t *ptep,
 				     struct stage2_map_data *data)
 {
@@ -822,7 +826,7 @@ static int stage2_map_walk_table_pre(struct kvm_pgtable *pgt,
 }
 
 static int stage2_map_walk_leaf(struct kvm_pgtable *pgt,
-				u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+				u64 addr, u64 end, s8 level, kvm_pte_t *ptep,
 				struct stage2_map_data *data)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
@@ -840,7 +844,7 @@ static int stage2_map_walk_leaf(struct kvm_pgtable *pgt,
 	if (ret != -E2BIG)
 		return ret;
 
-	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
+	if (WARN_ON(level == KVM_PGTABLE_LAST_LEVEL))
 		return -EINVAL;
 
 	if (!data->memcache)
@@ -865,7 +869,7 @@ static int stage2_map_walk_leaf(struct kvm_pgtable *pgt,
 }
 
 static int stage2_map_walk_table_post(struct kvm_pgtable *pgt,
-				      u64 addr, u64 end, u32 level,
+				      u64 addr, u64 end, s8 level,
 				      kvm_pte_t *ptep,
 				      struct stage2_map_data *data)
 {
@@ -911,7 +915,7 @@ static int stage2_map_walk_table_post(struct kvm_pgtable *pgt,
  * pointer and clearing the anchor to NULL.
  */
 static int stage2_map_walker(struct kvm_pgtable *pgt,
-			     u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			     u64 addr, u64 end, s8 level, kvm_pte_t *ptep,
 			     enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct stage2_map_data *data = arg;
@@ -984,7 +988,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 }
 
 static int stage2_unmap_walker(struct kvm_pgtable *pgt,
-			       u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			       u64 addr, u64 end, s8 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
@@ -1041,11 +1045,11 @@ struct stage2_attr_data {
 	kvm_pte_t			attr_set;
 	kvm_pte_t			attr_clr;
 	kvm_pte_t			pte;
-	u32				level;
+	s8				level;
 };
 
 static int stage2_attr_walker(struct kvm_pgtable *pgt,
-			      u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			      u64 addr, u64 end, s8 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
@@ -1084,7 +1088,7 @@ static int stage2_attr_walker(struct kvm_pgtable *pgt,
 static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 				    u64 size, kvm_pte_t attr_set,
 				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte,
-				    u32 *level)
+				    s8 *level)
 {
 	int ret;
 	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
@@ -1151,7 +1155,7 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 				   enum kvm_pgtable_prot prot)
 {
 	int ret;
-	u32 level;
+	s8 level;
 	kvm_pte_t set = 0, clr = 0;
 
 	if (prot & KVM_PTE_LEAF_ATTR_HI_SW)
@@ -1173,7 +1177,7 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 }
 
 static int stage2_flush_walker(struct kvm_pgtable *pgt,
-			       u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			       u64 addr, u64 end, s8 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
@@ -1212,7 +1216,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	u64 vtcr = mmu->arch->vtcr;
 	u32 ia_bits = VTCR_EL2_IPA(vtcr);
 	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
-	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
+	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
 	bool lpa2_ena = (vtcr & VTCR_EL2_DS) != 0;
 
 	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
@@ -1234,7 +1238,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 }
 
 static int stage2_free_walker(struct kvm_pgtable *pgt,
-			      u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			      u64 addr, u64 end, s8 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 13e48539f022..4ce46be3f0a0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -642,18 +642,19 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 	struct kvm_pgtable pgt = {
 		.pgd		= (kvm_pte_t *)kvm->mm->pgd,
 		.ia_bits	= vabits_actual,
-		.start_level	= (KVM_PGTABLE_MAX_LEVELS -
-				   CONFIG_PGTABLE_LEVELS),
+		.start_level	= (KVM_PGTABLE_LAST_LEVEL -
+				   CONFIG_PGTABLE_LEVELS + 1),
 		.mm_ops		= &kvm_user_mm_ops,
 		.lpa2_ena	= lpa2_is_enabled(),
 	};
 	kvm_pte_t pte = 0;	/* Keep GCC quiet... */
-	u32 level = ~0;
+	s8 level = ~0;
 	int ret;
 
 	ret = kvm_pgtable_get_leaf(&pgt, addr, &pte, &level);
 	VM_BUG_ON(ret);
-	VM_BUG_ON(level >= KVM_PGTABLE_MAX_LEVELS);
+	VM_BUG_ON(level > KVM_PGTABLE_LAST_LEVEL);
+	VM_BUG_ON(level < KVM_PGTABLE_FIRST_LEVEL);
 	VM_BUG_ON(!(pte & PTE_VALID));
 
 	return BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(level));
@@ -1138,7 +1139,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
 	bool use_read_lock = false;
-	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
+	s8 fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	unsigned long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
-- 
2.25.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 10/12] KVM: arm64: Rework logic to en/decode VTCR_EL2.{SL0, SL2} fields
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-06 13:59   ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: kvmarm, kvmarm, linux-arm-kernel

In order to support 5 level translation, FEAT_LPA2 introduces the 1-bit
SL2 field within VTCR_EL2 to extend the existing 2-bit SL0 field. The
SL2[0]:SL0[1:0] encodings have no simple algorithmic relationship to the
start levels they represent (that I can find, at least), so replace the
existing macros with functions that do lookups to encode and decode the
values. These new functions no longer make hardcoded assumptions about
the maximum level and instead rely on KVM_PGTABLE_FIRST_LEVEL and
KVM_PGTABLE_LAST_LEVEL.

This is preparatory work for enabling 52-bit IPA for 4KB and 16KB pages
with FEAT_LPA2.

No functional change intended.
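
For illustration, the lookup scheme can be sketched as a small
stand-alone C snippet for the 4K granule (illustrative only: the names
and helpers below are not the ones added by this patch, and the real
helpers additionally WARN on invalid input):

  #include <stdint.h>

  #define SLX_INVALID	255

  /* 4K granule; index 0 is level -1, index 4 is level 3 (see table in patch) */
  const uint8_t slx_for_level_4k[5] = { 4, 2, 1, 0, 3 };

  uint8_t slx_encode_4k(int level)
  {
  	if (level < -1 || level > 3)
  		return SLX_INVALID;
  	return slx_for_level_4k[level + 1];
  }

  int slx_decode_4k(uint8_t slx)
  {
  	int level;

  	for (level = -1; level <= 3; level++)
  		if (slx_for_level_4k[level + 1] == slx)
  			return level;
  	return SLX_INVALID;
  }

The 3-bit value produced by the encode step is then split into SL0
(bits [1:0]) and SL2 (bit [2]) and packed into VTCR_EL2 with
FIELD_PREP(), as the kvm_get_vtcr() hunk below does.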

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h        | 75 ++++++++++++++-----------
 arch/arm64/include/asm/kvm_pgtable.h    | 33 +++++++++++
 arch/arm64/include/asm/stage2_pgtable.h | 13 ++++-
 arch/arm64/kvm/hyp/pgtable.c            | 67 +++++++++++++++++++++-
 4 files changed, 150 insertions(+), 38 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index f9619a10d5d9..94bbb05e348f 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -150,58 +150,65 @@
 				 VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1)
 
 /*
- * VTCR_EL2:SL0 indicates the entry level for Stage2 translation.
- * Interestingly, it depends on the page size.
- * See D.10.2.121, VTCR_EL2, in ARM DDI 0487C.a
+ * VTCR_EL2.{SL0, SL2} indicates the entry level for Stage2 translation.
+ * Interestingly, it depends on the page size. See D17.2.157, VTCR_EL2, in ARM
+ * DDI 0487I.a
  *
- *	-----------------------------------------
- *	| Entry level		|  4K  | 16K/64K |
- *	------------------------------------------
- *	| Level: 0		|  2   |   -     |
- *	------------------------------------------
- *	| Level: 1		|  1   |   2     |
- *	------------------------------------------
- *	| Level: 2		|  0   |   1     |
- *	------------------------------------------
- *	| Level: 3		|  -   |   0     |
- *	------------------------------------------
+ *      ----------------------------------------------------------
+ *      | Entry level           |    4K    |    16K   |    64K   |
+ *      |                       |  SL2:SL0 |  SL2:SL0 |  SL2:SL0 |
+ *      ----------------------------------------------------------
+ *      | Level: -1             |  0b100   |     -    |     -    |
+ *      ----------------------------------------------------------
+ *      | Level: 0              |  0b010   |  0b011   |     -    |
+ *      ----------------------------------------------------------
+ *      | Level: 1              |  0b001   |  0b010   |  0b010   |
+ *      ----------------------------------------------------------
+ *      | Level: 2              |  0b000   |  0b001   |  0b001   |
+ *      ----------------------------------------------------------
+ *      | Level: 3              |  0b011   |  0b000   |  0b000   |
+ *      ----------------------------------------------------------
  *
- * The table roughly translates to :
- *
- *	SL0(PAGE_SIZE, Entry_level) = TGRAN_SL0_BASE - Entry_Level
- *
- * Where TGRAN_SL0_BASE is a magic number depending on the page size:
- * 	TGRAN_SL0_BASE(4K) = 2
- *	TGRAN_SL0_BASE(16K) = 3
- *	TGRAN_SL0_BASE(64K) = 3
- * provided we take care of ruling out the unsupported cases and
- * Entry_Level = 4 - Number_of_levels.
+ * There is no concise algorithm to convert between the SLx encodings and the
+ * level numbers, so we implement 2 helpers, kvm_vtcr_el2_sl_encode() and
+ * kvm_vtcr_el2_sl_decode(), which can convert between the representations. These
+ * helpers use a concatenated form of SLx: SL2[0]:SL0[1:0] as the 3 LSBs in u8.
+ * If an invalid input value is provided, VTCR_EL2_SLx_ENC_INVAL is returned. We
+ * declare the appropriate encoded values here for the compiled-in page size.
  *
+ * See kvm_pgtable.h for documentation on the helpers.
  */
+#define VTCR_EL2_SLx_ENC_INVAL		255
+
 #ifdef CONFIG_ARM64_64K_PAGES
 
 #define VTCR_EL2_TGRAN			VTCR_EL2_TG0_64K
-#define VTCR_EL2_TGRAN_SL0_BASE		3UL
+#define VTCR_EL2_SLx_ENC_Lm1		VTCR_EL2_SLx_ENC_INVAL
+#define VTCR_EL2_SLx_ENC_L0		VTCR_EL2_SLx_ENC_INVAL
+#define VTCR_EL2_SLx_ENC_Lp1		2
+#define VTCR_EL2_SLx_ENC_Lp2		1
+#define VTCR_EL2_SLx_ENC_Lp3		0
 
 #elif defined(CONFIG_ARM64_16K_PAGES)
 
 #define VTCR_EL2_TGRAN			VTCR_EL2_TG0_16K
-#define VTCR_EL2_TGRAN_SL0_BASE		3UL
+#define VTCR_EL2_SLx_ENC_Lm1		VTCR_EL2_SLx_ENC_INVAL
+#define VTCR_EL2_SLx_ENC_L0		3
+#define VTCR_EL2_SLx_ENC_Lp1		2
+#define VTCR_EL2_SLx_ENC_Lp2		1
+#define VTCR_EL2_SLx_ENC_Lp3		0
 
 #else	/* 4K */
 
 #define VTCR_EL2_TGRAN			VTCR_EL2_TG0_4K
-#define VTCR_EL2_TGRAN_SL0_BASE		2UL
+#define VTCR_EL2_SLx_ENC_Lm1		4
+#define VTCR_EL2_SLx_ENC_L0		2
+#define VTCR_EL2_SLx_ENC_Lp1		1
+#define VTCR_EL2_SLx_ENC_Lp2		0
+#define VTCR_EL2_SLx_ENC_Lp3		3
 
 #endif
 
-#define VTCR_EL2_LVLS_TO_SL0(levels)	\
-	((VTCR_EL2_TGRAN_SL0_BASE - (4 - (levels))) << VTCR_EL2_SL0_SHIFT)
-#define VTCR_EL2_SL0_TO_LVLS(sl0)	\
-	((sl0) + 4 - VTCR_EL2_TGRAN_SL0_BASE)
-#define VTCR_EL2_LVLS(vtcr)		\
-	VTCR_EL2_SL0_TO_LVLS(((vtcr) & VTCR_EL2_SL0_MASK) >> VTCR_EL2_SL0_SHIFT)
-
 #define VTCR_EL2_FLAGS			(VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN)
 #define VTCR_EL2_IPA(vtcr)		(64 - ((vtcr) & VTCR_EL2_T0SZ_MASK))
 
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index a282a3d5ddbc..3e0b64052c51 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -328,6 +328,39 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
  */
 u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
 
+/**
+ * kvm_vtcr_el2_sl_encode() - Helper to encode start level for vtcr_el2.
+ * @sl_dec:     Start level to be encoded.
+ *
+ * Takes an unencoded translation start level value and returns it encoded for
+ * use in vtcr_el2 register. The returned value has SL0 (a 2 bit field) in bits
+ * [1:0] and SL2 (a 1 bit field) in bit [2]. The user is responsible for
+ * extracting them and packing them into the correct locations of vtcr_el2.
+ *
+ * Do not call this function with a value that is out of range for the page size
+ * in operation. A warning will be output if this is detected and the function
+ * returns VTCR_EL2_SLx_ENC_INVAL. See comment in kvm_arm.h for more info.
+ *
+ * Return: 3 bit value containing SL2[0]:SL0[1:0], or VTCR_EL2_SLx_ENC_INVAL.
+ */
+u8 kvm_vtcr_el2_sl_encode(s8 sl_dec);
+
+/**
+ * kvm_vtcr_el2_sl_decode() - Helper to decode start level for vtcr_el2.
+ * @sl_enc:     Start level encoded as SL2[0]:SL0[1:0].
+ *
+ * Takes an encoded translation start level value, as used in the vtcr_el2
+ * register and returns it decoded. See kvm_vtcr_el2_sl_encode() for description
+ * of input encoding.
+ *
+ * Do not call this function with a value that is invalid for the page size in
+ * operation. A warning will be output if this is detected and the function
+ * returns VTCR_EL2_SLx_ENC_INVAL. See comment in kvm_arm.h for more info.
+ *
+ * Return: Decoded start level, or VTCR_EL2_SLx_ENC_INVAL.
+ */
+s8 kvm_vtcr_el2_sl_decode(u8 sl_enc);
+
 /**
  * kvm_get_vtcr() - Helper to construct VTCR_EL2
  * @mmfr0:	Sanitized value of SYS_ID_AA64MMFR0_EL1 register.
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index c8dca8ae359c..02c5e04d4958 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -21,7 +21,18 @@
  * (IPA_SHIFT - 4).
  */
 #define stage2_pgtable_levels(ipa)	ARM64_HW_PGTABLE_LEVELS((ipa) - 4)
-#define kvm_stage2_levels(kvm)		VTCR_EL2_LVLS(kvm->arch.vtcr)
+static inline s8 kvm_stage2_levels(struct kvm *kvm)
+{
+	u64 vtcr = kvm->arch.vtcr;
+	u8 slx;
+	s8 start_level;
+
+	slx = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
+	slx |= FIELD_GET(VTCR_EL2_SL2_MASK, vtcr) << 2;
+	start_level = kvm_vtcr_el2_sl_decode(slx);
+
+	return KVM_PGTABLE_LAST_LEVEL + 1 - start_level;
+}
 
 /*
  * kvm_mmmu_cache_min_pages() is the number of pages required to install
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 274f839bd0d7..8ebd9aaed2c4 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -595,12 +595,67 @@ struct stage2_map_data {
 	bool				force_pte;
 };
 
+u8 kvm_vtcr_el2_sl_encode(s8 sl_dec)
+{
+	u8 sl_enc = VTCR_EL2_SLx_ENC_INVAL;
+
+	BUILD_BUG_ON(KVM_PGTABLE_FIRST_LEVEL < -1);
+	BUILD_BUG_ON(KVM_PGTABLE_LAST_LEVEL > 3);
+
+	switch (sl_dec) {
+	case -1:
+		sl_enc = VTCR_EL2_SLx_ENC_Lm1;
+		break;
+	case 0:
+		sl_enc = VTCR_EL2_SLx_ENC_L0;
+		break;
+	case 1:
+		sl_enc = VTCR_EL2_SLx_ENC_Lp1;
+		break;
+	case 2:
+		sl_enc = VTCR_EL2_SLx_ENC_Lp2;
+		break;
+	case 3:
+		sl_enc = VTCR_EL2_SLx_ENC_Lp3;
+		break;
+	}
+
+	WARN_ON_ONCE(sl_enc == VTCR_EL2_SLx_ENC_INVAL);
+	return sl_enc;
+}
+
+s8 kvm_vtcr_el2_sl_decode(u8 sl_enc)
+{
+	s8 sl_dec = VTCR_EL2_SLx_ENC_INVAL;
+
+	BUILD_BUG_ON(KVM_PGTABLE_FIRST_LEVEL < -1);
+	BUILD_BUG_ON(KVM_PGTABLE_LAST_LEVEL > 3);
+
+	if (sl_enc == VTCR_EL2_SLx_ENC_Lm1)
+		sl_dec = -1;
+	else if (sl_enc == VTCR_EL2_SLx_ENC_L0)
+		sl_dec = 0;
+	else if (sl_enc == VTCR_EL2_SLx_ENC_Lp1)
+		sl_dec = 1;
+	else if (sl_enc == VTCR_EL2_SLx_ENC_Lp2)
+		sl_dec = 2;
+	else if (sl_enc == VTCR_EL2_SLx_ENC_Lp3)
+		sl_dec = 3;
+
+	if (WARN_ON_ONCE(sl_dec == VTCR_EL2_SLx_ENC_INVAL ||
+			 sl_enc == VTCR_EL2_SLx_ENC_INVAL))
+		sl_dec = VTCR_EL2_SLx_ENC_INVAL;
+
+	return sl_dec;
+}
+
 u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 {
 	u64 vtcr = VTCR_EL2_FLAGS;
 	s8 levels;
 	u64 parange;
 	bool lpa2_ena = false;
+	u8 slx;
 
 	/*
 	 * If stage 2 reports that it supports FEAT_LPA2 for our page size, then
@@ -625,7 +680,9 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 	levels = stage2_pgtable_levels(phys_shift);
 	if (levels < 2)
 		levels = 2;
-	vtcr |= VTCR_EL2_LVLS_TO_SL0(levels);
+	slx = kvm_vtcr_el2_sl_encode(KVM_PGTABLE_LAST_LEVEL + 1 - levels);
+	vtcr |= FIELD_PREP(VTCR_EL2_SL0_MASK, slx);
+	vtcr |= FIELD_PREP(VTCR_EL2_SL2_MASK, slx >> 2);
 
 	/*
 	 * Enable the Hardware Access Flag management, unconditionally
@@ -1215,10 +1272,14 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	size_t pgd_sz;
 	u64 vtcr = mmu->arch->vtcr;
 	u32 ia_bits = VTCR_EL2_IPA(vtcr);
-	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
-	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
+	u8 slx;
+	s8 start_level;
 	bool lpa2_ena = (vtcr & VTCR_EL2_DS) != 0;
 
+	slx = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
+	slx |= FIELD_GET(VTCR_EL2_SL2_MASK, vtcr) << 2;
+	start_level = kvm_vtcr_el2_sl_decode(slx);
+
 	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
 	pgt->pgd = mm_ops->zalloc_pages_exact(pgd_sz);
 	if (!pgt->pgd)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 11/12] KVM: arm64: Support upto 5 levels of translation in kvm_pgtable
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-06 13:59   ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: kvmarm, kvmarm, linux-arm-kernel

FEAT_LPA2 increases the maximum levels of translation from 4 to 5 for
the 4KB page case, when IA is >48 bits. While we can still use 4 levels
for stage2 translation in this case (due to stage2 allowing concatenated
page tables for first level lookup), the same kvm_pgtable library is
used for the hyp stage1 page tables and stage1 does not support
concatenation.

Therefore, modify the library to support up to 5 levels. Previous patches
already laid the groundwork for this by refactoring code to work in
terms of KVM_PGTABLE_FIRST_LEVEL and KVM_PGTABLE_LAST_LEVEL. So we just
need to change these macros.

The hardware sometimes encodes the new level differently from the
others: One such place is when reading the level from the FSC field in
the ESR_EL2 register. We never expect to see the lowest level (-1) here
since the stage 2 page tables always use concatenated tables for first
level lookup and therefore only use 4 levels of lookup. So we get away
with just adding a comment to explain why we are not being careful about
decoding level -1.
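
For illustration, the level arithmetic behind this can be sketched in a
few lines of C (illustrative only; the kernel uses the
ARM64_HW_PGTABLE_LEVELS() family of macros for this). With 4KB pages,
each level of lookup resolves 9 bits of input address on top of the
12-bit page offset, so 48-bit IAs fit in 4 levels while 52-bit IAs need
a 5th, numbered -1:

  #include <stdio.h>

  int main(void)
  {
  	int page_shift = 12;			/* 4KB granule */
  	int bits_per_level = page_shift - 3;	/* 512 entries per table */
  	int ia_bits;

  	for (ia_bits = 48; ia_bits <= 52; ia_bits += 4) {
  		int levels = (ia_bits - page_shift + bits_per_level - 1) /
  			     bits_per_level;
  		int start_level = 3 + 1 - levels;	/* last level is 3 */

  		printf("IA = %d bits: %d levels, start level %d\n",
  		       ia_bits, levels, start_level);
  	}

  	return 0;
  }

This prints 4 levels (start level 0) for 48 bits and 5 levels (start
level -1) for 52 bits, which is what the KVM_PGTABLE_FIRST_LEVEL change
below caters for.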

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h | 10 ++++++++++
 arch/arm64/include/asm/kvm_pgtable.h |  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 270f49e7f29a..6f68febfb214 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -343,6 +343,16 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
 
 static __always_inline s8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
 {
+	/*
+	 * Note: With the introduction of FEAT_LPA2 an extra level of
+	 * translation (level -1) is added. This level (obviously) doesn't
+	 * follow the previous convention of encoding the 4 levels in the 2 LSBs
+	 * of the FSC so this function breaks if the fault is for level -1.
+	 *
+	 * However, stage2 tables always use concatenated tables for first level
+	 * lookup and therefore it is guaranteed that the level will be between
+	 * 0 and 3, and this function continues to work.
+	 */
 	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
 }
 
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3e0b64052c51..3655279e6a7d 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -11,7 +11,7 @@
 #include <linux/kvm_host.h>
 #include <linux/types.h>
 
-#define KVM_PGTABLE_FIRST_LEVEL		0
+#define KVM_PGTABLE_FIRST_LEVEL		-1
 #define KVM_PGTABLE_LAST_LEVEL		3
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 12/12] KVM: arm64: Allow guests with >48-bit IPA size on FEAT_LPA2 systems
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-06 13:59   ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-06 13:59 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual
  Cc: kvmarm, kvmarm, linux-arm-kernel

With all the page-table infrastructure in place, we can finally increase
the maximum permissible IPA size to 52 bits on 4KB and 16KB page systems
that have FEAT_LPA2.
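
For reference, the capping logic can be sketched stand-alone
(illustrative only: the PARange-to-bits mapping follows the
ID_AA64MMFR0_EL1.PARange encodings, and stage2_lpa2 stands in for the
kvm_supports_stage2_lpa2() check used in the hunk below):

  #include <stdbool.h>

  /* PARange encodings 0..6 map to 32/36/40/42/44/48/52 PA bits */
  const unsigned int parange_to_bits[] = { 32, 36, 40, 42, 44, 48, 52 };

  unsigned int max_ipa_bits(unsigned int parange, bool pgsz_64k,
  			  bool stage2_lpa2)
  {
  	if (parange > 6)
  		parange = 6;
  	/* Without LPA2, 4K/16K granules cannot go past 48 bits */
  	if (!stage2_lpa2 && !pgsz_64k && parange > 5)
  		parange = 5;
  	return parange_to_bits[parange];
  }

So a 4KB or 16KB page host without FEAT_LPA2 keeps the existing 48-bit
limit, while LPA2-capable systems (and 64KB hosts via FEAT_LPA) can
report 52.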

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/kvm/reset.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 5ae18472205a..548756c3f43c 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -118,7 +118,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
 		kfree(buf);
 		return ret;
 	}
-	
+
 	vcpu->arch.sve_state = buf;
 	vcpu_set_flag(vcpu, VCPU_SVE_FINALIZED);
 	return 0;
@@ -361,12 +361,11 @@ int kvm_set_ipa_limit(void)
 	parange = cpuid_feature_extract_unsigned_field(mmfr0,
 				ID_AA64MMFR0_EL1_PARANGE_SHIFT);
 	/*
-	 * IPA size beyond 48 bits could not be supported
-	 * on either 4K or 16K page size. Hence let's cap
-	 * it to 48 bits, in case it's reported as larger
-	 * on the system.
+	 * IPA size beyond 48 bits for 4K and 16K page size is only supported
+	 * when LPA2 is available. So if we have LPA2, enable it, else cap to 48
+	 * bits, in case it's reported as larger on the system.
 	 */
-	if (PAGE_SIZE != SZ_64K)
+	if (!kvm_supports_stage2_lpa2(mmfr0) && PAGE_SIZE != SZ_64K)
 		parange = min(parange, (unsigned int)ID_AA64MMFR0_EL1_PARANGE_48);
 
 	/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 01/12] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
  2022-12-06 13:59   ` Ryan Roberts
  (?)
@ 2022-12-14 19:16     ` Oliver Upton
  -1 siblings, 0 replies; 78+ messages in thread
From: Oliver Upton @ 2022-12-14 19:16 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Marc Zyngier, Anshuman Khandual, kvmarm, Catalin Marinas, kvmarm,
	Will Deacon, linux-arm-kernel

On Tue, Dec 06, 2022 at 01:59:19PM +0000, Ryan Roberts wrote:
> From: Anshuman Khandual <anshuman.khandual@arm.com>
> 
> PAGE_SIZE support is tested against possible minimum and maximum values for
> its respective ID_AA64MMFR0.TGRAN field, depending on whether it is signed
> or unsigned. But then FEAT_LPA2 implementation needs to be validated for 4K
> and 16K page sizes via feature specific ID_AA64MMFR0.TGRAN values. Hence it
> adds FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] values per ARM ARM (0487G.A).
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/include/asm/sysreg.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 7d301700d1a9..9ad8172eea58 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -673,10 +673,12 @@
>  
>  /* id_aa64mmfr0 */
>  #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN	0x0
> +#define ID_AA64MMFR0_EL1_TGRAN4_LPA2		ID_AA64MMFR0_EL1_TGRAN4_52_BIT
>  #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX	0x7
>  #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MIN	0x0
>  #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MAX	0x7
>  #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN	0x1
> +#define ID_AA64MMFR0_EL1_TGRAN16_LPA2		ID_AA64MMFR0_EL1_TGRAN16_52_BIT
>  #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX	0xf
>  
>  #define ARM64_MIN_PARANGE_BITS		32
> @@ -684,6 +686,7 @@
>  #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_DEFAULT	0x0
>  #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_NONE		0x1
>  #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MIN		0x2
> +#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2		0x3
>  #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MAX		0x7
>  
>  #ifdef CONFIG_ARM64_PA_BITS_52
> @@ -800,11 +803,13 @@
>  
>  #if defined(CONFIG_ARM64_4K_PAGES)
>  #define ID_AA64MMFR0_EL1_TGRAN_SHIFT		ID_AA64MMFR0_EL1_TGRAN4_SHIFT
> +#define ID_AA64MMFR0_EL1_TGRAN_LPA2		ID_AA64MMFR0_EL1_TGRAN4_52_BIT
>  #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN	ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN
>  #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX	ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX
>  #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT		ID_AA64MMFR0_EL1_TGRAN4_2_SHIFT
>  #elif defined(CONFIG_ARM64_16K_PAGES)
>  #define ID_AA64MMFR0_EL1_TGRAN_SHIFT		ID_AA64MMFR0_EL1_TGRAN16_SHIFT
> +#define ID_AA64MMFR0_EL1_TGRAN_LPA2		ID_AA64MMFR0_EL1_TGRAN16_52_BIT

Can you use the 52_BIT suffix instead for these macros? LPA2 can map to
multiple values (i.e. no support for 4KB granule). Also provides a
direct description of what feature we're testing for.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 01/12] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
@ 2022-12-14 19:16     ` Oliver Upton
  0 siblings, 0 replies; 78+ messages in thread
From: Oliver Upton @ 2022-12-14 19:16 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	Suzuki K Poulose, Anshuman Khandual, James Morse,
	Alexandru Elisei, linux-arm-kernel, kvmarm, kvmarm

On Tue, Dec 06, 2022 at 01:59:19PM +0000, Ryan Roberts wrote:
> From: Anshuman Khandual <anshuman.khandual@arm.com>
> 
> PAGE_SIZE support is tested against possible minimum and maximum values for
> its respective ID_AA64MMFR0.TGRAN field, depending on whether it is signed
> or unsigned. But then FEAT_LPA2 implementation needs to be validated for 4K
> and 16K page sizes via feature specific ID_AA64MMFR0.TGRAN values. Hence it
> adds FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] values per ARM ARM (0487G.A).
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/include/asm/sysreg.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 7d301700d1a9..9ad8172eea58 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -673,10 +673,12 @@
>  
>  /* id_aa64mmfr0 */
>  #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN	0x0
> +#define ID_AA64MMFR0_EL1_TGRAN4_LPA2		ID_AA64MMFR0_EL1_TGRAN4_52_BIT
>  #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX	0x7
>  #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MIN	0x0
>  #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MAX	0x7
>  #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN	0x1
> +#define ID_AA64MMFR0_EL1_TGRAN16_LPA2		ID_AA64MMFR0_EL1_TGRAN16_52_BIT
>  #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX	0xf
>  
>  #define ARM64_MIN_PARANGE_BITS		32
> @@ -684,6 +686,7 @@
>  #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_DEFAULT	0x0
>  #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_NONE		0x1
>  #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MIN		0x2
> +#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2		0x3
>  #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MAX		0x7
>  
>  #ifdef CONFIG_ARM64_PA_BITS_52
> @@ -800,11 +803,13 @@
>  
>  #if defined(CONFIG_ARM64_4K_PAGES)
>  #define ID_AA64MMFR0_EL1_TGRAN_SHIFT		ID_AA64MMFR0_EL1_TGRAN4_SHIFT
> +#define ID_AA64MMFR0_EL1_TGRAN_LPA2		ID_AA64MMFR0_EL1_TGRAN4_52_BIT
>  #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN	ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN
>  #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX	ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX
>  #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT		ID_AA64MMFR0_EL1_TGRAN4_2_SHIFT
>  #elif defined(CONFIG_ARM64_16K_PAGES)
>  #define ID_AA64MMFR0_EL1_TGRAN_SHIFT		ID_AA64MMFR0_EL1_TGRAN16_SHIFT
> +#define ID_AA64MMFR0_EL1_TGRAN_LPA2		ID_AA64MMFR0_EL1_TGRAN16_52_BIT

Can you use the 52_BIT suffix instead for these macros? LPA2 can map to
multiple values (i.e. no support for 4KB granule). Also provides a
direct description of what feature we're testing for.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
  2022-12-06 13:59 ` Ryan Roberts
  (?)
@ 2022-12-15  0:52   ` Oliver Upton
  -1 siblings, 0 replies; 78+ messages in thread
From: Oliver Upton @ 2022-12-15  0:52 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Marc Zyngier, Anshuman Khandual, kvmarm, Catalin Marinas, kvmarm,
	Will Deacon, linux-arm-kernel

On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
> (appologies, I'm resending this series as I managed to send the cover letter to
> all but the following patches only to myself on first attempt).
> 
> This is my first upstream feature submission so please go easy ;-)

Welcome :)

> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
> HW advertises support for LPA2 independently for stage 1 and stage 2, and
> therefore its possible to have it for one and not the other. I've assumed that
> there is a valid case for this if stage 1 is not supported but stage 2 is, KVM
> could still then use LPA2 at stage 2 to create a 52 bit IPA space (which could
> then be consumed by a 64KB page guest kernel with the help of FEAT_LPA). Because
> of this independence and the fact that the kvm pgtable library is used for both
> stage 1 and stage 2 tables, this means the library now has to remember the
> in-use format on a per-page-table basis. To do this, I had to rework some
> functions to take a `struct kvm_pgtable *` parameter, and as a result, there is
> a noisy patch to add this parameter.

Mismatch between the translation stages is an interesting problem...

Given that userspace is responsible for setting up the IPA space, I
can't really think of a strong use case for 52-bit IPAs with a 48-bit
VA. Sure, the VMM could construct a sparse IPA space or remap the same
HVA at multiple IPAs to artificially saturate the address space, but
neither seems terribly compelling.

Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
!LVA systems. A 48-bit userspace could construct a 52-bit IPA space for
its guest.

Marc, is there any real reason for this or is it just a byproduct of how
LPA support was added to KVM?

> Support 52-bit Input Addresses: The main difficulty here is that at stage 1 for
> 4KB pages, 52-bit IA requires a extra level of lookup, and that level is called
> '-1'. (Although stage 2 can use concatenated pages at the first level, and
> therefore still only uses 4 levels, the kvm pgtable library deals with both
> stage 1 and stage 2 tables). So there is another noisy patch to convert all
> level variables to signed.
> 
> This is all tested on the FVP, using a test harness I put together, which does a
> host + guest boot test for 180 configurations, built from all the (valid)
> combinations of various FVP, host kernel and guest kernel parameters:
> 
>  - hw_pa:		[48, lpa, lpa2]
>  - hw_va:		[48, 52]
>  - kvm_mode:		[vhe, nvhe, protected]
>  - host_page_size:	[4KB, 16KB, 64KB]
>  - host_pa:		[48, 52]
>  - host_va:		[48, 52]
>  - host_load_addr:	[low, high]
>  - guest_page_size:	[64KB]
>  - guest_pa:		[52]
>  - guest_va:		[52]
>  - guest_load_addr:	[low, high]

Wow, what a matrix!

In a later revision of this series it might be good to add support for
LPA2 guests in KVM selftests. We currently constrain the IPA size to
48 bits on !64K kernels.

I'll have a deeper look at this series in the coming days.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread
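
As a reminder of how userspace sizes the IPA space today, here is a sketch of
the existing KVM_CAP_ARM_VM_IPA_SIZE / KVM_CREATE_VM interface (error handling
omitted; nothing here is added by this series):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int create_vm_with_52bit_ipa(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);
	int max_ipa = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE);
	unsigned long type = 0;

	if (max_ipa >= 52)
		type = KVM_VM_TYPE_ARM_IPA_SIZE(52);	/* request a 52-bit IPA space */

	return ioctl(kvm_fd, KVM_CREATE_VM, type);
}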

* Re: [PATCH v1 01/12] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
  2022-12-14 19:16     ` Oliver Upton
  (?)
@ 2022-12-15  0:53       ` Oliver Upton
  -1 siblings, 0 replies; 78+ messages in thread
From: Oliver Upton @ 2022-12-15  0:53 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Anshuman Khandual, Marc Zyngier, Catalin Marinas, kvmarm,
	Will Deacon, kvmarm, linux-arm-kernel

On Wed, Dec 14, 2022 at 07:16:09PM +0000, Oliver Upton wrote:
> On Tue, Dec 06, 2022 at 01:59:19PM +0000, Ryan Roberts wrote:
> > From: Anshuman Khandual <anshuman.khandual@arm.com>
> > 
> > PAGE_SIZE support is tested against possible minimum and maximum values for
> > its respective ID_AA64MMFR0.TGRAN field, depending on whether it is signed
> > or unsigned. But then FEAT_LPA2 implementation needs to be validated for 4K
> > and 16K page sizes via feature specific ID_AA64MMFR0.TGRAN values. Hence it
> > adds FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] values per ARM ARM (0487G.A).
> > 
> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> > ---
> >  arch/arm64/include/asm/sysreg.h | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > index 7d301700d1a9..9ad8172eea58 100644
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -673,10 +673,12 @@
> >  
> >  /* id_aa64mmfr0 */
> >  #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN	0x0
> > +#define ID_AA64MMFR0_EL1_TGRAN4_LPA2		ID_AA64MMFR0_EL1_TGRAN4_52_BIT
> >  #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX	0x7
> >  #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MIN	0x0
> >  #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MAX	0x7
> >  #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN	0x1
> > +#define ID_AA64MMFR0_EL1_TGRAN16_LPA2		ID_AA64MMFR0_EL1_TGRAN16_52_BIT
> >  #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX	0xf
> >  
> >  #define ARM64_MIN_PARANGE_BITS		32
> > @@ -684,6 +686,7 @@
> >  #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_DEFAULT	0x0
> >  #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_NONE		0x1
> >  #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MIN		0x2
> > +#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2		0x3
> >  #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MAX		0x7
> >  
> >  #ifdef CONFIG_ARM64_PA_BITS_52
> > @@ -800,11 +803,13 @@
> >  
> >  #if defined(CONFIG_ARM64_4K_PAGES)
> >  #define ID_AA64MMFR0_EL1_TGRAN_SHIFT		ID_AA64MMFR0_EL1_TGRAN4_SHIFT
> > +#define ID_AA64MMFR0_EL1_TGRAN_LPA2		ID_AA64MMFR0_EL1_TGRAN4_52_BIT
> >  #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN	ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN
> >  #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX	ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX
> >  #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT		ID_AA64MMFR0_EL1_TGRAN4_2_SHIFT
> >  #elif defined(CONFIG_ARM64_16K_PAGES)
> >  #define ID_AA64MMFR0_EL1_TGRAN_SHIFT		ID_AA64MMFR0_EL1_TGRAN16_SHIFT
> > +#define ID_AA64MMFR0_EL1_TGRAN_LPA2		ID_AA64MMFR0_EL1_TGRAN16_52_BIT
> 
> Can you use the 52_BIT suffix instead for these macros? LPA2 can map to
> multiple values (i.e. no support for 4KB granule). Also provides a
> direct description of what feature we're testing for.

Ignore me. I had to educate myself with Ard's series, and I now see that
this pattern is followed there too.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
  2022-12-15  0:52   ` Oliver Upton
  (?)
@ 2022-12-15  9:33     ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-15  9:33 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, Anshuman Khandual, kvmarm, Catalin Marinas, kvmarm,
	Will Deacon, linux-arm-kernel

On 15/12/2022 00:52, Oliver Upton wrote:
> On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
>> (appologies, I'm resending this series as I managed to send the cover letter to
>> all but the following patches only to myself on first attempt).
>>
>> This is my first upstream feature submission so please go easy ;-)
> 
> Welcome :)
> 
>> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
>> HW advertises support for LPA2 independently for stage 1 and stage 2, and
>> therefore its possible to have it for one and not the other. I've assumed that
>> there is a valid case for this if stage 1 is not supported but stage 2 is, KVM
>> could still then use LPA2 at stage 2 to create a 52 bit IPA space (which could
>> then be consumed by a 64KB page guest kernel with the help of FEAT_LPA). Because
>> of this independence and the fact that the kvm pgtable library is used for both
>> stage 1 and stage 2 tables, this means the library now has to remember the
>> in-use format on a per-page-table basis. To do this, I had to rework some
>> functions to take a `struct kvm_pgtable *` parameter, and as a result, there is
>> a noisy patch to add this parameter.
> 
> Mismatch between the translation stages is an interesting problem...
> 
> Given that userspace is responsible for setting up the IPA space, I
> can't really think of a strong use case for 52 bit IPAs with a 48 bit
> VA. Sure, the VMM could construct a sparse IPA space or remap the same
> HVA at multiple IPAs to artificially saturate the address space, but
> neither seems terribly compelling.
> 
> Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> its guest.

I guess a simpler approach would be to only use LPA2 if it's supported by both
stage 1 and stage 2. Then the code could just use a static key in the few
required places. However, there is also a place where kvm_pgtable walks the
user space stage 1 page table that is constructed by the kernel. For this to
keep working, the kernel would need to decide whether to use LPA2 based on the
same criteria. But it feels odd to have the kernel depend on LPA2 support at
stage 2. I'll wait for your fuller review.

> 
> Marc, is there any real reason for this or is it just a byproduct of how
> LPA support was added to KVM?
> 
>> Support 52-bit Input Addresses: The main difficulty here is that at stage 1 for
>> 4KB pages, 52-bit IA requires a extra level of lookup, and that level is called
>> '-1'. (Although stage 2 can use concatenated pages at the first level, and
>> therefore still only uses 4 levels, the kvm pgtable library deals with both
>> stage 1 and stage 2 tables). So there is another noisy patch to convert all
>> level variables to signed.
>>
>> This is all tested on the FVP, using a test harness I put together, which does a
>> host + guest boot test for 180 configurations, built from all the (valid)
>> combinations of various FVP, host kernel and guest kernel parameters:
>>
>>  - hw_pa:		[48, lpa, lpa2]
>>  - hw_va:		[48, 52]
>>  - kvm_mode:		[vhe, nvhe, protected]
>>  - host_page_size:	[4KB, 16KB, 64KB]
>>  - host_pa:		[48, 52]
>>  - host_va:		[48, 52]
>>  - host_load_addr:	[low, high]
>>  - guest_page_size:	[64KB]
>>  - guest_pa:		[52]
>>  - guest_va:		[52]
>>  - guest_load_addr:	[low, high]
> 
> Wow, what a matrix!
> 
> In a later revision of this series it might be good to add support for
> LPA2 guests in KVM selftests. We currently constrain the IPA size to
> 48bits on !64K kernels.

Ahh - I did have a quick look at kselftests and kvm-unit-tests, but they
appeared to be hard-coded for 48-bit IPA and looked like quite an effort to
rework. I guess if they already support 52-bit IPA for 64K kernels then I
missed something. I'll take another look and aim to get some tests implemented
for a future revision.
> 
> I'll have a deeper look at this series in the coming days.

Thanks!

> 
> --
> Thanks,
> Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread
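
A minimal sketch of the static-key shape Ryan describes above (every name
below is invented for illustration; none of it is taken from the series):

#include <linux/jump_label.h>

/* Hypothetical probes; in practice these would look at the ID registers. */
bool kvm_s1_granule_has_lpa2(void);
bool kvm_s2_granule_has_lpa2(void);

DEFINE_STATIC_KEY_FALSE(kvm_lpa2_enabled);

static void kvm_lpa2_init(void)
{
	/* Only use LPA2 when both stage 1 and stage 2 advertise it. */
	if (kvm_s1_granule_has_lpa2() && kvm_s2_granule_has_lpa2())
		static_branch_enable(&kvm_lpa2_enabled);
}

/* Hot paths (PTE encode/decode, TLBI) then test the key cheaply: */
static inline bool kvm_lpa2_is_enabled(void)
{
	return static_branch_likely(&kvm_lpa2_enabled);
}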

* Re: [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
  2022-12-15  0:52   ` Oliver Upton
  (?)
@ 2022-12-15  9:35     ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2022-12-15  9:35 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Anshuman Khandual, Catalin Marinas, kvmarm, kvmarm, Will Deacon,
	linux-arm-kernel

On Thu, 15 Dec 2022 00:52:28 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
> > (appologies, I'm resending this series as I managed to send the cover letter to
> > all but the following patches only to myself on first attempt).
> > 
> > This is my first upstream feature submission so please go easy ;-)
> 
> Welcome :)
> 
> > Support 52-bit Output Addresses: FEAT_LPA2 changes the format of
> > the PTEs. The HW advertises support for LPA2 independently for
> > stage 1 and stage 2, and therefore its possible to have it for one
> > and not the other. I've assumed that there is a valid case for
> > this if stage 1 is not supported but stage 2 is, KVM could still
> > then use LPA2 at stage 2 to create a 52 bit IPA space (which could
> > then be consumed by a 64KB page guest kernel with the help of
> > FEAT_LPA). Because of this independence and the fact that the kvm
> > pgtable library is used for both stage 1 and stage 2 tables, this
> > means the library now has to remember the in-use format on a
> > per-page-table basis. To do this, I had to rework some functions
> > to take a `struct kvm_pgtable *` parameter, and as a result, there
> > is a noisy patch to add this parameter.
> 
> Mismatch between the translation stages is an interesting problem...
> 
> Given that userspace is responsible for setting up the IPA space, I
> can't really think of a strong use case for 52 bit IPAs with a 48 bit
> VA. Sure, the VMM could construct a sparse IPA space or remap the same
> HVA at multiple IPAs to artificially saturate the address space, but
> neither seems terribly compelling.
> 
> Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> its guest.
> 
> Marc, is there any real reason for this or is it just a byproduct of how
> LPA support was added to KVM?

My recollection is hazy, but LPA came first, and LVA only landed much
later (because the two features were made independent in the
architecture, something that was later abandoned for LPA2, which
implies large VAs as well).

So yes, the VMM can place memory wherever it wants in the 52-bit IPA
space, even if its own VA space is limited to 48 bits. And it doesn't
have to be memory, by the way. You could place all the emulated MMIO
above the 48-bit limit, for example, and that doesn't require any trick
other than the HW supporting 52-bit PAs and VTCR_EL2 being correctly
configured.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 78+ messages in thread
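
To make Marc's point concrete, a sketch of a VMM placing a memslot beyond the
48-bit boundary with the normal memslot API ('vm_fd' and 'host_buf' are
assumed to exist already; no new ioctl is involved):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int map_ram_above_48bit_ipa(int vm_fd, void *host_buf, uint64_t size)
{
	struct kvm_userspace_memory_region region = {
		.slot            = 1,
		.guest_phys_addr = 1ULL << 49,		/* IPA above the 48-bit limit */
		.memory_size     = size,
		.userspace_addr  = (uintptr_t)host_buf,	/* a 48-bit HVA is fine */
	};

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}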

* Re: [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
  2022-12-15  9:33     ` Ryan Roberts
  (?)
@ 2022-12-15 18:12       ` Oliver Upton
  -1 siblings, 0 replies; 78+ messages in thread
From: Oliver Upton @ 2022-12-15 18:12 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Marc Zyngier, Anshuman Khandual, kvmarm, Catalin Marinas, kvmarm,
	Will Deacon, linux-arm-kernel

On Thu, Dec 15, 2022 at 09:33:17AM +0000, Ryan Roberts wrote:
> On 15/12/2022 00:52, Oliver Upton wrote:
> > On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
> >> (appologies, I'm resending this series as I managed to send the cover letter to
> >> all but the following patches only to myself on first attempt).
> >>
> >> This is my first upstream feature submission so please go easy ;-)
> > 
> > Welcome :)
> > 
> >> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
> >> HW advertises support for LPA2 independently for stage 1 and stage 2, and
> >> therefore its possible to have it for one and not the other. I've assumed that
> >> there is a valid case for this if stage 1 is not supported but stage 2 is, KVM
> >> could still then use LPA2 at stage 2 to create a 52 bit IPA space (which could
> >> then be consumed by a 64KB page guest kernel with the help of FEAT_LPA). Because
> >> of this independence and the fact that the kvm pgtable library is used for both
> >> stage 1 and stage 2 tables, this means the library now has to remember the
> >> in-use format on a per-page-table basis. To do this, I had to rework some
> >> functions to take a `struct kvm_pgtable *` parameter, and as a result, there is
> >> a noisy patch to add this parameter.
> > 
> > Mismatch between the translation stages is an interesting problem...
> > 
> > Given that userspace is responsible for setting up the IPA space, I
> > can't really think of a strong use case for 52 bit IPAs with a 48 bit
> > VA. Sure, the VMM could construct a sparse IPA space or remap the same
> > HVA at multiple IPAs to artificially saturate the address space, but
> > neither seems terribly compelling.
> > 
> > Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> > !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> > its guest.
> 
> I guess a simpler approach would be to only use LPA2 if its supported by both
> stage1 and stage2. Then the code could just use a static key in the few required
> places.

Ah, you caught on quick to what I was thinking :-)

What I'm groaning about in particular are the changes to the TLB
invalidation path, as it feels like a static key is warranted there.
Nonetheless, having it all depend on LPA2 support in both the kernel and
KVM is a bit of a mess.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 05/12] KVM: arm64: Maintain page-table format info in struct kvm_pgtable
  2022-12-06 13:59   ` Ryan Roberts
  (?)
@ 2022-12-19 19:45     ` Oliver Upton
  -1 siblings, 0 replies; 78+ messages in thread
From: Oliver Upton @ 2022-12-19 19:45 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Marc Zyngier, Anshuman Khandual, kvmarm, Catalin Marinas, kvmarm,
	Will Deacon, linux-arm-kernel

Hi Ryan,

On Tue, Dec 06, 2022 at 01:59:23PM +0000, Ryan Roberts wrote:
> As the next step on the journey to supporting FEAT_LPA2 in KVM, add a
> flag to struct kvm_pgtable, which functions can then use to select the
> approprate behavior for either the `classic` or `lpa2` page-table
> formats. For now, all page-tables remain in the `classic` format.
> 
> No functional changes are intended.
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 2 ++
>  arch/arm64/kvm/hyp/pgtable.c         | 2 ++
>  arch/arm64/kvm/mmu.c                 | 1 +
>  3 files changed, 5 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 2247ed74871a..744e224d964b 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -157,6 +157,7 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
>   * @start_level:	Level at which the page-table walk starts.
>   * @pgd:		Pointer to the first top-level entry of the page-table.
>   * @mm_ops:		Memory management callbacks.
> + * @lpa2_ena:		Format used for page-table; false->classic, true->lpa2.

I'd prefer that we describe the paging structure purely in terms of
input and output addresses. If you add the latter, it should be possible
to decide whether LPA2 is actually in use.

(i.e. PAGE_SIZE != SZ_64K && pgt->oa_bits > 48)
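
Something like this is all I'm picturing; a minimal sketch only, where oa_bits
is an assumed new field next to the existing ia_bits:

static inline bool kvm_pgtable_uses_lpa2(const struct kvm_pgtable *pgt)
{
	/* oa_bits is hypothetical here; only 4K/16K granules need LPA2. */
	return PAGE_SIZE != SZ_64K && pgt->oa_bits > 48;
}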

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 10/12] KVM: arm64: Rework logic to en/decode VTCR_EL2.{SL0, SL2} fields
  2022-12-06 13:59   ` Ryan Roberts
  (?)
@ 2022-12-20  0:06     ` Oliver Upton
  -1 siblings, 0 replies; 78+ messages in thread
From: Oliver Upton @ 2022-12-20  0:06 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Marc Zyngier, Anshuman Khandual, kvmarm, Catalin Marinas, kvmarm,
	Will Deacon, linux-arm-kernel

Hi Ryan,

On Tue, Dec 06, 2022 at 01:59:28PM +0000, Ryan Roberts wrote:
> In order to support 5 level translation, FEAT_LPA2 introduces the 1-bit
> SL2 field within VTCR_EL2 to extend the existing 2-bit SL0 field. The
> SL2[0]:SL0[1:0] encodings have no simple algorithmic relationship to the
> start levels they represent (that I can find, at least), so replace the
> existing macros with functions that do lookups to encode and decode the
> values. These new functions no longer make hardcoded assumptions about
> the maximum level and instead rely on KVM_PGTABLE_FIRST_LEVEL and
> KVM_PGTABLE_LAST_LEVEL.
> 
> This is preparatory work for enabling 52-bit IPA for 4KB and 16KB pages
> with FEAT_LPA2.
> 
> No functional change intended.
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>

Why do we need to support 5-level paging at stage-2?

A configuration of start_level = 0, T0SZ = 12 with 4K paging would
result in 16 concatenated tables at level 0, avoiding the level -1
lookup altogether.
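
Rough arithmetic for the 4K case, as a sketch rather than anything from the
series:

/*
 * With a 4K granule each level resolves 9 IA bits and the page offset
 * is 12 bits, so a walk starting at level 0 covers 48 bits. A 52-bit
 * IPA therefore needs 2^(52 - 48) = 16 concatenated level-0 tables,
 * with no level -1 lookup.
 */
static unsigned int stage2_concatenated_tables(unsigned int ipa_bits)
{
	unsigned int level0_span = 12 + 4 * 9;		/* 48 bits */

	return 1U << (ipa_bits - level0_span);		/* 52 -> 16 */
}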

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 10/12] KVM: arm64: Rework logic to en/decode VTCR_EL2.{SL0, SL2} fields
  2022-12-20  0:06     ` Oliver Upton
  (?)
@ 2022-12-20  9:01       ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2022-12-20  9:01 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, Anshuman Khandual, kvmarm, Catalin Marinas, kvmarm,
	Will Deacon, linux-arm-kernel

On 20/12/2022 00:06, Oliver Upton wrote:
> Hi Ryan,
> 
> On Tue, Dec 06, 2022 at 01:59:28PM +0000, Ryan Roberts wrote:
>> In order to support 5 level translation, FEAT_LPA2 introduces the 1-bit
>> SL2 field within VTCR_EL2 to extend the existing 2-bit SL0 field. The
>> SL2[0]:SL0[1:0] encodings have no simple algorithmic relationship to the
>> start levels they represent (that I can find, at least), so replace the
>> existing macros with functions that do lookups to encode and decode the
>> values. These new functions no longer make hardcoded assumptions about
>> the maximum level and instead rely on KVM_PGTABLE_FIRST_LEVEL and
>> KVM_PGTABLE_LAST_LEVEL.
>>
>> This is preparatory work for enabling 52-bit IPA for 4KB and 16KB pages
>> with FEAT_LPA2.
>>
>> No functional change intended.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> 
> Why do we need to support 5-level paging at stage-2?
> 
> A configuration of start_level = 0, T0SZ = 12 with 4K paging would
> result in 16 concatenated tables at level 0, avoiding the level -1
> lookup altogether.

Yes, agreed. And that's exactly what the code does. So we could remove this
patch from the series and everything would continue to function correctly. But I
was trying to make things more consistent and maintainable (this now works in
terms of KVM_PGTABLE_FIRST_LEVEL and KVM_PGTABLE_LAST_LEVEL for example).

That said, I haven't exactly been consistent in my refactoring; patch 11 just
adds a comment to kvm_vcpu_trap_get_fault_level() explaining that the new -1
level encodings will never be seen due to stage2 never using 5 levels of
translation.

So I'm happy to remove this and replace it with a comment describing the
limitations, if that's your preference?

> 
> --
> Thanks,
> Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 10/12] KVM: arm64: Rework logic to en/decode VTCR_EL2.{SL0, SL2} fields
  2022-12-20  9:01       ` Ryan Roberts
  (?)
@ 2022-12-20 18:08         ` Oliver Upton
  -1 siblings, 0 replies; 78+ messages in thread
From: Oliver Upton @ 2022-12-20 18:08 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Marc Zyngier, Anshuman Khandual, kvmarm, Catalin Marinas, kvmarm,
	Will Deacon, linux-arm-kernel

On Tue, Dec 20, 2022 at 09:01:19AM +0000, Ryan Roberts wrote:
> On 20/12/2022 00:06, Oliver Upton wrote:
> > Hi Ryan,
> > 
> > On Tue, Dec 06, 2022 at 01:59:28PM +0000, Ryan Roberts wrote:
> >> In order to support 5 level translation, FEAT_LPA2 introduces the 1-bit
> >> SL2 field within VTCR_EL2 to extend the existing 2-bit SL0 field. The
> >> SL2[0]:SL0[1:0] encodings have no simple algorithmic relationship to the
> >> start levels they represent (that I can find, at least), so replace the
> >> existing macros with functions that do lookups to encode and decode the
> >> values. These new functions no longer make hardcoded assumptions about
> >> the maximum level and instead rely on KVM_PGTABLE_FIRST_LEVEL and
> >> KVM_PGTABLE_LAST_LEVEL.
> >>
> >> This is preparatory work for enabling 52-bit IPA for 4KB and 16KB pages
> >> with FEAT_LPA2.
> >>
> >> No functional change intended.
> >>
> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> > 
> > Why do we need to support 5-level paging at stage-2?
> > 
> > A configuration of start_level = 0, T0SZ = 12 with 4K paging would
> > result in 16 concatenated tables at level 0, avoiding the level -1
> > lookup altogether.
> 
> Yes, agreed. And that's exactly what the code does. So we could remove this
> patch from the series and everything would continue to function correctly. But I
> was trying to make things more consistent and maintainable (this now works in
> terms of KVM_PGTABLE_FIRST_LEVEL and KVM_PGTABLE_LAST_LEVEL for example).

My largest concern was the plumbing that was added for setting a start
level of -1 that is effectively dead code. I worry about it because it
can be confusing for newcomers and can be an open invitation to mess
things up later down the line.

> So happy to remove this and replace with a comment describing the limitations if
> that's your preference?

Marc, feel free to put me in line here if I'm not thinking about this
right, but adding support for an unused feature is likely less
maintainable. So, I'd prefer we drop the patch.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
  2022-12-15 18:12       ` Oliver Upton
  (?)
@ 2022-12-20 18:28         ` Oliver Upton
  -1 siblings, 0 replies; 78+ messages in thread
From: Oliver Upton @ 2022-12-20 18:28 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Anshuman Khandual, Marc Zyngier, Catalin Marinas, kvmarm,
	Will Deacon, kvmarm, linux-arm-kernel

On Thu, Dec 15, 2022 at 06:12:14PM +0000, Oliver Upton wrote:
> On Thu, Dec 15, 2022 at 09:33:17AM +0000, Ryan Roberts wrote:
> > On 15/12/2022 00:52, Oliver Upton wrote:
> > > On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
> > >> (appologies, I'm resending this series as I managed to send the cover letter to
> > >> all but the following patches only to myself on first attempt).
> > >>
> > >> This is my first upstream feature submission so please go easy ;-)
> > > 
> > > Welcome :)
> > > 
> > >> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
> > >> HW advertises support for LPA2 independently for stage 1 and stage 2, and
> > >> therefore its possible to have it for one and not the other. I've assumed that
> > >> there is a valid case for this if stage 1 is not supported but stage 2 is, KVM
> > >> could still then use LPA2 at stage 2 to create a 52 bit IPA space (which could
> > >> then be consumed by a 64KB page guest kernel with the help of FEAT_LPA). Because
> > >> of this independence and the fact that the kvm pgtable library is used for both
> > >> stage 1 and stage 2 tables, this means the library now has to remember the
> > >> in-use format on a per-page-table basis. To do this, I had to rework some
> > >> functions to take a `struct kvm_pgtable *` parameter, and as a result, there is
> > >> a noisy patch to add this parameter.
> > > 
> > > Mismatch between the translation stages is an interesting problem...
> > > 
> > > Given that userspace is responsible for setting up the IPA space, I
> > > can't really think of a strong use case for 52 bit IPAs with a 48 bit
> > > VA. Sure, the VMM could construct a sparse IPA space or remap the same
> > > HVA at multiple IPAs to artificially saturate the address space, but
> > > neither seems terribly compelling.
> > > 
> > > Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> > > !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> > > its guest.
> > 
> > I guess a simpler approach would be to only use LPA2 if its supported by both
> > stage1 and stage2. Then the code could just use a static key in the few required
> > places.
> 
> Ah, you caught on quick to what I was thinking :-)

Just wanted to revisit this...

Ryan, you say that it is possible for hardware to support LPA2 for a
single stage of translation. Are you basing that statement on something
in the Arm ARM or the fact that there are two different enumerations
for stage-1 and stage-2?

In my cursory search I wasn't able to find anything that would suggest
it is possible for only a single stage to implement the feature. The one
possibility I can think of is the NV case, where the L0 hypervisor for
some reason does not support LPA2 in its emulated stage-2 but still
advertises support for LPA2 at stage-1. I'd say that's quite a stupid
L0, but I should really hold my tongue until KVM actually does NV ;-)

I want to make sure there is a strong sense of what LPA2 means in terms
of the architecture to inform how we use it in KVM.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
  2022-12-20 18:28         ` Oliver Upton
@ 2023-02-20 14:17           ` Ryan Roberts
  -1 siblings, 0 replies; 78+ messages in thread
From: Ryan Roberts @ 2023-02-20 14:17 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, Anshuman Khandual, kvmarm, Catalin Marinas, kvmarm,
	Will Deacon, linux-arm-kernel

Hi Oliver,

Apologies for having gone quiet on this. I came back to this work today only to
notice that you sent the below response on the 20th Dec but it did not get
picked up by my mail client somehow (although I'm sure it was operator error). I
just spotted it on lore.kernel.org.

I'm planning to post a second version soon-ish, with all your comments
addressed. I think everything except the below is pretty clear and straightforward.


On 20/12/2022 18:28, Oliver Upton wrote:
> On Thu, Dec 15, 2022 at 06:12:14PM +0000, Oliver Upton wrote:
>> On Thu, Dec 15, 2022 at 09:33:17AM +0000, Ryan Roberts wrote:
>>> On 15/12/2022 00:52, Oliver Upton wrote:
>>>> On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
>>>>> (appologies, I'm resending this series as I managed to send the cover letter to
>>>>> all but the following patches only to myself on first attempt).
>>>>>
>>>>> This is my first upstream feature submission so please go easy ;-)
>>>>
>>>> Welcome :)
>>>>
>>>>> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
>>>>> HW advertises support for LPA2 independently for stage 1 and stage 2, and
>>>>> therefore its possible to have it for one and not the other. I've assumed that
>>>>> there is a valid case for this if stage 1 is not supported but stage 2 is, KVM
>>>>> could still then use LPA2 at stage 2 to create a 52 bit IPA space (which could
>>>>> then be consumed by a 64KB page guest kernel with the help of FEAT_LPA). Because
>>>>> of this independence and the fact that the kvm pgtable library is used for both
>>>>> stage 1 and stage 2 tables, this means the library now has to remember the
>>>>> in-use format on a per-page-table basis. To do this, I had to rework some
>>>>> functions to take a `struct kvm_pgtable *` parameter, and as a result, there is
>>>>> a noisy patch to add this parameter.
>>>>
>>>> Mismatch between the translation stages is an interesting problem...
>>>>
>>>> Given that userspace is responsible for setting up the IPA space, I
>>>> can't really think of a strong use case for 52 bit IPAs with a 48 bit
>>>> VA. Sure, the VMM could construct a sparse IPA space or remap the same
>>>> HVA at multiple IPAs to artificially saturate the address space, but
>>>> neither seems terribly compelling.
>>>>
>>>> Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
>>>> !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
>>>> its guest.
>>>
>>> I guess a simpler approach would be to only use LPA2 if its supported by both
>>> stage1 and stage2. Then the code could just use a static key in the few required
>>> places.
>>
>> Ah, you caught on quick to what I was thinking :-)
> 
> Just wanted to revisit this...
> 
> Ryan, you say that it is possible for hardware to support LPA2 for a
> single stage of translation. Are you basing that statement on something
> in the Arm ARM or the fact that there are two different enumerations
> for stage-1 and stage-2?

It's based on there being 2 separate enumerations. I've dug into this with our
architecture folks; while it is clearly possible for the HW (or L0 hyp) to
present an ID register that says one stage supports LPA2 and the other doesn't,
the real intention behind having the 2 fields separated out is for an L0 hyp to
be able to limit the stage 2 granule sizes that it advertises to guest
hypervisors. There are no anticipated use cases where the HW or an L0 hypervisor
might want to advertise support for LPA2 in one stage and not the other.

So on that basis, it sounds to me like we should just test for LPA2 support in
both stages and require both to be supported. That simplifies things
significantly - I can just use a static key to globally flip between pte
formats, and a bunch of the noisy refactoring disappears.
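
For example, something along these lines (a sketch only; the static key name is
made up, and the shifts/values are my reading of the ID_AA64MMFR0_EL1
TGRAN4/TGRAN4_2 layout, so they would need checking against the Arm ARM):

#include <linux/jump_label.h>
#include <asm/cpufeature.h>
#include <asm/sysreg.h>

DEFINE_STATIC_KEY_FALSE(kvm_lpa2_enabled);

static void kvm_probe_lpa2(void)
{
	u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
	unsigned int s1 = cpuid_feature_extract_unsigned_field(mmfr0, 28);	/* TGRAN4 */
	unsigned int s2 = cpuid_feature_extract_unsigned_field(mmfr0, 40);	/* TGRAN4_2 */

	/* 0b0001: 52-bit at stage 1; 0b0011: 52-bit at stage 2 (4K granule) */
	if (s1 == 1 && s2 == 3)
		static_branch_enable(&kvm_lpa2_enabled);
}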

> 
> In my cursory search I wasn't able to find anything that would suggest
> it is possible for only a single stage to implement the feature. The one
> possibility I can think of is the NV case, where the L0 hypervisor for
> some reason does not support LPA2 in its emulated stage-2 but still
> advertises support for LPA2 at stage-1. I'd say that's quite a stupid
> L0, but I should really hold my tongue until KVM actually does NV ;-)
> 
> I want to make sure there is a strong sense of what LPA2 means in terms
> of the architecture to inform how we use it in KVM.
> 
> --
> Thanks,
> Oliver
> 

Thanks,
Ryan



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
  2023-02-20 14:17           ` Ryan Roberts
@ 2023-02-22 20:42             ` Oliver Upton
  -1 siblings, 0 replies; 78+ messages in thread
From: Oliver Upton @ 2023-02-22 20:42 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Marc Zyngier, Anshuman Khandual, kvmarm, Catalin Marinas, kvmarm,
	Will Deacon, linux-arm-kernel

Hi Ryan,

On Mon, Feb 20, 2023 at 02:17:30PM +0000, Ryan Roberts wrote:
> Hi Oliver,
> 
> Apologies for having gone quiet on this. I came back to this work today only to
> notice that you sent the below response on the 20th Dec but it did not get
> picked up by my mail client somehow (although I'm sure it was operator error). I
> just spotted it on lore.kernel.org.

Huh, sounds like the arm mail server is not a fan of me... Alex reported
my messages arriving in spam as well. I'll let you decide what that
means about what I have to say :)

> I'm planning to post a second version soon-ish, with all your comments
> addressed. I think everything except the below is pretty clear and straight forward.

Great!

> On 20/12/2022 18:28, Oliver Upton wrote:
> > Ryan, you say that it is possible for hardware to support LPA2 for a
> > single stage of translation. Are you basing that statement on something
> > in the Arm ARM or the fact that there are two different enumerations
> > for stage-1 and stage-2?
> 
> Its based on there being 2 separate enumerations. I've dug into this with our
> architecture folks; while it is clearly possible that the HW (or L0 hyp) to
> present an ID register that says one stage supports LPA2 and the other doesn't,
> the real intention behind having the 2 fields separated out is for an L0 hyp to
> be able to limit the stage2 granule sizes that it advertises to guest
> hypervisors. There are no anticipated use cases where HW or L0 hypervisor might
> want to advertise support for LPA2 in one stage and not the other.

Yep, this is exactly what I was getting at. My impression of the stage-2
enumerations was that they solely exist for choking down the supported
granule size, so I was quite surprised to see LPA2 show up in both fields
independently.

> So on that basis, it sounds to me like we should just test for LPA2 support in
> both stages and require both to be supported. That simplifies things
> significantly - I can just use a static key to globally flip between pte
> formats, and a bunch of the noisy refactoring disappears.

Whoever wants to take advantage of split support is welcome to share
their use case and upstream the patches. Otherwise, I think the simpler
approach to enlightening KVM of LPA2 reduces friction on actually
getting the initial enablement done.

-- 
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
  2023-02-22 20:42             ` Oliver Upton
@ 2023-02-23  9:53               ` Catalin Marinas
  -1 siblings, 0 replies; 78+ messages in thread
From: Catalin Marinas @ 2023-02-23  9:53 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Ryan Roberts, Marc Zyngier, Anshuman Khandual, kvmarm, kvmarm,
	Will Deacon, linux-arm-kernel

On Wed, Feb 22, 2023 at 08:42:37PM +0000, Oliver Upton wrote:
> On Mon, Feb 20, 2023 at 02:17:30PM +0000, Ryan Roberts wrote:
> > Apologies for having gone quiet on this. I came back to this work today only to
> > notice that you sent the below response on the 20th Dec but it did not get
> > picked up by my mail client somehow (although I'm sure it was operator error). I
> > just spotted it on lore.kernel.org.
> 
> Huh, sounds like the arm mail server is not a fan of me... Alex reported
> my messages arriving in spam as well. I'll let you decide what that
> means about what I have to say :)

Nothing personal ;). For me, most linux.dev emails ended up in the
Outlook spam folder. For the benefit of the Arm people on this thread,
since I added linux.dev to outlook's safe senders list, I haven't seen
any of these emails in spam (well, until IT tweaks some filters again;
in the past I was struggling to get google.com emails and the safe
senders list did not make any difference).

-- 
Catalin

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2023-02-23  9:54 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-06 13:59 [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2 Ryan Roberts
2022-12-06 13:59 ` [PATCH v1 01/12] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] Ryan Roberts
2022-12-14 19:16   ` Oliver Upton
2022-12-15  0:53     ` Oliver Upton
2022-12-06 13:59 ` [PATCH v1 02/12] arm64/mm: Update tlb invalidation routines for FEAT_LPA2 Ryan Roberts
2022-12-06 13:59 ` [PATCH v1 03/12] KVM: arm64: Add new (V)TCR_EL2 field definitions " Ryan Roberts
2022-12-06 13:59 ` [PATCH v1 04/12] KVM: arm64: Plumbing to enable multiple pgtable formats Ryan Roberts
2022-12-06 13:59 ` [PATCH v1 05/12] KVM: arm64: Maintain page-table format info in struct kvm_pgtable Ryan Roberts
2022-12-19 19:45   ` Oliver Upton
2022-12-06 13:59 ` [PATCH v1 06/12] KVM: arm64: Use LPA2 page-tables for stage2 if HW supports it Ryan Roberts
2022-12-06 13:59 ` [PATCH v1 07/12] KVM: arm64: Use LPA2 page-tables for hyp stage1 " Ryan Roberts
2022-12-06 13:59 ` [PATCH v1 08/12] KVM: arm64: Insert PS field at TCR_EL2 assembly time Ryan Roberts
2022-12-06 13:59 ` [PATCH v1 09/12] KVM: arm64: Convert translation level parameter to s8 Ryan Roberts
2022-12-06 13:59 ` [PATCH v1 10/12] KVM: arm64: Rework logic to en/decode VTCR_EL2.{SL0, SL2} fields Ryan Roberts
2022-12-20  0:06   ` Oliver Upton
2022-12-20  9:01     ` Ryan Roberts
2022-12-20 18:08       ` Oliver Upton
2022-12-06 13:59 ` [PATCH v1 11/12] KVM: arm64: Support upto 5 levels of translation in kvm_pgtable Ryan Roberts
2022-12-06 13:59 ` [PATCH v1 12/12] KVM: arm64: Allow guests with >48-bit IPA size on FEAT_LPA2 systems Ryan Roberts
2022-12-15  0:52 ` [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2 Oliver Upton
2022-12-15  9:33   ` Ryan Roberts
2022-12-15 18:12     ` Oliver Upton
2022-12-20 18:28       ` Oliver Upton
2023-02-20 14:17         ` Ryan Roberts
2023-02-22 20:42           ` Oliver Upton
2023-02-23  9:53             ` Catalin Marinas
2022-12-15  9:35   ` Marc Zyngier
