linux-coco.lists.linux.dev archive mirror
* Re: [v2] Support for Arm CCA VMs on Linux
  2024-04-12  8:40 [v2] Support for Arm CCA VMs on Linux Steven Price
@ 2024-04-11 18:54 ` Itaru Kitayama
  2024-04-15  8:14   ` Steven Price
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 104+ messages in thread
From: Itaru Kitayama @ 2024-04-11 18:54 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Hi Steven,

On Fri, Apr 12, 2024 at 09:40:56AM +0100, Steven Price wrote:
> We are happy to announce the second version of the Arm Confidential
> Compute Architecture (CCA) support for the Linux stack. The intention is
> to seek early feedback in the following areas:
>  * KVM integration of the Arm CCA;
>  * KVM UABI for managing the Realms, seeking to generalise the
>    operations where possible with other Confidential Compute solutions;
>  * Linux Guest support for Realms.
> 
> See the previous RFC[1] for a more detailed overview of Arm's CCA
> solution, or visit the Arm CCA Landing page[2].
> 
> This series is based on the final RMM v1.0 (EAC5) specification[3].

It's great to see the updated "V2" series. Since you said you'd like
"early" feedback on V2, does that mean it's likely to be followed by
V3 and V4, anticipating large code-base changes from the current form
(V2)? Do you have a rough timeframe for landing this Arm CCA support
in mainline? Do you Arm folks expect this to be a multi-year project?

Thanks,
Itaru.

> 
> Quick-start guide
> =================
> 
> The easiest way of getting started with the stack is by using
> Shrinkwrap[4]. Currently Shrinkwrap has a configuration for the initial
> v1.0-EAC5 release[5], so the following overlay needs to be applied to
> the standard 'cca-3world.yaml' file. Note that the 'rmm' component needs
> updating to 'main' because there are fixes that are needed and are not
> yet in a tagged release. The following will create an overlay file and
> build a working environment:
> 
> cat<<EOT >cca-v2.yaml
> build:
>   linux:
>     repo:
>       revision: cca-full/v2
>   kvmtool:
>     repo:
>       kvmtool:
>         revision: cca/v2
>   rmm:
>     repo:
>       revision: main
>   kvm-unit-tests:
>     repo:
>       revision: cca/v2
> EOT
> 
> shrinkwrap build cca-3world.yaml --overlay buildroot.yaml --btvar GUEST_ROOTFS='${artifact:BUILDROOT}' --overlay cca-v2.yaml
> 
> You will then want to modify the 'guest-disk.img' to include the files
> necessary for the realm guest (see the documentation in cca-3world.yaml
> for details of other options):
> 
>   cd ~/.shrinkwrap/package/cca-3world
>   /sbin/e2fsck -fp rootfs.ext2 
>   /sbin/resize2fs rootfs.ext2 256M
>   mkdir mnt
>   sudo mount rootfs.ext2 mnt/
>   sudo mkdir mnt/cca
>   sudo cp guest-disk.img KVMTOOL_EFI.fd lkvm Image mnt/cca/
>   sudo umount mnt 
>   rmdir mnt/
> 
> Finally you can run the FVP with the host:
> 
>   shrinkwrap run cca-3world.yaml --rtvar ROOTFS=$HOME/.shrinkwrap/package/cca-3world/rootfs.ext2
> 
> And once the host kernel has booted, log in (user name 'root') and start
> a realm guest:
> 
>   cd /cca
>   ./lkvm run --realm --restricted_mem -c 2 -m 256 -k Image -p earlycon
> 
> Be patient and you should end up in a realm guest with the host's
> filesystem mounted via p9.
> 
> It's also possible to use EFI within the realm guest, again see
> cca-3world.yaml within Shrinkwrap for more details.
> 
> A branch of kvm-unit-tests including realm-specific tests is provided
> here:
>   https://gitlab.arm.com/linux-arm/kvm-unit-tests-cca/-/tree/cca/v2
> 
> [1] Previous RFC
>     https://lore.kernel.org/r/20230127112248.136810-1-suzuki.poulose%40arm.com
> [2] Arm CCA Landing page (See Key Resources section for various documentation)
>     https://www.arm.com/architecture/security-features/arm-confidential-compute-architecture
> [3] RMM v1.0-EAC5 specification
>     https://developer.arm.com/documentation/den0137/1-0eac5/
> [4] Shrinkwrap
>     https://git.gitlab.arm.com/tooling/shrinkwrap
> [5] Linux support for Arm CCA RMM v1.0-EAC5
>     https://lore.kernel.org/r/fb259449-026e-4083-a02b-f8a4ebea1f87%40arm.com


* [v2] Support for Arm CCA VMs on Linux
@ 2024-04-12  8:40 Steven Price
  2024-04-11 18:54 ` Itaru Kitayama
                   ` (3 more replies)
  0 siblings, 4 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:40 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

We are happy to announce the second version of the Arm Confidential
Compute Architecture (CCA) support for the Linux stack. The intention is
to seek early feedback in the following areas:
 * KVM integration of the Arm CCA;
 * KVM UABI for managing the Realms, seeking to generalise the
   operations where possible with other Confidential Compute solutions;
 * Linux Guest support for Realms.

See the previous RFC[1] for a more detailed overview of Arm's CCA
solution, or visit the Arm CCA Landing page[2].

This series is based on the final RMM v1.0 (EAC5) specification[3].

Quick-start guide
=================

The easiest way of getting started with the stack is by using
Shrinkwrap[4]. Currently Shrinkwrap has a configuration for the initial
v1.0-EAC5 release[5], so the following overlay needs to be applied to
the standard 'cca-3world.yaml' file. Note that the 'rmm' component needs
updating to 'main' because there are fixes that are needed and are not
yet in a tagged release. The following will create an overlay file and
build a working environment:

cat<<EOT >cca-v2.yaml
build:
  linux:
    repo:
      revision: cca-full/v2
  kvmtool:
    repo:
      kvmtool:
        revision: cca/v2
  rmm:
    repo:
      revision: main
  kvm-unit-tests:
    repo:
      revision: cca/v2
EOT

shrinkwrap build cca-3world.yaml --overlay buildroot.yaml --btvar GUEST_ROOTFS='${artifact:BUILDROOT}' --overlay cca-v2.yaml

You will then want to modify the 'guest-disk.img' to include the files
necessary for the realm guest (see the documentation in cca-3world.yaml
for details of other options):

  cd ~/.shrinkwrap/package/cca-3world
  /sbin/e2fsck -fp rootfs.ext2 
  /sbin/resize2fs rootfs.ext2 256M
  mkdir mnt
  sudo mount rootfs.ext2 mnt/
  sudo mkdir mnt/cca
  sudo cp guest-disk.img KVMTOOL_EFI.fd lkvm Image mnt/cca/
  sudo umount mnt 
  rmdir mnt/

Finally you can run the FVP with the host:

  shrinkwrap run cca-3world.yaml --rtvar ROOTFS=$HOME/.shrinkwrap/package/cca-3world/rootfs.ext2

And once the host kernel has booted, log in (user name 'root') and start
a realm guest:

  cd /cca
  ./lkvm run --realm --restricted_mem -c 2 -m 256 -k Image -p earlycon

Be patient and you should end up in a realm guest with the host's
filesystem mounted via p9.

It's also possible to use EFI within the realm guest, again see
cca-3world.yaml within Shrinkwrap for more details.

A branch of kvm-unit-tests including realm-specific tests is provided
here:
  https://gitlab.arm.com/linux-arm/kvm-unit-tests-cca/-/tree/cca/v2

[1] Previous RFC
    https://lore.kernel.org/r/20230127112248.136810-1-suzuki.poulose%40arm.com
[2] Arm CCA Landing page (See Key Resources section for various documentation)
    https://www.arm.com/architecture/security-features/arm-confidential-compute-architecture
[3] RMM v1.0-EAC5 specification
    https://developer.arm.com/documentation/den0137/1-0eac5/
[4] Shrinkwrap
    https://git.gitlab.arm.com/tooling/shrinkwrap
[5] Linux support for Arm CCA RMM v1.0-EAC5
    https://lore.kernel.org/r/fb259449-026e-4083-a02b-f8a4ebea1f87%40arm.com


* [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA
  2024-04-12  8:40 [v2] Support for Arm CCA VMs on Linux Steven Price
  2024-04-11 18:54 ` Itaru Kitayama
@ 2024-04-12  8:41 ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 01/14] arm64: rsi: Add RSI definitions Steven Price
                     ` (13 more replies)
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
  2024-04-12 16:52 ` [v2] Support for Arm CCA VMs on Linux Jean-Philippe Brucker
  3 siblings, 14 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:41 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

This series adds support for running Linux in a protected VM under the
Arm Confidential Compute Architecture (CCA). The purpose of this series
is to gather feedback on the proposed changes to the architecture code
for CCA.

The ABI to the RMM from a realm (the RSI) is based on the final RMM v1.0
(EAC 5) specification[1].

This series is based on v6.9-rc1. It is also available as a git
repository:

https://gitlab.arm.com/linux-arm/linux-cca cca-guest/v2

Introduction
============
A more general introduction to Arm CCA is available on the Arm
website[2], and links to the other components involved are available in
the overall cover letter.

Arm Confidential Compute Architecture adds two new 'worlds' to the
architecture: Root and Realm. A new software component known as the RMM
(Realm Management Monitor) runs in Realm EL2 and is trusted by both the
Normal World and VMs running within Realms. This enables mutual
distrust between the Realm VMs and the Normal World.

Virtual machines running within a Realm can decide on a (4k)
page-by-page granularity whether to share a page with the (Normal World)
host or to keep it private (protected). This protection is provided by
the hardware: any attempt by the Normal World to access a page which
isn't shared will trigger a Granule Protection Fault.

Realm VMs can communicate with the RMM via another SMC interface known
as RSI (Realm Services Interface). This series adds wrappers for the
full set of RSI commands and uses them to manage the Realm IPA State
(RIPAS) and to discover the configuration of the realm.
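
As a rough illustration of the discovery step (the wrapper and constant
names are from this series, but the probe_rsi() helper is hypothetical
and error handling is simplified):

  static bool probe_rsi(void)
  {
          unsigned long ver;

          /* The SMC returns SMCCC_RET_NOT_SUPPORTED outside a realm. */
          if (rsi_get_version(RSI_ABI_VERSION, &ver, NULL) ==
              SMCCC_RET_NOT_SUPPORTED)
                  return false;

          /* This series requires an exact v1.0 ABI match. */
          return ver == RSI_ABI_VERSION;
  }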

The VM running within the Realm needs to ensure that any memory it is
going to use is marked as 'RIPAS_RAM' (i.e. protected memory accessible
only to the guest). This could be done by the VMM (and subject to
measurement to ensure it is set up correctly) or the VM can set it
itself.  This series includes a patch which will iterate over all
described RAM and set the RIPAS. This is a relatively cheap operation,
and doesn't require memory donation from the host. Instead, memory can
be dynamically provided by the host on fault. An alternative would be to
update booting.rst and state this as a requirement, but this would
reduce the flexibility of the VMM to manage the available memory to the
guest (as the initial RIPAS state is part of the guest's measurement).
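
A minimal sketch of that conversion loop, using the wrapper added by
this series (the set_ram() helper name is illustrative; the real code
treats any failure as fatal):

  /* Convert [start, end) to RIPAS_RAM. The RMM may make partial
   * progress and returns the top of the range converted so far. */
  static void set_ram(phys_addr_t start, phys_addr_t end)
  {
          phys_addr_t top;

          while (start != end) {
                  if (rsi_set_addr_range_state(start, end,
                                               RSI_RIPAS_RAM, &top))
                          break;  /* the series BUG()s on failure */
                  start = top;
          }
  }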

Within the Realm the most-significant active bit of the IPA is used to
select whether the access is to protected memory or to memory shared
with the host. This series treats this bit as if it were an attribute
bit in the page tables and modifies it when sharing/unsharing memory
with the host.
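
As an example of the idiom (prot_ns_shared holds BIT(ipa_bits - 1),
queried from the RMM by a later patch in this series; the snippet is a
sketch, not new API), sharing or unsharing a page is just toggling
this "attribute":

  pte = set_pte_bit(pte, __pgprot(PROT_NS_SHARED));    /* share */
  pte = clear_pte_bit(pte, __pgprot(PROT_NS_SHARED));  /* unshare */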

This top bit usage also necessitates that the IPA width is made more
dynamic in the guest. The VMM will choose a width (and therefore which
bit controls the shared flag) and the guest must be able to identify
this bit to mask it out when necessary. PHYS_MASK_SHIFT/PHYS_MASK are
therefore made dynamic.
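
For instance (an assumed configuration, purely illustrative): with a
40-bit IPA the shared flag is bit 39, PHYS_MASK_SHIFT is lowered to 39,
and PFN extraction along the lines of

  unsigned long pfn = (pte_val(pte) & PHYS_MASK) >> PAGE_SHIFT;

then strips the shared bit out of the physical address automatically.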

To allow virtio to communicate with the host the shared buffers must be
placed in memory which has this top IPA bit set. This is achieved by
implementing the set_memory_{encrypted,decrypted} APIs for arm64 and
forcing the use of bounce buffers. For now all device access is
considered to require the memory to be shared; at this stage there is
no support for real devices to be assigned to a realm guest - obviously
if device assignment is added this will have to change.
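
A driver-side sketch of the resulting pattern (the ITS patch later in
this series does essentially this; the snippet is illustrative, not a
definitive implementation):

  /* Allocate pages and flip them to shared so the host/VMM can
   * access them; reverse with set_memory_encrypted() before freeing. */
  struct page *page = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);

  if (page)
          set_memory_decrypted((unsigned long)page_address(page),
                               1 << order);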

Finally the GIC is (largely) emulated by the (untrusted) host. The RMM
provides some management (including register save/restore) but the
ITS buffers must be placed into shared memory for the host to emulate.
There is likely to be future work to harden the GIC driver against a
malicious host (along with any other drivers used within a Realm guest).

[1] https://developer.arm.com/documentation/den0137/1-0eac5/
[2] https://www.arm.com/architecture/security-features/arm-confidential-compute-architecture

Sami Mujawar (2):
  arm64: rsi: Interfaces to query attestation token
  virt: arm-cca-guest: TSM_REPORT support for realms

Steven Price (5):
  arm64: realm: Query IPA size from the RMM
  arm64: Mark all I/O as non-secure shared
  arm64: Make the PHYS_MASK_SHIFT dynamic
  arm64: Enforce bounce buffers for realm DMA
  arm64: realm: Support nonsecure ITS emulation shared

Suzuki K Poulose (7):
  arm64: rsi: Add RSI definitions
  arm64: Detect if in a realm and set RIPAS RAM
  fixmap: Allow architecture overriding set_fixmap_io
  arm64: Override set_fixmap_io
  arm64: Enable memory encrypt for Realms
  arm64: Force device mappings to be non-secure shared
  efi: arm64: Map Device with Prot Shared

 arch/arm64/Kconfig                            |   3 +
 arch/arm64/include/asm/fixmap.h               |   4 +-
 arch/arm64/include/asm/io.h                   |   6 +-
 arch/arm64/include/asm/kvm_arm.h              |   2 +-
 arch/arm64/include/asm/mem_encrypt.h          |  19 ++
 arch/arm64/include/asm/pgtable-hwdef.h        |   4 +-
 arch/arm64/include/asm/pgtable-prot.h         |   3 +
 arch/arm64/include/asm/pgtable.h              |   7 +-
 arch/arm64/include/asm/rsi.h                  |  46 ++++
 arch/arm64/include/asm/rsi_cmds.h             | 143 ++++++++++++
 arch/arm64/include/asm/rsi_smc.h              | 136 ++++++++++++
 arch/arm64/kernel/Makefile                    |   3 +-
 arch/arm64/kernel/efi.c                       |   2 +-
 arch/arm64/kernel/rsi.c                       |  85 +++++++
 arch/arm64/kernel/setup.c                     |   3 +
 arch/arm64/mm/init.c                          |  13 +-
 arch/arm64/mm/mmu.c                           |  13 ++
 arch/arm64/mm/pageattr.c                      |  48 +++-
 drivers/irqchip/irq-gic-v3-its.c              |  95 ++++++--
 drivers/virt/coco/Kconfig                     |   2 +
 drivers/virt/coco/Makefile                    |   1 +
 drivers/virt/coco/arm-cca-guest/Kconfig       |  11 +
 drivers/virt/coco/arm-cca-guest/Makefile      |   2 +
 .../virt/coco/arm-cca-guest/arm-cca-guest.c   | 208 ++++++++++++++++++
 include/asm-generic/fixmap.h                  |   2 +
 25 files changed, 822 insertions(+), 39 deletions(-)
 create mode 100644 arch/arm64/include/asm/mem_encrypt.h
 create mode 100644 arch/arm64/include/asm/rsi.h
 create mode 100644 arch/arm64/include/asm/rsi_cmds.h
 create mode 100644 arch/arm64/include/asm/rsi_smc.h
 create mode 100644 arch/arm64/kernel/rsi.c
 create mode 100644 drivers/virt/coco/arm-cca-guest/Kconfig
 create mode 100644 drivers/virt/coco/arm-cca-guest/Makefile
 create mode 100644 drivers/virt/coco/arm-cca-guest/arm-cca-guest.c

-- 
2.34.1



* [PATCH v2 01/14] arm64: rsi: Add RSI definitions
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 02/14] arm64: Detect if in a realm and set RIPAS RAM Steven Price
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Suzuki K Poulose, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Steven Price

From: Suzuki K Poulose <suzuki.poulose@arm.com>

The RMM (Realm Management Monitor) provides functionality that can be
accessed by a realm guest through SMC calls, known as the RSI (Realm
Services Interface).

The SMC definitions are based on DEN0137[1] version A-eac5.

[1] https://developer.arm.com/documentation/den0137/latest

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/rsi_cmds.h |  47 +++++++++++
 arch/arm64/include/asm/rsi_smc.h  | 136 ++++++++++++++++++++++++++++++
 2 files changed, 183 insertions(+)
 create mode 100644 arch/arm64/include/asm/rsi_cmds.h
 create mode 100644 arch/arm64/include/asm/rsi_smc.h

diff --git a/arch/arm64/include/asm/rsi_cmds.h b/arch/arm64/include/asm/rsi_cmds.h
new file mode 100644
index 000000000000..458fb58c4251
--- /dev/null
+++ b/arch/arm64/include/asm/rsi_cmds.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#ifndef __ASM_RSI_CMDS_H
+#define __ASM_RSI_CMDS_H
+
+#include <linux/arm-smccc.h>
+
+#include <asm/rsi_smc.h>
+
+static inline void invoke_rsi_fn_smc_with_res(unsigned long function_id,
+					      unsigned long arg0,
+					      unsigned long arg1,
+					      unsigned long arg2,
+					      unsigned long arg3,
+					      struct arm_smccc_res *res)
+{
+	arm_smccc_smc(function_id, arg0, arg1, arg2, arg3, 0, 0, 0, res);
+}
+
+static inline unsigned long rsi_get_version(unsigned long req,
+					    unsigned long *out_lower,
+					    unsigned long *out_higher)
+{
+	struct arm_smccc_res res;
+
+	invoke_rsi_fn_smc_with_res(SMC_RSI_ABI_VERSION, req, 0, 0, 0, &res);
+
+	if (out_lower)
+		*out_lower = res.a1;
+	if (out_higher)
+		*out_higher = res.a2;
+
+	return res.a0;
+}
+
+static inline unsigned long rsi_get_realm_config(struct realm_config *cfg)
+{
+	struct arm_smccc_res res;
+
+	invoke_rsi_fn_smc_with_res(SMC_RSI_REALM_CONFIG, virt_to_phys(cfg), 0, 0, 0, &res);
+	return res.a0;
+}
+
+#endif
diff --git a/arch/arm64/include/asm/rsi_smc.h b/arch/arm64/include/asm/rsi_smc.h
new file mode 100644
index 000000000000..c2c9a3dfed48
--- /dev/null
+++ b/arch/arm64/include/asm/rsi_smc.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#ifndef __SMC_RSI_H_
+#define __SMC_RSI_H_
+
+/*
+ * This file describes the Realm Services Interface (RSI) Application Binary
+ * Interface (ABI) for SMC calls made from within the Realm to the RMM and
+ * serviced by the RMM.
+ */
+
+#define SMC_RSI_CALL_BASE		0xC4000000
+
+/*
+ * The major version number of the RSI implementation.  Increase this whenever
+ * the binary format or semantics of the SMC calls change.
+ */
+#define RSI_ABI_VERSION_MAJOR		1
+
+/*
+ * The minor version number of the RSI implementation.  Increase this when
+ * a bug is fixed, or a feature is added without breaking binary compatibility.
+ */
+#define RSI_ABI_VERSION_MINOR		0
+
+#define RSI_ABI_VERSION			((RSI_ABI_VERSION_MAJOR << 16) | \
+					 RSI_ABI_VERSION_MINOR)
+
+#define RSI_ABI_VERSION_GET_MAJOR(_version) ((_version) >> 16)
+#define RSI_ABI_VERSION_GET_MINOR(_version) ((_version) & 0xFFFF)
+
+#define RSI_SUCCESS			0
+#define RSI_ERROR_INPUT			1
+#define RSI_ERROR_STATE			2
+#define RSI_INCOMPLETE			3
+
+#define SMC_RSI_FID(_x)			(SMC_RSI_CALL_BASE + (_x))
+
+#define SMC_RSI_ABI_VERSION			SMC_RSI_FID(0x190)
+
+/*
+ * arg1 == Challenge value, bytes:  0 -  7
+ * arg2 == Challenge value, bytes:  8 - 15
+ * arg3 == Challenge value, bytes: 16 - 23
+ * arg4 == Challenge value, bytes: 24 - 31
+ * arg5 == Challenge value, bytes: 32 - 39
+ * arg6 == Challenge value, bytes: 40 - 47
+ * arg7 == Challenge value, bytes: 48 - 55
+ * arg8 == Challenge value, bytes: 56 - 63
+ * ret0 == Status / error
+ * ret1 == Upper bound of token size in bytes
+ */
+#define SMC_RSI_ATTESTATION_TOKEN_INIT		SMC_RSI_FID(0x194)
+
+/*
+ * arg1 == The IPA of token buffer
+ * arg2 == Offset within the granule of the token buffer
+ * arg3 == Size of the granule buffer
+ * ret0 == Status / error
+ * ret1 == Length of token bytes copied to the granule buffer
+ */
+#define SMC_RSI_ATTESTATION_TOKEN_CONTINUE	SMC_RSI_FID(0x195)
+
+/*
+ * arg1  == Index, which measurements slot to extend
+ * arg2  == Size of realm measurement in bytes, max 64 bytes
+ * arg3  == Measurement value, bytes:  0 -  7
+ * arg4  == Measurement value, bytes:  8 - 15
+ * arg5  == Measurement value, bytes: 16 - 23
+ * arg6  == Measurement value, bytes: 24 - 31
+ * arg7  == Measurement value, bytes: 32 - 39
+ * arg8  == Measurement value, bytes: 40 - 47
+ * arg9  == Measurement value, bytes: 48 - 55
+ * arg10 == Measurement value, bytes: 56 - 63
+ * ret0  == Status / error
+ */
+#define SMC_RSI_MEASUREMENT_EXTEND		SMC_RSI_FID(0x193)
+
+/*
+ * arg1 == Index, which measurements slot to read
+ * ret0 == Status / error
+ * ret1 == Measurement value, bytes:  0 -  7
+ * ret2 == Measurement value, bytes:  8 - 15
+ * ret3 == Measurement value, bytes: 16 - 23
+ * ret4 == Measurement value, bytes: 24 - 31
+ * ret5 == Measurement value, bytes: 32 - 39
+ * ret6 == Measurement value, bytes: 40 - 47
+ * ret7 == Measurement value, bytes: 48 - 55
+ * ret8 == Measurement value, bytes: 56 - 63
+ */
+#define SMC_RSI_MEASUREMENT_READ		SMC_RSI_FID(0x192)
+
+#ifndef __ASSEMBLY__
+
+struct realm_config {
+	unsigned long ipa_bits; /* Width of IPA in bits */
+};
+
+#endif /* __ASSEMBLY__ */
+
+/*
+ * arg1 == struct realm_config addr
+ * ret0 == Status / error
+ */
+#define SMC_RSI_REALM_CONFIG			SMC_RSI_FID(0x196)
+
+/*
+ * arg1 == Base IPA address of target region
+ * arg2 == Top of the region
+ * arg3 == RIPAS value
+ * arg4 == flags
+ * ret0 == Status / error
+ * ret1 == Top of modified IPA range
+ */
+#define SMC_RSI_IPA_STATE_SET			SMC_RSI_FID(0x197)
+
+#define RSI_NO_CHANGE_DESTROYED			0
+#define RSI_CHANGE_DESTROYED			1
+
+/*
+ * arg1 == IPA of target page
+ * ret0 == Status / error
+ * ret1 == RIPAS value
+ */
+#define SMC_RSI_IPA_STATE_GET			SMC_RSI_FID(0x198)
+
+/*
+ * arg1 == IPA of host call structure
+ * ret0 == Status / error
+ */
+#define SMC_RSI_HOST_CALL			SMC_RSI_FID(0x199)
+
+#endif /* __SMC_RSI_H_ */
-- 
2.34.1



* [PATCH v2 02/14] arm64: Detect if in a realm and set RIPAS RAM
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
  2024-04-12  8:42   ` [PATCH v2 01/14] arm64: rsi: Add RSI definitions Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 03/14] arm64: realm: Query IPA size from the RMM Steven Price
                     ` (11 subsequent siblings)
  13 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Suzuki K Poulose, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Steven Price

From: Suzuki K Poulose <suzuki.poulose@arm.com>

Detect that the VM is a realm guest by the presence of the RSI
interface.

If in a realm then all memory needs to be marked as RIPAS RAM initially;
the loader may or may not have done this for us. To be sure, iterate over
all RAM and mark it as such. Any failure is fatal as that implies
RAM regions passed to Linux are incorrect - which would mean failing
later when attempting to access non-existent RAM.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/rsi.h      | 46 ++++++++++++++++++++++++
 arch/arm64/include/asm/rsi_cmds.h | 22 ++++++++++++
 arch/arm64/kernel/Makefile        |  3 +-
 arch/arm64/kernel/rsi.c           | 58 +++++++++++++++++++++++++++++++
 arch/arm64/kernel/setup.c         |  3 ++
 arch/arm64/mm/init.c              |  2 ++
 6 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/rsi.h
 create mode 100644 arch/arm64/kernel/rsi.c

diff --git a/arch/arm64/include/asm/rsi.h b/arch/arm64/include/asm/rsi.h
new file mode 100644
index 000000000000..3b56aac5dc43
--- /dev/null
+++ b/arch/arm64/include/asm/rsi.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#ifndef __ASM_RSI_H_
+#define __ASM_RSI_H_
+
+#include <linux/jump_label.h>
+#include <asm/rsi_cmds.h>
+
+extern struct static_key_false rsi_present;
+
+void arm64_setup_memory(void);
+
+void __init arm64_rsi_init(void);
+static inline bool is_realm_world(void)
+{
+	return static_branch_unlikely(&rsi_present);
+}
+
+static inline void set_memory_range(phys_addr_t start, phys_addr_t end,
+				    enum ripas state)
+{
+	unsigned long ret;
+	phys_addr_t top;
+
+	while (start != end) {
+		ret = rsi_set_addr_range_state(start, end, state, &top);
+		BUG_ON(ret);
+		BUG_ON(top < start);
+		BUG_ON(top > end);
+		start = top;
+	}
+}
+
+static inline void set_memory_range_protected(phys_addr_t start, phys_addr_t end)
+{
+	set_memory_range(start, end, RSI_RIPAS_RAM);
+}
+
+static inline void set_memory_range_shared(phys_addr_t start, phys_addr_t end)
+{
+	set_memory_range(start, end, RSI_RIPAS_EMPTY);
+}
+#endif
diff --git a/arch/arm64/include/asm/rsi_cmds.h b/arch/arm64/include/asm/rsi_cmds.h
index 458fb58c4251..b4cbeafa2f41 100644
--- a/arch/arm64/include/asm/rsi_cmds.h
+++ b/arch/arm64/include/asm/rsi_cmds.h
@@ -10,6 +10,11 @@
 
 #include <asm/rsi_smc.h>
 
+enum ripas {
+	RSI_RIPAS_EMPTY,
+	RSI_RIPAS_RAM,
+};
+
 static inline void invoke_rsi_fn_smc_with_res(unsigned long function_id,
 					      unsigned long arg0,
 					      unsigned long arg1,
@@ -44,4 +49,21 @@ static inline unsigned long rsi_get_realm_config(struct realm_config *cfg)
 	return res.a0;
 }
 
+static inline unsigned long rsi_set_addr_range_state(phys_addr_t start,
+						     phys_addr_t end,
+						     enum ripas state,
+						     phys_addr_t *top)
+{
+	struct arm_smccc_res res;
+
+	invoke_rsi_fn_smc_with_res(SMC_RSI_IPA_STATE_SET,
+				   start, end, state, RSI_NO_CHANGE_DESTROYED,
+				   &res);
+
+	if (top)
+		*top = res.a1;
+
+	return res.a0;
+}
+
 #endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 763824963ed1..a483b916ed11 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -33,7 +33,8 @@ obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
 			   return_address.o cpuinfo.o cpu_errata.o		\
 			   cpufeature.o alternative.o cacheinfo.o		\
 			   smp.o smp_spin_table.o topology.o smccc-call.o	\
-			   syscall.o proton-pack.o idle.o patching.o pi/
+			   syscall.o proton-pack.o idle.o patching.o pi/	\
+			   rsi.o
 
 obj-$(CONFIG_COMPAT)			+= sys32.o signal32.o			\
 					   sys_compat.o
diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
new file mode 100644
index 000000000000..1076649ac082
--- /dev/null
+++ b/arch/arm64/kernel/rsi.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#include <linux/jump_label.h>
+#include <linux/memblock.h>
+#include <asm/rsi.h>
+
+DEFINE_STATIC_KEY_FALSE_RO(rsi_present);
+EXPORT_SYMBOL(rsi_present);
+
+static bool rsi_version_matches(void)
+{
+	unsigned long ver;
+	unsigned long ret = rsi_get_version(RSI_ABI_VERSION, &ver, NULL);
+
+	if (ret == SMCCC_RET_NOT_SUPPORTED)
+		return false;
+
+	if (ver != RSI_ABI_VERSION) {
+		pr_err("RME: RSI version %lu.%lu not supported\n",
+		       RSI_ABI_VERSION_GET_MAJOR(ver),
+		       RSI_ABI_VERSION_GET_MINOR(ver));
+		return false;
+	}
+
+	pr_info("RME: Using RSI version %lu.%lu\n",
+		RSI_ABI_VERSION_GET_MAJOR(ver),
+		RSI_ABI_VERSION_GET_MINOR(ver));
+
+	return true;
+}
+
+void arm64_setup_memory(void)
+{
+	u64 i;
+	phys_addr_t start, end;
+
+	if (!static_branch_unlikely(&rsi_present))
+		return;
+
+	/*
+	 * Iterate over the available memory ranges
+	 * and convert the state to protected memory.
+	 */
+	for_each_mem_range(i, &start, &end) {
+		set_memory_range_protected(start, end);
+	}
+}
+
+void __init arm64_rsi_init(void)
+{
+	if (!rsi_version_matches())
+		return;
+
+	static_branch_enable(&rsi_present);
+}
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 65a052bf741f..a4bd97e74704 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -43,6 +43,7 @@
 #include <asm/cpu_ops.h>
 #include <asm/kasan.h>
 #include <asm/numa.h>
+#include <asm/rsi.h>
 #include <asm/scs.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
@@ -293,6 +294,8 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
 	 * cpufeature code and early parameters.
 	 */
 	jump_label_init();
+	/* Init RSI after jump_labels are active */
+	arm64_rsi_init();
 	parse_early_param();
 
 	dynamic_scs_init();
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 03efd86dce0a..786fd6ce5f17 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -40,6 +40,7 @@
 #include <asm/kvm_host.h>
 #include <asm/memory.h>
 #include <asm/numa.h>
+#include <asm/rsi.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <linux/sizes.h>
@@ -313,6 +314,7 @@ void __init arm64_memblock_init(void)
 	early_init_fdt_scan_reserved_mem();
 
 	high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
+	arm64_setup_memory();
 }
 
 void __init bootmem_init(void)
-- 
2.34.1



* [PATCH v2 03/14] arm64: realm: Query IPA size from the RMM
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
  2024-04-12  8:42   ` [PATCH v2 01/14] arm64: rsi: Add RSI definitions Steven Price
  2024-04-12  8:42   ` [PATCH v2 02/14] arm64: Detect if in a realm and set RIPAS RAM Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 04/14] arm64: Mark all I/O as non-secure shared Steven Price
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

The top bit of the configured IPA size is used as an attribute to
control whether the address is protected or shared. Query the
configuration from the RMM to ascertain which bit this is.

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/pgtable-prot.h | 3 +++
 arch/arm64/kernel/rsi.c               | 8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index dd9ee67d1d87..15d8f0133af8 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -63,6 +63,9 @@
 #include <asm/pgtable-types.h>
 
 extern bool arm64_use_ng_mappings;
+extern unsigned long prot_ns_shared;
+
+#define PROT_NS_SHARED		((prot_ns_shared))
 
 #define PTE_MAYBE_NG		(arm64_use_ng_mappings ? PTE_NG : 0)
 #define PMD_MAYBE_NG		(arm64_use_ng_mappings ? PMD_SECT_NG : 0)
diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
index 1076649ac082..b93252ed6fc5 100644
--- a/arch/arm64/kernel/rsi.c
+++ b/arch/arm64/kernel/rsi.c
@@ -7,6 +7,11 @@
 #include <linux/memblock.h>
 #include <asm/rsi.h>
 
+struct realm_config __attribute((aligned(PAGE_SIZE))) config;
+
+unsigned long prot_ns_shared;
+EXPORT_SYMBOL(prot_ns_shared);
+
 DEFINE_STATIC_KEY_FALSE_RO(rsi_present);
 EXPORT_SYMBOL(rsi_present);
 
@@ -53,6 +58,9 @@ void __init arm64_rsi_init(void)
 {
 	if (!rsi_version_matches())
 		return;
+	if (rsi_get_realm_config(&config))
+		return;
+	prot_ns_shared = BIT(config.ipa_bits - 1);
 
 	static_branch_enable(&rsi_present);
 }
-- 
2.34.1



* [PATCH v2 04/14] arm64: Mark all I/O as non-secure shared
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
                     ` (2 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 03/14] arm64: realm: Query IPA size from the RMM Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 05/14] fixmap: Allow architecture overriding set_fixmap_io Steven Price
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

All I/O is by default considered non-secure for realms. As such, mark
I/O mappings as shared with the host.

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/io.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 8d825522c55c..f283c764ea20 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -145,12 +145,12 @@ extern void __memset_io(volatile void __iomem *, int, size_t);
 
 #define ioremap_prot ioremap_prot
 
-#define _PAGE_IOREMAP PROT_DEVICE_nGnRE
+#define _PAGE_IOREMAP (PROT_DEVICE_nGnRE | PROT_NS_SHARED)
 
 #define ioremap_wc(addr, size)	\
-	ioremap_prot((addr), (size), PROT_NORMAL_NC)
+	ioremap_prot((addr), (size), (PROT_NORMAL_NC | PROT_NS_SHARED))
 #define ioremap_np(addr, size)	\
-	ioremap_prot((addr), (size), PROT_DEVICE_nGnRnE)
+	ioremap_prot((addr), (size), (PROT_DEVICE_nGnRnE | PROT_NS_SHARED))
 
 /*
  * io{read,write}{16,32,64}be() macros
-- 
2.34.1



* [PATCH v2 05/14] fixmap: Allow architecture overriding set_fixmap_io
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
                     ` (3 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 04/14] arm64: Mark all I/O as non-secure shared Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 06/14] arm64: Override set_fixmap_io Steven Price
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Suzuki K Poulose, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Steven Price

From: Suzuki K Poulose <suzuki.poulose@arm.com>

For a realm guest it will be necessary to ensure IO mappings are shared
so that the VMM can emulate the device. The following patch will provide
an implementation of set_fixmap_io for arm64 which sets the shared bit
(if in a realm).

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 include/asm-generic/fixmap.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/asm-generic/fixmap.h b/include/asm-generic/fixmap.h
index 8cc7b09c1bc7..c5ce0368c1ee 100644
--- a/include/asm-generic/fixmap.h
+++ b/include/asm-generic/fixmap.h
@@ -94,8 +94,10 @@ static inline unsigned long virt_to_fix(const unsigned long vaddr)
 /*
  * Some fixmaps are for IO
  */
+#ifndef set_fixmap_io
 #define set_fixmap_io(idx, phys) \
 	__set_fixmap(idx, phys, FIXMAP_PAGE_IO)
+#endif
 
 #define set_fixmap_offset_io(idx, phys) \
 	__set_fixmap_offset(idx, phys, FIXMAP_PAGE_IO)
-- 
2.34.1



* [PATCH v2 06/14] arm64: Override set_fixmap_io
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
                     ` (4 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 05/14] fixmap: Allow architecture overriding set_fixmap_io Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 07/14] arm64: Make the PHYS_MASK_SHIFT dynamic Steven Price
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Suzuki K Poulose, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Steven Price

From: Suzuki K Poulose <suzuki.poulose@arm.com>

Override set_fixmap_io to set the shared permission for the host in
the case of a CC guest. For now we mark it shared unconditionally.
Future changes could filter on the physical address and make the
decision accordingly.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/fixmap.h |  4 +++-
 arch/arm64/mm/mmu.c             | 13 +++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 87e307804b99..f765943b088c 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -107,7 +107,9 @@ void __init early_fixmap_init(void);
 #define __late_set_fixmap __set_fixmap
 #define __late_clear_fixmap(idx) __set_fixmap((idx), 0, FIXMAP_PAGE_CLEAR)
 
-extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);
+#define set_fixmap_io set_fixmap_io
+void set_fixmap_io(enum fixed_addresses idx, phys_addr_t phys);
+void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);
 
 #include <asm-generic/fixmap.h>
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 495b732d5af3..79d84db9ffcb 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1179,6 +1179,19 @@ void vmemmap_free(unsigned long start, unsigned long end,
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
+void set_fixmap_io(enum fixed_addresses idx, phys_addr_t phys)
+{
+	pgprot_t prot = FIXMAP_PAGE_IO;
+
+	/*
+	 * For now we consider all I/O as non-secure. In future we could
+	 * filter on the I/O base address to set appropriate permissions.
+	 */
+	prot = __pgprot(pgprot_val(prot) | PROT_NS_SHARED);
+
+	return __set_fixmap(idx, phys, prot);
+}
+
 int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot)
 {
 	pud_t new_pud = pfn_pud(__phys_to_pfn(phys), mk_pud_sect_prot(prot));
-- 
2.34.1



* [PATCH v2 07/14] arm64: Make the PHYS_MASK_SHIFT dynamic
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
                     ` (5 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 06/14] arm64: Override set_fixmap_io Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 08/14] arm64: Enforce bounce buffers for realm DMA Steven Price
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Make the PHYS_MASK_SHIFT dynamic for Realms. This is only required for
masking the PFN from a pte entry. Elsewhere, we could still use the
PA bits configured by the kernel. So, this patch:

 -> renames PHYS_MASK_SHIFT -> MAX_PHYS_MASK_SHIFT, the maximum
    supported by the kernel
 -> makes PHYS_MASK_SHIFT a dynamic value holding the (I)PA bit width
 -> for a realm: reduces phys_mask_shift if the RMM reports a smaller
    configured size for the guest.

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h       | 2 +-
 arch/arm64/include/asm/pgtable-hwdef.h | 4 ++--
 arch/arm64/include/asm/pgtable.h       | 5 +++++
 arch/arm64/kernel/rsi.c                | 5 +++++
 4 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index e01bb5ca13b7..9944aca348bd 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -398,7 +398,7 @@
  * bits in PAR are res0.
  */
 #define PAR_TO_HPFAR(par)		\
-	(((par) & GENMASK_ULL(52 - 1, 12)) >> 8)
+	(((par) & GENMASK_ULL(MAX_PHYS_MASK_SHIFT - 1, 12)) >> 8)
 
 #define ECN(x) { ESR_ELx_EC_##x, #x }
 
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index ef207a0d4f0d..90dc292bed5f 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -206,8 +206,8 @@
 /*
  * Highest possible physical address supported.
  */
-#define PHYS_MASK_SHIFT		(CONFIG_ARM64_PA_BITS)
-#define PHYS_MASK		((UL(1) << PHYS_MASK_SHIFT) - 1)
+#define MAX_PHYS_MASK_SHIFT	(CONFIG_ARM64_PA_BITS)
+#define MAX_PHYS_MASK		((UL(1) << MAX_PHYS_MASK_SHIFT) - 1)
 
 #define TTBR_CNP_BIT		(UL(1) << 0)
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..f5376bd567a1 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -39,6 +39,11 @@
 #include <linux/sched.h>
 #include <linux/page_table_check.h>
 
+extern unsigned int phys_mask_shift;
+
+#define PHYS_MASK_SHIFT		(phys_mask_shift)
+#define PHYS_MASK		((1UL << PHYS_MASK_SHIFT) - 1)
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
 
diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
index b93252ed6fc5..159bc428c77b 100644
--- a/arch/arm64/kernel/rsi.c
+++ b/arch/arm64/kernel/rsi.c
@@ -12,6 +12,8 @@ struct realm_config __attribute((aligned(PAGE_SIZE))) config;
 unsigned long prot_ns_shared;
 EXPORT_SYMBOL(prot_ns_shared);
 
+unsigned int phys_mask_shift = CONFIG_ARM64_PA_BITS;
+
 DEFINE_STATIC_KEY_FALSE_RO(rsi_present);
 EXPORT_SYMBOL(rsi_present);
 
@@ -62,5 +64,8 @@ void __init arm64_rsi_init(void)
 		return;
 	prot_ns_shared = BIT(config.ipa_bits - 1);
 
+	if (config.ipa_bits - 1 < phys_mask_shift)
+		phys_mask_shift = config.ipa_bits - 1;
+
 	static_branch_enable(&rsi_present);
 }
-- 
2.34.1



* [PATCH v2 08/14] arm64: Enforce bounce buffers for realm DMA
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
                     ` (6 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 07/14] arm64: Make the PHYS_MASK_SHIFT dynamic Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 09/14] arm64: Enable memory encrypt for Realms Steven Price
                     ` (5 subsequent siblings)
  13 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Within a realm guest it's not possible for a device emulated by the VMM
to access arbitrary guest memory. So force the use of bounce buffers to
ensure that the memory the emulated devices access is memory which has
been explicitly shared with the host.

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kernel/rsi.c |  2 ++
 arch/arm64/mm/init.c    | 11 +++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
index 159bc428c77b..5c8ed3aaa35f 100644
--- a/arch/arm64/kernel/rsi.c
+++ b/arch/arm64/kernel/rsi.c
@@ -5,6 +5,8 @@
 
 #include <linux/jump_label.h>
 #include <linux/memblock.h>
+#include <linux/swiotlb.h>
+
 #include <asm/rsi.h>
 
 struct realm_config __attribute((aligned(PAGE_SIZE))) config;
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 786fd6ce5f17..01a2e3ce6921 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -370,7 +370,9 @@ void __init bootmem_init(void)
  */
 void __init mem_init(void)
 {
-	bool swiotlb = max_pfn > PFN_DOWN(arm64_dma_phys_limit);
+	bool swiotlb = (max_pfn > PFN_DOWN(arm64_dma_phys_limit));
+
+	swiotlb |= is_realm_world();
 
 	if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) && !swiotlb) {
 		/*
@@ -383,7 +385,12 @@ void __init mem_init(void)
 		swiotlb = true;
 	}
 
-	swiotlb_init(swiotlb, SWIOTLB_VERBOSE);
+	if (is_realm_world()) {
+		swiotlb_init(swiotlb, SWIOTLB_VERBOSE | SWIOTLB_FORCE);
+		swiotlb_update_mem_attributes();
+	} else {
+		swiotlb_init(swiotlb, SWIOTLB_VERBOSE);
+	}
 
 	/* this will put all unused low memory onto the freelists */
 	memblock_free_all();
-- 
2.34.1



* [PATCH v2 09/14] arm64: Enable memory encrypt for Realms
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
                     ` (7 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 08/14] arm64: Enforce bounce buffers for realm DMA Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-15  3:13     ` kernel test robot
  2024-04-12  8:42   ` [PATCH v2 10/14] arm64: Force device mappings to be non-secure shared Steven Price
                     ` (4 subsequent siblings)
  13 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Suzuki K Poulose, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Steven Price

From: Suzuki K Poulose <suzuki.poulose@arm.com>

Use the memory encryption APIs to trigger an RSI call to request a
transition between protected memory and shared memory (or vice versa),
and update the kernel's linear map of the modified pages to flip the
top bit of the IPA. This requires that block mappings are not used in
the direct map for realm guests.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/Kconfig                   |  3 ++
 arch/arm64/include/asm/mem_encrypt.h | 19 +++++++++++
 arch/arm64/kernel/rsi.c              | 12 +++++++
 arch/arm64/mm/pageattr.c             | 48 ++++++++++++++++++++++++++--
 4 files changed, 79 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm64/include/asm/mem_encrypt.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..ffd4685a3029 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -20,6 +20,7 @@ config ARM64
 	select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
 	select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
 	select ARCH_HAS_CACHE_LINE_SIZE
+	select ARCH_HAS_CC_PLATFORM
 	select ARCH_HAS_CURRENT_STACK_POINTER
 	select ARCH_HAS_DEBUG_VIRTUAL
 	select ARCH_HAS_DEBUG_VM_PGTABLE
@@ -40,6 +41,8 @@ config ARM64
 	select ARCH_HAS_SETUP_DMA_OPS
 	select ARCH_HAS_SET_DIRECT_MAP
 	select ARCH_HAS_SET_MEMORY
+	select ARCH_HAS_MEM_ENCRYPT
+	select ARCH_HAS_FORCE_DMA_UNENCRYPTED
 	select ARCH_STACKWALK
 	select ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_STRICT_MODULE_RWX
diff --git a/arch/arm64/include/asm/mem_encrypt.h b/arch/arm64/include/asm/mem_encrypt.h
new file mode 100644
index 000000000000..7381f9585321
--- /dev/null
+++ b/arch/arm64/include/asm/mem_encrypt.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#ifndef __ASM_MEM_ENCRYPT_H
+#define __ASM_MEM_ENCRYPT_H
+
+#include <asm/rsi.h>
+
+/* All DMA must be to non-secure memory for now */
+static inline bool force_dma_unencrypted(struct device *dev)
+{
+	return is_realm_world();
+}
+
+int set_memory_encrypted(unsigned long addr, int numpages);
+int set_memory_decrypted(unsigned long addr, int numpages);
+#endif
diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
index 5c8ed3aaa35f..ba3f346e7a91 100644
--- a/arch/arm64/kernel/rsi.c
+++ b/arch/arm64/kernel/rsi.c
@@ -6,6 +6,7 @@
 #include <linux/jump_label.h>
 #include <linux/memblock.h>
 #include <linux/swiotlb.h>
+#include <linux/cc_platform.h>
 
 #include <asm/rsi.h>
 
@@ -19,6 +20,17 @@ unsigned int phys_mask_shift = CONFIG_ARM64_PA_BITS;
 DEFINE_STATIC_KEY_FALSE_RO(rsi_present);
 EXPORT_SYMBOL(rsi_present);
 
+bool cc_platform_has(enum cc_attr attr)
+{
+	switch (attr) {
+	case CC_ATTR_MEM_ENCRYPT:
+		return is_realm_world();
+	default:
+		return false;
+	}
+}
+EXPORT_SYMBOL_GPL(cc_platform_has);
+
 static bool rsi_version_matches(void)
 {
 	unsigned long ver;
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 0c4e3ecf989d..229b6d9990f5 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -5,10 +5,12 @@
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/module.h>
+#include <linux/mem_encrypt.h>
 #include <linux/sched.h>
 #include <linux/vmalloc.h>
 
 #include <asm/cacheflush.h>
+#include <asm/pgtable-prot.h>
 #include <asm/set_memory.h>
 #include <asm/tlbflush.h>
 #include <asm/kfence.h>
@@ -23,14 +25,16 @@ bool rodata_full __ro_after_init = IS_ENABLED(CONFIG_RODATA_FULL_DEFAULT_ENABLED
 bool can_set_direct_map(void)
 {
 	/*
-	 * rodata_full and DEBUG_PAGEALLOC require linear map to be
-	 * mapped at page granularity, so that it is possible to
+	 * rodata_full, DEBUG_PAGEALLOC and a Realm guest all require linear
+	 * map to be mapped at page granularity, so that it is possible to
 	 * protect/unprotect single pages.
 	 *
 	 * KFENCE pool requires page-granular mapping if initialized late.
+	 *
+	 * Realms need to make pages shared/protected at page granularity.
 	 */
 	return rodata_full || debug_pagealloc_enabled() ||
-	       arm64_kfence_can_set_direct_map();
+		arm64_kfence_can_set_direct_map() || is_realm_world();
 }
 
 static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
@@ -41,6 +45,7 @@ static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
 	pte = clear_pte_bit(pte, cdata->clear_mask);
 	pte = set_pte_bit(pte, cdata->set_mask);
 
+	/* TODO: Break before make for PROT_NS_SHARED updates */
 	__set_pte(ptep, pte);
 	return 0;
 }
@@ -192,6 +197,43 @@ int set_direct_map_default_noflush(struct page *page)
 				   PAGE_SIZE, change_page_range, &data);
 }
 
+static int __set_memory_encrypted(unsigned long addr,
+				  int numpages,
+				  bool encrypt)
+{
+	unsigned long set_prot = 0, clear_prot = 0;
+	phys_addr_t start, end;
+
+	if (!is_realm_world())
+		return 0;
+
+	WARN_ON(!__is_lm_address(addr));
+	start = __virt_to_phys(addr);
+	end = start + numpages * PAGE_SIZE;
+
+	if (encrypt) {
+		clear_prot = PROT_NS_SHARED;
+		set_memory_range_protected(start, end);
+	} else {
+		set_prot = PROT_NS_SHARED;
+		set_memory_range_shared(start, end);
+	}
+
+	return __change_memory_common(addr, PAGE_SIZE * numpages,
+				      __pgprot(set_prot),
+				      __pgprot(clear_prot));
+}
+
+int set_memory_encrypted(unsigned long addr, int numpages)
+{
+	return __set_memory_encrypted(addr, numpages, true);
+}
+
+int set_memory_decrypted(unsigned long addr, int numpages)
+{
+	return __set_memory_encrypted(addr, numpages, false);
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
-- 
2.34.1



* [PATCH v2 10/14] arm64: Force device mappings to be non-secure shared
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
                     ` (8 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 09/14] arm64: Enable memory encrypt for Realms Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 11/14] efi: arm64: Map Device with Prot Shared Steven Price
                     ` (3 subsequent siblings)
  13 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Suzuki K Poulose, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Steven Price

From: Suzuki K Poulose <suzuki.poulose@arm.com>

Device mappings (currently) need to be emulated by the VMM so must be
mapped shared with the host.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index f5376bd567a1..db71c564ec21 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -598,7 +598,7 @@ static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
 #define pgprot_writecombine(prot) \
 	__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN)
 #define pgprot_device(prot) \
-	__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_nGnRE) | PTE_PXN | PTE_UXN)
+	__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_nGnRE) | PTE_PXN | PTE_UXN | PROT_NS_SHARED)
 #define pgprot_tagged(prot) \
 	__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_NORMAL_TAGGED))
 #define pgprot_mhp	pgprot_tagged
-- 
2.34.1



* [PATCH v2 11/14] efi: arm64: Map Device with Prot Shared
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
                     ` (9 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 10/14] arm64: Force device mappings to be non-secure shared Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 12/14] arm64: realm: Support nonsecure ITS emulation shared Steven Price
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Suzuki K Poulose, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Steven Price

From: Suzuki K Poulose <suzuki.poulose@arm.com>

Device mappings need to be emulated by the VMM so must be mapped shared
with the host.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kernel/efi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 9afcc690fe73..bb8b39f16092 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -33,7 +33,7 @@ static __init pteval_t create_mapping_protection(efi_memory_desc_t *md)
 	u32 type = md->type;
 
 	if (type == EFI_MEMORY_MAPPED_IO)
-		return PROT_DEVICE_nGnRE;
+		return PROT_NS_SHARED | PROT_DEVICE_nGnRE;
 
 	if (region_is_misaligned(md)) {
 		static bool __initdata code_is_misaligned;
-- 
2.34.1



* [PATCH v2 12/14] arm64: realm: Support nonsecure ITS emulation shared
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
                     ` (10 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 11/14] efi: arm64: Map Device with Prot Shared Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 13/14] arm64: rsi: Interfaces to query attestation token Steven Price
  2024-04-12  8:42   ` [PATCH v2 14/14] virt: arm-cca-guest: TSM_REPORT support for realms Steven Price
  13 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Within a realm guest the ITS is emulated by the host. This means the
allocations must have been made available to the host by a call to
set_memory_decrypted(). Introduce an allocation function which performs
this extra call.

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 95 ++++++++++++++++++++++++--------
 1 file changed, 71 insertions(+), 24 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index fca888b36680..94e29d6c82e6 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -18,6 +18,7 @@
 #include <linux/irqdomain.h>
 #include <linux/list.h>
 #include <linux/log2.h>
+#include <linux/mem_encrypt.h>
 #include <linux/memblock.h>
 #include <linux/mm.h>
 #include <linux/msi.h>
@@ -27,6 +28,7 @@
 #include <linux/of_pci.h>
 #include <linux/of_platform.h>
 #include <linux/percpu.h>
+#include <linux/set_memory.h>
 #include <linux/slab.h>
 #include <linux/syscore_ops.h>
 
@@ -163,6 +165,7 @@ struct its_device {
 	struct its_node		*its;
 	struct event_lpi_map	event_map;
 	void			*itt;
+	u32			itt_order;
 	u32			nr_ites;
 	u32			device_id;
 	bool			shared;
@@ -198,6 +201,33 @@ static DEFINE_IDA(its_vpeid_ida);
 #define gic_data_rdist_rd_base()	(gic_data_rdist()->rd_base)
 #define gic_data_rdist_vlpi_base()	(gic_data_rdist_rd_base() + SZ_128K)
 
+static struct page *its_alloc_shared_pages_node(int node, gfp_t gfp,
+						unsigned int order)
+{
+	struct page *page;
+
+	if (node == NUMA_NO_NODE)
+		page = alloc_pages(gfp, order);
+	else
+		page = alloc_pages_node(node, gfp, order);
+
+	if (page)
+		set_memory_decrypted((unsigned long)page_address(page),
+				     1 << order);
+	return page;
+}
+
+static struct page *its_alloc_shared_pages(gfp_t gfp, unsigned int order)
+{
+	return its_alloc_shared_pages_node(NUMA_NO_NODE, gfp, order);
+}
+
+static void its_free_shared_pages(void *addr, unsigned int order)
+{
+	set_memory_encrypted((unsigned long)addr, 1 << order);
+	free_pages((unsigned long)addr, order);
+}
+
 /*
  * Skip ITSs that have no vLPIs mapped, unless we're on GICv4.1, as we
  * always have vSGIs mapped.
@@ -2206,7 +2236,8 @@ static struct page *its_allocate_prop_table(gfp_t gfp_flags)
 {
 	struct page *prop_page;
 
-	prop_page = alloc_pages(gfp_flags, get_order(LPI_PROPBASE_SZ));
+	prop_page = its_alloc_shared_pages(gfp_flags,
+					   get_order(LPI_PROPBASE_SZ));
 	if (!prop_page)
 		return NULL;
 
@@ -2217,8 +2248,8 @@ static struct page *its_allocate_prop_table(gfp_t gfp_flags)
 
 static void its_free_prop_table(struct page *prop_page)
 {
-	free_pages((unsigned long)page_address(prop_page),
-		   get_order(LPI_PROPBASE_SZ));
+	its_free_shared_pages(page_address(prop_page),
+			      get_order(LPI_PROPBASE_SZ));
 }
 
 static bool gic_check_reserved_range(phys_addr_t addr, unsigned long size)
@@ -2340,10 +2371,10 @@ static int its_setup_baser(struct its_node *its, struct its_baser *baser,
 		order = get_order(GITS_BASER_PAGES_MAX * psz);
 	}
 
-	page = alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO, order);
+	page = its_alloc_shared_pages_node(its->numa_node,
+					   GFP_KERNEL | __GFP_ZERO, order);
 	if (!page)
 		return -ENOMEM;
-
 	base = (void *)page_address(page);
 	baser_phys = virt_to_phys(base);
 
@@ -2353,7 +2384,7 @@ static int its_setup_baser(struct its_node *its, struct its_baser *baser,
 		/* 52bit PA is supported only when PageSize=64K */
 		if (psz != SZ_64K) {
 			pr_err("ITS: no 52bit PA support when psz=%d\n", psz);
-			free_pages((unsigned long)base, order);
+			its_free_shared_pages(base, order);
 			return -ENXIO;
 		}
 
@@ -2409,7 +2440,7 @@ static int its_setup_baser(struct its_node *its, struct its_baser *baser,
 		pr_err("ITS@%pa: %s doesn't stick: %llx %llx\n",
 		       &its->phys_base, its_base_type_string[type],
 		       val, tmp);
-		free_pages((unsigned long)base, order);
+		its_free_shared_pages(base, order);
 		return -ENXIO;
 	}
 
@@ -2548,8 +2579,8 @@ static void its_free_tables(struct its_node *its)
 
 	for (i = 0; i < GITS_BASER_NR_REGS; i++) {
 		if (its->tables[i].base) {
-			free_pages((unsigned long)its->tables[i].base,
-				   its->tables[i].order);
+			its_free_shared_pages(its->tables[i].base,
+					      its->tables[i].order);
 			its->tables[i].base = NULL;
 		}
 	}
@@ -2815,7 +2846,8 @@ static bool allocate_vpe_l2_table(int cpu, u32 id)
 
 	/* Allocate memory for 2nd level table */
 	if (!table[idx]) {
-		page = alloc_pages(GFP_KERNEL | __GFP_ZERO, get_order(psz));
+		page = its_alloc_shared_pages(GFP_KERNEL | __GFP_ZERO,
+					      get_order(psz));
 		if (!page)
 			return false;
 
@@ -2934,7 +2966,8 @@ static int allocate_vpe_l1_table(void)
 
 	pr_debug("np = %d, npg = %lld, psz = %d, epp = %d, esz = %d\n",
 		 np, npg, psz, epp, esz);
-	page = alloc_pages(GFP_ATOMIC | __GFP_ZERO, get_order(np * PAGE_SIZE));
+	page = its_alloc_shared_pages(GFP_ATOMIC | __GFP_ZERO,
+				      get_order(np * PAGE_SIZE));
 	if (!page)
 		return -ENOMEM;
 
@@ -2980,8 +3013,8 @@ static struct page *its_allocate_pending_table(gfp_t gfp_flags)
 {
 	struct page *pend_page;
 
-	pend_page = alloc_pages(gfp_flags | __GFP_ZERO,
-				get_order(LPI_PENDBASE_SZ));
+	pend_page = its_alloc_shared_pages(gfp_flags | __GFP_ZERO,
+					   get_order(LPI_PENDBASE_SZ));
 	if (!pend_page)
 		return NULL;
 
@@ -2993,7 +3026,8 @@ static struct page *its_allocate_pending_table(gfp_t gfp_flags)
 
 static void its_free_pending_table(struct page *pt)
 {
-	free_pages((unsigned long)page_address(pt), get_order(LPI_PENDBASE_SZ));
+	its_free_shared_pages(page_address(pt),
+			      get_order(LPI_PENDBASE_SZ));
 }
 
 /*
@@ -3328,8 +3362,9 @@ static bool its_alloc_table_entry(struct its_node *its,
 
 	/* Allocate memory for 2nd level table */
 	if (!table[idx]) {
-		page = alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO,
-					get_order(baser->psz));
+		page = its_alloc_shared_pages_node(its->numa_node,
+						   GFP_KERNEL | __GFP_ZERO,
+						   get_order(baser->psz));
 		if (!page)
 			return false;
 
@@ -3412,7 +3447,9 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
 	unsigned long *lpi_map = NULL;
 	unsigned long flags;
 	u16 *col_map = NULL;
+	struct page *page;
 	void *itt;
+	int itt_order;
 	int lpi_base;
 	int nr_lpis;
 	int nr_ites;
@@ -3424,7 +3461,6 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
 	if (WARN_ON(!is_power_of_2(nvecs)))
 		nvecs = roundup_pow_of_two(nvecs);
 
-	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
 	/*
 	 * Even if the device wants a single LPI, the ITT must be
 	 * sized as a power of two (and you need at least one bit...).
@@ -3432,7 +3468,16 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
 	nr_ites = max(2, nvecs);
 	sz = nr_ites * (FIELD_GET(GITS_TYPER_ITT_ENTRY_SIZE, its->typer) + 1);
 	sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1;
-	itt = kzalloc_node(sz, GFP_KERNEL, its->numa_node);
+	itt_order = get_order(sz);
+	page = its_alloc_shared_pages_node(its->numa_node,
+					   GFP_KERNEL | __GFP_ZERO,
+					   itt_order);
+	if (!page)
+		return NULL;
+	itt = (void *)page_address(page);
+
+	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+
 	if (alloc_lpis) {
 		lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
 		if (lpi_map)
@@ -3444,9 +3489,9 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
 		lpi_base = 0;
 	}
 
-	if (!dev || !itt ||  !col_map || (!lpi_map && alloc_lpis)) {
+	if (!dev || !col_map || (!lpi_map && alloc_lpis)) {
 		kfree(dev);
-		kfree(itt);
+		its_free_shared_pages(itt, itt_order);
 		bitmap_free(lpi_map);
 		kfree(col_map);
 		return NULL;
@@ -3456,6 +3501,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
 
 	dev->its = its;
 	dev->itt = itt;
+	dev->itt_order = itt_order;
 	dev->nr_ites = nr_ites;
 	dev->event_map.lpi_map = lpi_map;
 	dev->event_map.col_map = col_map;
@@ -3483,7 +3529,7 @@ static void its_free_device(struct its_device *its_dev)
 	list_del(&its_dev->entry);
 	raw_spin_unlock_irqrestore(&its_dev->its->lock, flags);
 	kfree(its_dev->event_map.col_map);
-	kfree(its_dev->itt);
+	its_free_shared_pages(its_dev->itt, its_dev->itt_order);
 	kfree(its_dev);
 }
 
@@ -5127,8 +5173,9 @@ static int __init its_probe_one(struct its_node *its)
 		}
 	}
 
-	page = alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO,
-				get_order(ITS_CMD_QUEUE_SZ));
+	page = its_alloc_shared_pages_node(its->numa_node,
+					   GFP_KERNEL | __GFP_ZERO,
+					   get_order(ITS_CMD_QUEUE_SZ));
 	if (!page) {
 		err = -ENOMEM;
 		goto out_unmap_sgir;
@@ -5192,7 +5239,7 @@ static int __init its_probe_one(struct its_node *its)
 out_free_tables:
 	its_free_tables(its);
 out_free_cmd:
-	free_pages((unsigned long)its->cmd_base, get_order(ITS_CMD_QUEUE_SZ));
+	its_free_shared_pages(its->cmd_base, get_order(ITS_CMD_QUEUE_SZ));
 out_unmap_sgir:
 	if (its->sgir_base)
 		iounmap(its->sgir_base);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 13/14] arm64: rsi: Interfaces to query attestation token
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
                     ` (11 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 12/14] arm64: realm: Support nonsecure ITS emulation shared Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 14/14] virt: arm-cca-guest: TSM_REPORT support for realms Steven Price
  13 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Sami Mujawar, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Steven Price

From: Sami Mujawar <sami.mujawar@arm.com>

Add interfaces to query the attestation token using
the RSI calls.

Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/rsi_cmds.h | 74 +++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/arch/arm64/include/asm/rsi_cmds.h b/arch/arm64/include/asm/rsi_cmds.h
index b4cbeafa2f41..c1850aefe54e 100644
--- a/arch/arm64/include/asm/rsi_cmds.h
+++ b/arch/arm64/include/asm/rsi_cmds.h
@@ -10,6 +10,9 @@
 
 #include <asm/rsi_smc.h>
 
+#define GRANULE_SHIFT		12
+#define GRANULE_SIZE		(_AC(1, UL) << GRANULE_SHIFT)
+
 enum ripas {
 	RSI_RIPAS_EMPTY,
 	RSI_RIPAS_RAM,
@@ -66,4 +69,75 @@ static inline unsigned long rsi_set_addr_range_state(phys_addr_t start,
 	return res.a0;
 }
 
+/**
+ * rsi_attestation_token_init - Initialise the operation to retrieve an
+ * attestation token.
+ *
+ * @challenge:	The challenge data to be used in the attestation token
+ *		generation.
+ * @size:	Size of the challenge data in bytes.
+ *
+ * Initialises the attestation token generation and returns an upper bound
+ * on the attestation token size that can be used to allocate an adequate
+ * buffer. The caller is expected to subsequently call
+ * rsi_attestation_token_continue() to retrieve the attestation token data on
+ * the same CPU.
+ *
+ * Returns:
+ *  On success, returns the upper limit of the attestation report size.
+ *  Otherwise, -EINVAL
+ */
+static inline unsigned long
+rsi_attestation_token_init(const u8 *challenge, unsigned long size)
+{
+	struct arm_smccc_1_2_regs regs = { 0 };
+
+	/* The challenge must be at least 32 bytes and at most 64 bytes */
+	if (!challenge || size < 32 || size > 64)
+		return -EINVAL;
+
+	regs.a0 = SMC_RSI_ATTESTATION_TOKEN_INIT;
+	memcpy(&regs.a1, challenge, size);
+	arm_smccc_1_2_smc(&regs, &regs);
+
+	if (regs.a0 == RSI_SUCCESS)
+		return regs.a1;
+
+	return -EINVAL;
+}
+
+/**
+ * rsi_attestation_token_continue - Continue the operation to retrieve an
+ * attestation token.
+ *
+ * @granule: {I}PA of the Granule to which the token will be written.
+ * @offset:  Offset within Granule to start of buffer in bytes.
+ * @size:    The size of the buffer.
+ * @len:     The number of bytes written to the buffer.
+ *
+ * Retrieves up to a GRANULE_SIZE worth of token data per call. The caller is
+ * expected to call rsi_attestation_token_init() before calling this function
+ * to retrieve the attestation token.
+ *
+ * Return:
+ * * %RSI_SUCCESS     - Attestation token retrieved successfully.
+ * * %RSI_INCOMPLETE  - Token generation is not complete.
+ * * %RSI_ERROR_INPUT - A parameter was not valid.
+ * * %RSI_ERROR_STATE - Attestation not in progress.
+ */
+static inline int rsi_attestation_token_continue(phys_addr_t granule,
+						 unsigned long offset,
+						 unsigned long size,
+						 unsigned long *len)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RSI_ATTESTATION_TOKEN_CONTINUE,
+			     granule, offset, size, 0, &res);
+
+	if (len)
+		*len = res.a1;
+	return res.a0;
+}
+
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 14/14] virt: arm-cca-guest: TSM_REPORT support for realms
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
                     ` (12 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 13/14] arm64: rsi: Interfaces to query attestation token Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-24 13:06     ` Thomas Fossati
  2024-04-24 13:19     ` Suzuki K Poulose
  13 siblings, 2 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Sami Mujawar, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Steven Price

From: Sami Mujawar <sami.mujawar@arm.com>

Introduce an arm-cca-guest driver that registers with
the configfs-tsm module to provide user interfaces for
retrieving an attestation token.

When a new report is requested the arm-cca-guest driver
invokes the appropriate RSI interfaces to query an
attestation token.

The steps to retrieve an attestation token are as follows:
  1. Mount the configfs filesystem if not already mounted
     mount -t configfs none /sys/kernel/config
  2. Generate an attestation token
     report=/sys/kernel/config/tsm/report/report0
     mkdir $report
     dd if=/dev/urandom bs=64 count=1 > $report/inblob
     hexdump -C $report/outblob
     rmdir $report

Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 drivers/virt/coco/Kconfig                     |   2 +
 drivers/virt/coco/Makefile                    |   1 +
 drivers/virt/coco/arm-cca-guest/Kconfig       |  11 +
 drivers/virt/coco/arm-cca-guest/Makefile      |   2 +
 .../virt/coco/arm-cca-guest/arm-cca-guest.c   | 208 ++++++++++++++++++
 5 files changed, 224 insertions(+)
 create mode 100644 drivers/virt/coco/arm-cca-guest/Kconfig
 create mode 100644 drivers/virt/coco/arm-cca-guest/Makefile
 create mode 100644 drivers/virt/coco/arm-cca-guest/arm-cca-guest.c

diff --git a/drivers/virt/coco/Kconfig b/drivers/virt/coco/Kconfig
index 87d142c1f932..4fb69804b622 100644
--- a/drivers/virt/coco/Kconfig
+++ b/drivers/virt/coco/Kconfig
@@ -12,3 +12,5 @@ source "drivers/virt/coco/efi_secret/Kconfig"
 source "drivers/virt/coco/sev-guest/Kconfig"
 
 source "drivers/virt/coco/tdx-guest/Kconfig"
+
+source "drivers/virt/coco/arm-cca-guest/Kconfig"
diff --git a/drivers/virt/coco/Makefile b/drivers/virt/coco/Makefile
index 18c1aba5edb7..a6228a1bf992 100644
--- a/drivers/virt/coco/Makefile
+++ b/drivers/virt/coco/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_TSM_REPORTS)	+= tsm.o
 obj-$(CONFIG_EFI_SECRET)	+= efi_secret/
 obj-$(CONFIG_SEV_GUEST)		+= sev-guest/
 obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx-guest/
+obj-$(CONFIG_ARM_CCA_GUEST)	+= arm-cca-guest/
diff --git a/drivers/virt/coco/arm-cca-guest/Kconfig b/drivers/virt/coco/arm-cca-guest/Kconfig
new file mode 100644
index 000000000000..c4039c10dce2
--- /dev/null
+++ b/drivers/virt/coco/arm-cca-guest/Kconfig
@@ -0,0 +1,11 @@
+config ARM_CCA_GUEST
+	tristate "Arm CCA Guest driver"
+	depends on ARM64
+	default m
+	select TSM_REPORTS
+	help
+	  The driver provides a userspace interface to request an
+	  attestation report from the Realm Management Monitor (RMM).
+
+	  If you choose 'M' here, this module will be called
+	  arm-cca-guest.
diff --git a/drivers/virt/coco/arm-cca-guest/Makefile b/drivers/virt/coco/arm-cca-guest/Makefile
new file mode 100644
index 000000000000..69eeba08e98a
--- /dev/null
+++ b/drivers/virt/coco/arm-cca-guest/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_ARM_CCA_GUEST) += arm-cca-guest.o
diff --git a/drivers/virt/coco/arm-cca-guest/arm-cca-guest.c b/drivers/virt/coco/arm-cca-guest/arm-cca-guest.c
new file mode 100644
index 000000000000..2c15ff162da0
--- /dev/null
+++ b/drivers/virt/coco/arm-cca-guest/arm-cca-guest.c
@@ -0,0 +1,208 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#include <linux/arm-smccc.h>
+#include <linux/cc_platform.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/smp.h>
+#include <linux/tsm.h>
+#include <linux/types.h>
+
+#include <asm/rsi.h>
+
+/**
+ * struct arm_cca_token_info - a descriptor for the token buffer.
+ * @granule:	PA of the page to which the token will be written
+ * @offset:	Offset within granule to start of buffer in bytes
+ * @result:	result of rsi_attestation_token_continue operation
+ */
+struct arm_cca_token_info {
+	phys_addr_t     granule;
+	unsigned long   offset;
+	int             result;
+};
+
+/**
+ * arm_cca_attestation_continue - Retrieve the attestation token data.
+ *
+ * @param: pointer to the arm_cca_token_info
+ *
+ * Attestation token generation is a long running operation and therefore
+ * the token data may not be retrieved in a single call. Moreover, the
+ * token retrieval operation must be requested on the same CPU on which the
+ * attestation token generation was initialised.
+ * This helper function is therefore scheduled on the same CPU multiple
+ * times until the entire token data is retrieved.
+ */
+static void arm_cca_attestation_continue(void *param)
+{
+	unsigned long len;
+	unsigned long size;
+	struct arm_cca_token_info *info;
+
+	if (!param)
+		return;
+
+	info = (struct arm_cca_token_info *)param;
+
+	size = GRANULE_SIZE - info->offset;
+	info->result = rsi_attestation_token_continue(info->granule,
+						      info->offset, size, &len);
+	info->offset += len;
+}
+
+/**
+ * arm_cca_report_new - Generate a new attestation token.
+ *
+ * @report: pointer to the TSM report context information.
+ * @data:  pointer to the context specific data for this module.
+ *
+ * Initialise the attestation token generation using the challenge data
+ * passed in the TSM descriptor. Allocate memory for the attestation token
+ * and schedule calls to retrieve the attestation token on the same CPU
+ * on which the attestation token generation was initialised.
+ *
+ * Return:
+ * * %0        - Attestation token generated successfully.
+ * * %-EINVAL  - A parameter was not valid.
+ * * %-ENOMEM  - Out of memory.
+ * * %-EFAULT  - Failed to get IPA for memory page(s).
+ * * A negative status code as returned by smp_call_function_single().
+ */
+static int arm_cca_report_new(struct tsm_report *report, void *data)
+{
+	int ret;
+	int cpu;
+	long max_size;
+	unsigned long token_size;
+	struct arm_cca_token_info info;
+	void *buf;
+	u8 *token __free(kvfree) = NULL;
+	struct tsm_desc *desc = &report->desc;
+
+	if (!report)
+		return -EINVAL;
+
+	if (desc->inblob_len < 32 || desc->inblob_len > 64)
+		return -EINVAL;
+
+	/*
+	 * Get a CPU on which the attestation token generation will be
+	 * scheduled and initialise the attestation token generation.
+	 */
+	cpu = get_cpu();
+	max_size = rsi_attestation_token_init(desc->inblob, desc->inblob_len);
+	put_cpu();
+
+	if (max_size <= 0)
+		return -EINVAL;
+
+	/* Allocate outblob */
+	token = kvzalloc(max_size, GFP_KERNEL);
+	if (!token)
+		return -ENOMEM;
+
+	/*
+	 * Since the outblob may not be physically contiguous, use a page
+	 * to bounce the buffer from RMM.
+	 */
+	buf = alloc_pages_exact(GRANULE_SIZE, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	/* Get the PA of the memory page(s) that were allocated. */
+	info.granule = (unsigned long)virt_to_phys(buf);
+
+	token_size = 0;
+	/* Loop until the token is ready or there is an error. */
+	do {
+		/* Retrieve one GRANULE_SIZE data per loop iteration. */
+		info.offset = 0;
+		do {
+			/*
+			 * Schedule a call to retrieve a sub-granule chunk
+			 * of data per loop iteration.
+			 */
+			ret = smp_call_function_single(cpu,
+						       arm_cca_attestation_continue,
+						       (void *)&info, true);
+			if (ret != 0) {
+				token_size = 0;
+				goto exit_free_granule_page;
+			}
+
+			ret = info.result;
+		} while ((ret == RSI_INCOMPLETE) &&
+			 (info.offset < GRANULE_SIZE));
+
+		/*
+		 * Copy the retrieved token data from the granule
+		 * to the token buffer, ensuring that the RMM doesn't
+		 * overflow the buffer.
+		 */
+		if (WARN_ON(token_size + info.offset > max_size))
+			break;
+		memcpy(&token[token_size], buf, info.offset);
+		token_size += info.offset;
+	} while (ret == RSI_INCOMPLETE);
+
+	if (ret != RSI_SUCCESS) {
+		ret = -ENXIO;
+		token_size = 0;
+		goto exit_free_granule_page;
+	}
+
+	report->outblob = no_free_ptr(token);
+exit_free_granule_page:
+	report->outblob_len = token_size;
+	free_pages_exact(buf, GRANULE_SIZE);
+	return ret;
+}
+
+static const struct tsm_ops arm_cca_tsm_ops = {
+	.name = KBUILD_MODNAME,
+	.report_new = arm_cca_report_new,
+};
+
+/**
+ * arm_cca_guest_init - Register with the Trusted Security Module (TSM)
+ * interface.
+ *
+ * Return:
+ * * %0        - Registered successfully with the TSM interface.
+ * * %-ENODEV  - The execution context is not an Arm Realm.
+ * * %-EINVAL  - A parameter was not valid.
+ * * %-EBUSY   - Already registered.
+ */
+static int __init arm_cca_guest_init(void)
+{
+	int ret;
+
+	if (!is_realm_world())
+		return -ENODEV;
+
+	ret = tsm_register(&arm_cca_tsm_ops, NULL, &tsm_report_default_type);
+	if (ret < 0)
+		pr_err("Failed to register with TSM.\n");
+
+	return ret;
+}
+module_init(arm_cca_guest_init);
+
+/**
+ * arm_cca_guest_exit - unregister with the Trusted Security Module (TSM)
+ * interface.
+ */
+static void __exit arm_cca_guest_exit(void)
+{
+	tsm_unregister(&arm_cca_tsm_ops);
+}
+module_exit(arm_cca_guest_exit);
+
+MODULE_AUTHOR("Sami Mujawar <sami.mujawar@arm.com>");
+MODULE_DESCRIPTION("Arm CCA Guest TSM Driver.");
+MODULE_LICENSE("GPL");
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 00/43] arm64: Support for Arm CCA in KVM
  2024-04-12  8:40 [v2] Support for Arm CCA VMs on Linux Steven Price
  2024-04-11 18:54 ` Itaru Kitayama
  2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
@ 2024-04-12  8:42 ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 01/43] KVM: Prepare for handling only shared mappings in mmu_notifier events Steven Price
                     ` (42 more replies)
  2024-04-12 16:52 ` [v2] Support for Arm CCA VMs on Linux Jean-Philippe Brucker
  3 siblings, 43 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

This series adds support for running protected VMs using KVM under the
Arm Confidential Compute Architecture (CCA). The purpose of this
series is to gather feedback on the proposed changes to the architecture
code for CCA.

The main change from the previous RFC is that it updates the code to use
a guest_memfd descriptor to back the private memory of the guest. This
avoids any issues where a malicious VMM could potentially cause a fatal
Granule Protection Fault elsewhere in the kernel.

The ABI to the RMM (the RMI) is based on the final RMM v1.0 (EAC 5)
specification[1].

This series is based on v6.9-rc1. It is also available as a git
repository:

https://gitlab.arm.com/linux-arm/linux-cca cca-host/v2

Work in progress changes for kvmtool are available from the git
repository below, these changes are based on Fuad Tabba's repository for
pKVM to provide some alignment with the ongoing pKVM work:

https://gitlab.arm.com/linux-arm/kvmtool-cca cca/v2

Introduction
============
A more general introduction to Arm CCA is available on the Arm
website[2], and links to the other components involved are available in
the overall cover letter.

Arm Confidential Compute Architecture adds two new 'worlds' to the
architecture: Root and Realm. A new software component known as the RMM
(Realm Management Monitor) runs in Realm EL2 and is trusted by both the
Normal World and VMs running within Realms. This enables mutual
distrust between the Realm VMs and the Normal World.

Virtual machines running within a Realm can decide on a (4k)
page-by-page granularity whether to share a page with the (Normal World)
host or to keep it private (protected). This protection is provided by
the hardware and attempts to access a page which isn't shared by the
Normal World will trigger a Granule Protection Fault. The series starts
by adding handling for these; faults within user space can be handled by
killing the process, while faults within kernel space are considered fatal.
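
For a feel of what this looks like from inside a realm guest, here is a
minimal sketch (not code from this series) which shares a single page with
the host using the generic set_memory helpers and then reclaims it:

  #include <linux/gfp.h>
  #include <linux/set_memory.h>

  /* Sketch: expose one page to the (Normal World) host, then take it back. */
  static int share_page_with_host(void)
  {
          unsigned long buf = __get_free_page(GFP_KERNEL);

          if (!buf)
                  return -ENOMEM;
          set_memory_decrypted(buf, 1);   /* protected -> shared */
          /* ... hand virt_to_phys((void *)buf) to the host ... */
          set_memory_encrypted(buf, 1);   /* shared -> protected */
          free_page(buf);
          return 0;
  }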

The Normal World host can communicate with the RMM via an SMC interface
known as RMI (Realm Management Interface), and Realm VMs can communicate
with the RMM via another SMC interface known as RSI (Realm Services
Interface). This series adds wrappers for the full set of RMI commands
and uses them to manage the realm guests.
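
All of these commands use the SMCCC 1.1 calling convention; as a minimal
sketch (mirroring the wrappers added later in the series), probing the RMM
ABI version boils down to:

  struct arm_smccc_res res;

  arm_smccc_1_1_invoke(SMC_RMI_VERSION,
                       RMI_ABI_VERSION(RMI_ABI_MAJOR_VERSION,
                                       RMI_ABI_MINOR_VERSION), &res);
  if (res.a0 != RMI_SUCCESS)
          /* no RMM present, or it doesn't support the requested ABI */
          return -ENXIO;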

The Normal World can use RMI commands to delegate pages to the Realm
world and to create, manage and run Realm VMs. Once delegated the pages
are inaccessible to the Normal World (unless explicitly shared by the
guest). However the Normal World may destroy the Realm VM at any time to
be able to reclaim (undelegate) the pages.

Realm VMs are identified by the KVM_CREATE_VM command, where the 'type'
argument has a new field to describe whether the guest is 'normal' or a
'realm'.
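
From userspace this looks roughly like the sketch below; note that
KVM_VM_TYPE_ARM_REALM is an illustrative name here, the real encoding of
the type field is defined by the uapi changes in this series:

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int create_realm_vm(void)
  {
          int kvm = open("/dev/kvm", O_RDWR);

          if (kvm < 0)
                  return -1;
          /* 40-bit IPA space, machine type 'realm' (illustrative name) */
          return ioctl(kvm, KVM_CREATE_VM,
                       KVM_VM_TYPE_ARM_IPA_SIZE(40) | KVM_VM_TYPE_ARM_REALM);
  }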

Entry/exit of a Realm VM attempts to reuse the KVM infrastructure, but
ultimately the final mechanism is different. So this series has a bunch
of commits handling the differences. As much as possible is placed in
two new files: rme.c and rme-exit.c.

KVM also handles some of the PSCI requests for a realm and helps the RMM
complete the PSCI service requests.

Interrupts are managed by KVM, and are injected into the Realm with the
help of the RMM.

The RMM specification provides a new mechanism for a guest to
communicate with host which goes by the name "Host Call". This is simply
hooked up to the existing support for HVC calls from a normal
guest.

[1] https://developer.arm.com/documentation/den0137/1-0eac5/
[2] https://www.arm.com/architecture/security-features/arm-confidential-compute-architecture

Jean-Philippe Brucker (7):
  arm64: RME: Propagate number of breakpoints and watchpoints to
    userspace
  arm64: RME: Set breakpoint parameters through SET_ONE_REG
  arm64: RME: Initialize PMCR.N with number counter supported by RMM
  arm64: RME: Propagate max SVE vector length from RMM
  arm64: RME: Configure max SVE vector length for a Realm
  arm64: RME: Provide register list for unfinalized RME RECs
  arm64: RME: Provide accurate register list

Joey Gouly (2):
  arm64: rme: allow userspace to inject aborts
  arm64: rme: support RSI_HOST_CALL

Sean Christopherson (1):
  KVM: Prepare for handling only shared mappings in mmu_notifier events

Steven Price (29):
  arm64: RME: Handle Granule Protection Faults (GPFs)
  arm64: RME: Add SMC definitions for calling the RMM
  arm64: RME: Add wrappers for RMI calls
  arm64: RME: Check for RME support at KVM init
  arm64: RME: Define the user ABI
  arm64: RME: ioctls to create and configure realms
  arm64: kvm: Allow passing machine type in KVM creation
  arm64: RME: Keep a spare page delegated to the RMM
  arm64: RME: RTT handling
  arm64: RME: Allocate/free RECs to match vCPUs
  arm64: RME: Support for the VGIC in realms
  KVM: arm64: Support timers in realm RECs
  arm64: RME: Allow VMM to set RIPAS
  arm64: RME: Handle realm enter/exit
  KVM: arm64: Handle realm MMIO emulation
  arm64: RME: Allow populating initial contents
  arm64: RME: Runtime faulting of memory
  KVM: arm64: Handle realm VCPU load
  KVM: arm64: Validate register access for a Realm VM
  KVM: arm64: Handle Realm PSCI requests
  KVM: arm64: WARN on injected undef exceptions
  arm64: Don't expose stolen time for realm guests
  arm64: RME: Always use 4k pages for realms
  arm64: rme: Prevent Device mappings for Realms
  arm_pmu: Provide a mechanism for disabling the physical IRQ
  arm64: rme: Enable PMU support with a realm guest
  kvm: rme: Hide KVM_CAP_READONLY_MEM for realm guests
  arm64: kvm: Expose support for private memory
  KVM: arm64: Allow activating realms

Suzuki K Poulose (4):
  kvm: arm64: pgtable: Track the number of pages in the entry level
  kvm: arm64: Include kvm_emulate.h in kvm/arm_psci.h
  kvm: arm64: Expose debug HW register numbers for Realm
  arm64: rme: Allow checking SVE on VM instance

 Documentation/virt/kvm/api.rst       |    3 +
 arch/arm64/include/asm/kvm_emulate.h |   35 +
 arch/arm64/include/asm/kvm_host.h    |   13 +-
 arch/arm64/include/asm/kvm_pgtable.h |    2 +
 arch/arm64/include/asm/kvm_rme.h     |  154 +++
 arch/arm64/include/asm/rmi_cmds.h    |  509 +++++++++
 arch/arm64/include/asm/rmi_smc.h     |  250 ++++
 arch/arm64/include/asm/virt.h        |    1 +
 arch/arm64/include/uapi/asm/kvm.h    |   49 +
 arch/arm64/kvm/Kconfig               |    1 +
 arch/arm64/kvm/Makefile              |    3 +-
 arch/arm64/kvm/arch_timer.c          |   45 +-
 arch/arm64/kvm/arm.c                 |  178 ++-
 arch/arm64/kvm/guest.c               |   99 +-
 arch/arm64/kvm/hyp/pgtable.c         |    5 +-
 arch/arm64/kvm/hypercalls.c          |    4 +-
 arch/arm64/kvm/inject_fault.c        |    2 +
 arch/arm64/kvm/mmio.c                |   10 +-
 arch/arm64/kvm/mmu.c                 |  172 ++-
 arch/arm64/kvm/pmu-emul.c            |    7 +-
 arch/arm64/kvm/psci.c                |   29 +
 arch/arm64/kvm/reset.c               |   23 +-
 arch/arm64/kvm/rme-exit.c            |  211 ++++
 arch/arm64/kvm/rme.c                 | 1590 ++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c            |   83 +-
 arch/arm64/kvm/vgic/vgic-v3.c        |    9 +-
 arch/arm64/kvm/vgic/vgic.c           |   37 +-
 arch/arm64/mm/fault.c                |   29 +-
 drivers/perf/arm_pmu.c               |   15 +
 include/kvm/arm_arch_timer.h         |    2 +
 include/kvm/arm_psci.h               |    2 +
 include/linux/kvm_host.h             |    2 +
 include/linux/perf/arm_pmu.h         |    1 +
 include/uapi/linux/kvm.h             |   30 +-
 virt/kvm/kvm_main.c                  |    7 +
 35 files changed, 3514 insertions(+), 98 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_rme.h
 create mode 100644 arch/arm64/include/asm/rmi_cmds.h
 create mode 100644 arch/arm64/include/asm/rmi_smc.h
 create mode 100644 arch/arm64/kvm/rme-exit.c
 create mode 100644 arch/arm64/kvm/rme.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v2 01/43] KVM: Prepare for handling only shared mappings in mmu_notifier events
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-25  9:48     ` Fuad Tabba
  2024-04-12  8:42   ` [PATCH v2 02/43] kvm: arm64: pgtable: Track the number of pages in the entry level Steven Price
                     ` (41 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Sean Christopherson, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Steven Price

From: Sean Christopherson <seanjc@google.com>

Add flags to "struct kvm_gfn_range" to let notifier events target only
shared and only private mappings, and write up the existing mmu_notifier
events to be shared-only (private memory is never associated with a
userspace virtual address, i.e. can't be reached via mmu_notifiers).

Add two flags so that KVM can handle the three possibilities (shared,
private, and shared+private) without needing something like a tri-state
enum.
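
An eventual arch-side consumer might use the flags along these lines (an
illustrative sketch, not part of this patch; unmap_shared() and
unmap_private() are hypothetical helpers):

  static bool example_unmap(struct kvm *kvm, struct kvm_gfn_range *range)
  {
          bool flush = false;

          /* mmu_notifiers never target guest-private memory */
          if (!range->only_private)
                  flush |= unmap_shared(kvm, range->start, range->end);
          if (!range->only_shared)
                  flush |= unmap_private(kvm, range->start, range->end);
          return flush;
  }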

Link: https://lore.kernel.org/all/ZJX0hk+KpQP0KUyB@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 include/linux/kvm_host.h | 2 ++
 virt/kvm/kvm_main.c      | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 48f31dcd318a..c7581360fd88 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -268,6 +268,8 @@ struct kvm_gfn_range {
 	gfn_t start;
 	gfn_t end;
 	union kvm_mmu_notifier_arg arg;
+	bool only_private;
+	bool only_shared;
 	bool may_block;
 };
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fb49c2a60200..3486ceef6f4e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -633,6 +633,13 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 			 * the second or later invocation of the handler).
 			 */
 			gfn_range.arg = range->arg;
+
+			/*
+			 * HVA-based notifications aren't relevant to private
+			 * mappings as they don't have a userspace mapping.
+			 */
+			gfn_range.only_private = false;
+			gfn_range.only_shared = true;
 			gfn_range.may_block = range->may_block;
 
 			/*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 02/43] kvm: arm64: pgtable: Track the number of pages in the entry level
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
  2024-04-12  8:42   ` [PATCH v2 01/43] KVM: Prepare for handling only shared mappings in mmu_notifier events Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 03/43] kvm: arm64: Include kvm_emulate.h in kvm/arm_psci.h Steven Price
                     ` (40 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Suzuki K Poulose, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Steven Price

From: Suzuki K Poulose <suzuki.poulose@arm.com>

Keep track of the number of pages allocated for the top level PGD,
rather than computing it every time (though we need it only twice now).
This will be used later by Arm CCA KVM changes.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 2 ++
 arch/arm64/kvm/hyp/pgtable.c         | 5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 19278dfe7978..0350c08ada7a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -362,6 +362,7 @@ static inline bool kvm_pgtable_walk_lock_held(void)
  * struct kvm_pgtable - KVM page-table.
  * @ia_bits:		Maximum input address size, in bits.
  * @start_level:	Level at which the page-table walk starts.
+ * @pgd_pages:		Number of pages in the entry level of the page-table.
  * @pgd:		Pointer to the first top-level entry of the page-table.
  * @mm_ops:		Memory management callbacks.
  * @mmu:		Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
@@ -372,6 +373,7 @@ static inline bool kvm_pgtable_walk_lock_held(void)
 struct kvm_pgtable {
 	u32					ia_bits;
 	s8					start_level;
+	u8					pgd_pages;
 	kvm_pteref_t				pgd;
 	struct kvm_pgtable_mm_ops		*mm_ops;
 
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 3fae5830f8d2..9decff9736ac 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1552,7 +1552,8 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
 	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
 
-	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
+	pgt->pgd_pages = kvm_pgd_pages(ia_bits, start_level);
+	pgd_sz = pgt->pgd_pages * PAGE_SIZE;
 	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_pages_exact(pgd_sz);
 	if (!pgt->pgd)
 		return -ENOMEM;
@@ -1604,7 +1605,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
-	pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
+	pgd_sz = pgt->pgd_pages * PAGE_SIZE;
 	pgt->mm_ops->free_pages_exact(kvm_dereference_pteref(&walker, pgt->pgd), pgd_sz);
 	pgt->pgd = NULL;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 03/43] kvm: arm64: Include kvm_emulate.h in kvm/arm_psci.h
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
  2024-04-12  8:42   ` [PATCH v2 01/43] KVM: Prepare for handling only shared mappings in mmu_notifier events Steven Price
  2024-04-12  8:42   ` [PATCH v2 02/43] kvm: arm64: pgtable: Track the number of pages in the entry level Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 04/43] arm64: RME: Handle Granule Protection Faults (GPFs) Steven Price
                     ` (39 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Suzuki K Poulose, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Steven Price

From: Suzuki K Poulose <suzuki.poulose@arm.com>

Fix a potential build error (like below, when asm/kvm_emulate.h gets
included after kvm/arm_psci.h) by including the missing header file in
kvm/arm_psci.h:

./include/kvm/arm_psci.h: In function ‘kvm_psci_version’:
./include/kvm/arm_psci.h:29:13: error: implicit declaration of function
   ‘vcpu_has_feature’; did you mean ‘cpu_have_feature’? [-Werror=implicit-function-declaration]
   29 |         if (vcpu_has_feature(vcpu, KVM_ARM_VCPU_PSCI_0_2)) {
      |             ^~~~~~~~~~~~~~~~
      |             cpu_have_feature

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 include/kvm/arm_psci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index e8fb624013d1..1801c6fd3f10 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -10,6 +10,8 @@
 #include <linux/kvm_host.h>
 #include <uapi/linux/psci.h>
 
+#include <asm/kvm_emulate.h>
+
 #define KVM_ARM_PSCI_0_1	PSCI_VERSION(0, 1)
 #define KVM_ARM_PSCI_0_2	PSCI_VERSION(0, 2)
 #define KVM_ARM_PSCI_1_0	PSCI_VERSION(1, 0)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 04/43] arm64: RME: Handle Granule Protection Faults (GPFs)
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (2 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 03/43] kvm: arm64: Include kvm_emulate.h in kvm/arm_psci.h Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-16 11:17     ` Suzuki K Poulose
  2024-04-12  8:42   ` [PATCH v2 05/43] arm64: RME: Add SMC definitions for calling the RMM Steven Price
                     ` (38 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

If the host attempts to access granules that have been delegated for use
in a realm, these accesses will be caught and will trigger a Granule
Protection Fault (GPF).

A fault during a page walk signals a bug in the kernel and is handled by
oopsing the kernel. A non-page walk fault could be caused by user space
having access to a page which has been delegated to the kernel and will
trigger a SIGBUS to allow debugging why user space is trying to access a
delegated page.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/mm/fault.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8251e2fea9c7..91da0f446dd9 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -765,6 +765,25 @@ static int do_tag_check_fault(unsigned long far, unsigned long esr,
 	return 0;
 }
 
+static int do_gpf_ptw(unsigned long far, unsigned long esr, struct pt_regs *regs)
+{
+	const struct fault_info *inf = esr_to_fault_info(esr);
+
+	die_kernel_fault(inf->name, far, esr, regs);
+	return 0;
+}
+
+static int do_gpf(unsigned long far, unsigned long esr, struct pt_regs *regs)
+{
+	const struct fault_info *inf = esr_to_fault_info(esr);
+
+	if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
+		return 0;
+
+	arm64_notify_die(inf->name, regs, inf->sig, inf->code, far, esr);
+	return 0;
+}
+
 static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"ttbr address size fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"level 1 address size fault"	},
@@ -802,11 +821,11 @@ static const struct fault_info fault_info[] = {
 	{ do_alignment_fault,	SIGBUS,  BUS_ADRALN,	"alignment fault"		},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 34"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 35"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 36"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 37"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 38"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 39"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 40"			},
+	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level 0" },
+	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level 1" },
+	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level 2" },
+	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level 3" },
+	{ do_gpf,		SIGBUS,  SI_KERNEL,	"Granule Protection Fault not on table walk" },
 	{ do_bad,		SIGKILL, SI_KERNEL,	"level -1 address size fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 42"			},
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level -1 translation fault"	},
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 05/43] arm64: RME: Add SMC definitions for calling the RMM
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (3 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 04/43] arm64: RME: Handle Granule Protection Faults (GPFs) Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-16 12:38     ` Suzuki K Poulose
  2024-04-12  8:42   ` [PATCH v2 06/43] arm64: RME: Add wrappers for RMI calls Steven Price
                     ` (37 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

The RMM (Realm Management Monitor) provides functionality that can be
accessed by SMC calls from the host.

The SMC definitions are based on DEN0137[1] version 1.0-eac5

[1] https://developer.arm.com/documentation/den0137/1-0eac5/
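
For reference, SMC_RxI_CALL() below composes a standard-service SMC64
fast-call ID, so for example RMI_VERSION (function 0x0150) expands as:

  /*
   *   (1u << 31)      fast call
   * | (1u << 30)      SMC64
   * | (4u << 24)      owner: standard secure service
   * | 0x0150          function number
   * = 0xC4000150
   */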

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/rmi_smc.h | 250 +++++++++++++++++++++++++++++++
 1 file changed, 250 insertions(+)
 create mode 100644 arch/arm64/include/asm/rmi_smc.h

diff --git a/arch/arm64/include/asm/rmi_smc.h b/arch/arm64/include/asm/rmi_smc.h
new file mode 100644
index 000000000000..c205efdb18d8
--- /dev/null
+++ b/arch/arm64/include/asm/rmi_smc.h
@@ -0,0 +1,250 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ *
+ * The values and structures in this file are from the Realm Management Monitor
+ * specification (DEN0137) version 1.0-eac5:
+ * https://developer.arm.com/documentation/den0137/1-0eac5/
+ */
+
+#ifndef __ASM_RME_SMC_H
+#define __ASM_RME_SMC_H
+
+#include <linux/arm-smccc.h>
+
+#define SMC_RxI_CALL(func)				\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,		\
+			   ARM_SMCCC_SMC_64,		\
+			   ARM_SMCCC_OWNER_STANDARD,	\
+			   (func))
+
+#define SMC_RMI_DATA_CREATE		SMC_RxI_CALL(0x0153)
+#define SMC_RMI_DATA_CREATE_UNKNOWN	SMC_RxI_CALL(0x0154)
+#define SMC_RMI_DATA_DESTROY		SMC_RxI_CALL(0x0155)
+#define SMC_RMI_FEATURES		SMC_RxI_CALL(0x0165)
+#define SMC_RMI_GRANULE_DELEGATE	SMC_RxI_CALL(0x0151)
+#define SMC_RMI_GRANULE_UNDELEGATE	SMC_RxI_CALL(0x0152)
+#define SMC_RMI_PSCI_COMPLETE		SMC_RxI_CALL(0x0164)
+#define SMC_RMI_REALM_ACTIVATE		SMC_RxI_CALL(0x0157)
+#define SMC_RMI_REALM_CREATE		SMC_RxI_CALL(0x0158)
+#define SMC_RMI_REALM_DESTROY		SMC_RxI_CALL(0x0159)
+#define SMC_RMI_REC_AUX_COUNT		SMC_RxI_CALL(0x0167)
+#define SMC_RMI_REC_CREATE		SMC_RxI_CALL(0x015a)
+#define SMC_RMI_REC_DESTROY		SMC_RxI_CALL(0x015b)
+#define SMC_RMI_REC_ENTER		SMC_RxI_CALL(0x015c)
+#define SMC_RMI_RTT_CREATE		SMC_RxI_CALL(0x015d)
+#define SMC_RMI_RTT_DESTROY		SMC_RxI_CALL(0x015e)
+#define SMC_RMI_RTT_FOLD		SMC_RxI_CALL(0x0166)
+#define SMC_RMI_RTT_INIT_RIPAS		SMC_RxI_CALL(0x0168)
+#define SMC_RMI_RTT_MAP_UNPROTECTED	SMC_RxI_CALL(0x015f)
+#define SMC_RMI_RTT_READ_ENTRY		SMC_RxI_CALL(0x0161)
+#define SMC_RMI_RTT_SET_RIPAS		SMC_RxI_CALL(0x0169)
+#define SMC_RMI_RTT_UNMAP_UNPROTECTED	SMC_RxI_CALL(0x0162)
+#define SMC_RMI_VERSION			SMC_RxI_CALL(0x0150)
+
+#define RMI_ABI_MAJOR_VERSION	1
+#define RMI_ABI_MINOR_VERSION	0
+
+#define RMI_UNASSIGNED			0
+#define RMI_ASSIGNED			1
+#define RMI_TABLE			2
+
+#define RMI_ABI_VERSION_GET_MAJOR(version) ((version) >> 16)
+#define RMI_ABI_VERSION_GET_MINOR(version) ((version) & 0xFFFF)
+#define RMI_ABI_VERSION(major, minor)      (((major) << 16) | (minor))
+
+#define RMI_RETURN_STATUS(ret)		((ret) & 0xFF)
+#define RMI_RETURN_INDEX(ret)		(((ret) >> 8) & 0xFF)
+
+#define RMI_SUCCESS		0
+#define RMI_ERROR_INPUT		1
+#define RMI_ERROR_REALM		2
+#define RMI_ERROR_REC		3
+#define RMI_ERROR_RTT		4
+
+#define RMI_EMPTY		0
+#define RMI_RAM			1
+#define RMI_DESTROYED		2
+
+#define RMI_NO_MEASURE_CONTENT	0
+#define RMI_MEASURE_CONTENT	1
+
+#define RMI_FEATURE_REGISTER_0_S2SZ		GENMASK(7, 0)
+#define RMI_FEATURE_REGISTER_0_LPA2		BIT(8)
+#define RMI_FEATURE_REGISTER_0_SVE_EN		BIT(9)
+#define RMI_FEATURE_REGISTER_0_SVE_VL		GENMASK(13, 10)
+#define RMI_FEATURE_REGISTER_0_NUM_BPS		GENMASK(17, 14)
+#define RMI_FEATURE_REGISTER_0_NUM_WPS		GENMASK(21, 18)
+#define RMI_FEATURE_REGISTER_0_PMU_EN		BIT(22)
+#define RMI_FEATURE_REGISTER_0_PMU_NUM_CTRS	GENMASK(27, 23)
+#define RMI_FEATURE_REGISTER_0_HASH_SHA_256	BIT(28)
+#define RMI_FEATURE_REGISTER_0_HASH_SHA_512	BIT(29)
+
+#define RMI_REALM_PARAM_FLAG_LPA2		BIT(0)
+#define RMI_REALM_PARAM_FLAG_SVE		BIT(1)
+#define RMI_REALM_PARAM_FLAG_PMU		BIT(2)
+
+/*
+ * Note many of these fields are smaller than u64 but all fields have u64
+ * alignment, so use u64 to ensure correct alignment.
+ */
+struct realm_params {
+	union { /* 0x0 */
+		struct {
+			u64 flags;
+			u64 s2sz;
+			u64 sve_vl;
+			u64 num_bps;
+			u64 num_wps;
+			u64 pmu_num_ctrs;
+			u64 hash_algo;
+		};
+		u8 padding_1[0x400];
+	};
+	union { /* 0x400 */
+		u8 rpv[64];
+		u8 padding_2[0x400];
+	};
+	union { /* 0x800 */
+		struct {
+			u64 vmid;
+			u64 rtt_base;
+			s64 rtt_level_start;
+			u64 rtt_num_start;
+		};
+		u8 padding_3[0x800];
+	};
+};
+
+/*
+ * The number of GPRs (starting from X0) that are
+ * configured by the host when a REC is created.
+ */
+#define REC_CREATE_NR_GPRS		8
+
+#define REC_PARAMS_FLAG_RUNNABLE	BIT_ULL(0)
+
+#define REC_PARAMS_AUX_GRANULES		16
+
+struct rec_params {
+	union { /* 0x0 */
+		u64 flags;
+		u8 padding1[0x100];
+	};
+	union { /* 0x100 */
+		u64 mpidr;
+		u8 padding2[0x100];
+	};
+	union { /* 0x200 */
+		u64 pc;
+		u8 padding3[0x100];
+	};
+	union { /* 0x300 */
+		u64 gprs[REC_CREATE_NR_GPRS];
+		u8 padding4[0x500];
+	};
+	union { /* 0x800 */
+		struct {
+			u64 num_rec_aux;
+			u64 aux[REC_PARAMS_AUX_GRANULES];
+		};
+		u8 padding5[0x800];
+	};
+};
+
+#define RMI_EMULATED_MMIO		BIT(0)
+#define RMI_INJECT_SEA			BIT(1)
+#define RMI_TRAP_WFI			BIT(2)
+#define RMI_TRAP_WFE			BIT(3)
+
+#define REC_RUN_GPRS			31
+#define REC_GIC_NUM_LRS			16
+
+struct rec_entry {
+	union { /* 0x000 */
+		u64 flags;
+		u8 padding0[0x200];
+	};
+	union { /* 0x200 */
+		u64 gprs[REC_RUN_GPRS];
+		u8 padding2[0x100];
+	};
+	union { /* 0x300 */
+		struct {
+			u64 gicv3_hcr;
+			u64 gicv3_lrs[REC_GIC_NUM_LRS];
+		};
+		u8 padding3[0x100];
+	};
+	u8 padding4[0x400];
+};
+
+struct rec_exit {
+	union { /* 0x000 */
+		u8 exit_reason;
+		u8 padding0[0x100];
+	};
+	union { /* 0x100 */
+		struct {
+			u64 esr;
+			u64 far;
+			u64 hpfar;
+		};
+		u8 padding1[0x100];
+	};
+	union { /* 0x200 */
+		u64 gprs[REC_RUN_GPRS];
+		u8 padding2[0x100];
+	};
+	union { /* 0x300 */
+		struct {
+			u64 gicv3_hcr;
+			u64 gicv3_lrs[REC_GIC_NUM_LRS];
+			u64 gicv3_misr;
+			u64 gicv3_vmcr;
+		};
+		u8 padding3[0x100];
+	};
+	union { /* 0x400 */
+		struct {
+			u64 cntp_ctl;
+			u64 cntp_cval;
+			u64 cntv_ctl;
+			u64 cntv_cval;
+		};
+		u8 padding4[0x100];
+	};
+	union { /* 0x500 */
+		struct {
+			u64 ripas_base;
+			u64 ripas_top;
+			u64 ripas_value;
+		};
+		u8 padding5[0x100];
+	};
+	union { /* 0x600 */
+		u16 imm;
+		u8 padding6[0x100];
+	};
+	union { /* 0x700 */
+		struct {
+			u64 pmu_ovf_status;
+		};
+		u8 padding7[0x100];
+	};
+};
+
+struct rec_run {
+	struct rec_entry entry;
+	struct rec_exit exit;
+};
+
+#define RMI_EXIT_SYNC			0x00
+#define RMI_EXIT_IRQ			0x01
+#define RMI_EXIT_FIQ			0x02
+#define RMI_EXIT_PSCI			0x03
+#define RMI_EXIT_RIPAS_CHANGE		0x04
+#define RMI_EXIT_HOST_CALL		0x05
+#define RMI_EXIT_SERROR			0x06
+
+#endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 06/43] arm64: RME: Add wrappers for RMI calls
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (4 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 05/43] arm64: RME: Add SMC definitions for calling the RMM Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-16 13:14     ` Suzuki K Poulose
  2024-04-12  8:42   ` [PATCH v2 07/43] arm64: RME: Check for RME support at KVM init Steven Price
                     ` (36 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

The wrappers make the call sites easier to read and deal with the
boilerplate of handling the error codes from the RMM.
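
As a rough illustration (not part of this patch), a call site built on
these wrappers can decode a failure with the RMI_RETURN_* helpers from the
previous patch:

  int ret = rmi_granule_delegate(page_to_phys(page));

  if (ret) {
          pr_err("delegation failed: status %d index %d\n",
                 (int)RMI_RETURN_STATUS(ret), (int)RMI_RETURN_INDEX(ret));
          return -ENXIO;
  }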

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/rmi_cmds.h | 509 ++++++++++++++++++++++++++++++
 1 file changed, 509 insertions(+)
 create mode 100644 arch/arm64/include/asm/rmi_cmds.h

diff --git a/arch/arm64/include/asm/rmi_cmds.h b/arch/arm64/include/asm/rmi_cmds.h
new file mode 100644
index 000000000000..c21414127e8e
--- /dev/null
+++ b/arch/arm64/include/asm/rmi_cmds.h
@@ -0,0 +1,509 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#ifndef __ASM_RMI_CMDS_H
+#define __ASM_RMI_CMDS_H
+
+#include <linux/arm-smccc.h>
+
+#include <asm/rmi_smc.h>
+
+struct rtt_entry {
+	unsigned long walk_level;
+	unsigned long desc;
+	int state;
+	int ripas;
+};
+
+/**
+ * rmi_data_create() - Create a Data Granule
+ * @rd: PA of the RD
+ * @data: PA of the target granule
+ * @ipa: IPA at which the granule will be mapped in the guest
+ * @src: PA of the source granule
+ * @flags: RMI_MEASURE_CONTENT if the contents should be measured
+ *
+ * Create a new Data Granule, copying contents from a Non-secure Granule.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_data_create(unsigned long rd, unsigned long data,
+				  unsigned long ipa, unsigned long src,
+				  unsigned long flags)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_DATA_CREATE, rd, data, ipa, src,
+			     flags, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_data_create_unknown() - Create a Data Granule with unknown contents
+ * @rd: PA of the RD
+ * @data: PA of the target granule
+ * @ipa: IPA at which the granule will be mapped in the guest
+ *
+ * Create a new Data Granule with unknown contents
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_data_create_unknown(unsigned long rd,
+					  unsigned long data,
+					  unsigned long ipa)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_DATA_CREATE_UNKNOWN, rd, data, ipa, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_data_destroy() - Destroy a Data Granule
+ * @rd: PA of the RD
+ * @ipa: IPA at which the granule is mapped in the guest
+ * @data_out: PA of the granule which was destroyed
+ * @top_out: Top IPA of non-live RTT entries
+ *
+ * Transitions the granule to the DESTROYED state; the address cannot be used by
+ * the guest for the lifetime of the Realm.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_data_destroy(unsigned long rd, unsigned long ipa,
+				   unsigned long *data_out,
+				   unsigned long *top_out)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_DATA_DESTROY, rd, ipa, &res);
+
+	*data_out = res.a1;
+	*top_out = res.a2;
+
+	return res.a0;
+}
+
+/**
+ * rmi_features() - Read feature register
+ * @index: Feature register index
+ * @out: Feature register value is written to this pointer
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_features(unsigned long index, unsigned long *out)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_FEATURES, index, &res);
+
+	*out = res.a1;
+	return res.a0;
+}
+
+/**
+ * rmi_granule_delegate() - Delegate a Granule
+ * @phys: PA of the Granule
+ *
+ * Delegate a Granule for use by the Realm World.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_granule_delegate(unsigned long phys)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_GRANULE_DELEGATE, phys, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_granule_undelegate() - Undelegate a Granule
+ * @phys: PA of the Granule
+ *
+ * Undelegate a Granule to allow use by the Normal World. Will fail if the
+ * Granule is in use.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_granule_undelegate(unsigned long phys)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_GRANULE_UNDELEGATE, phys, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_psci_complete() - Complete pending PSCI command
+ * @calling_rec: PA of the calling REC
+ * @target_rec: PA of the target REC
+ * @status: Status of the PSCI request
+ *
+ * Completes a pending PSCI command which was called with an MPIDR argument, by
+ * providing the corresponding REC.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_psci_complete(unsigned long calling_rec,
+				    unsigned long target_rec,
+				    unsigned long status)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_PSCI_COMPLETE, calling_rec, target_rec,
+			     status, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_realm_activate() - Activate a Realm
+ * @rd: PA of the RD
+ *
+ * Mark a Realm as Active, signalling that creation is complete and allowing
+ * execution of the Realm.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_realm_activate(unsigned long rd)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_REALM_ACTIVATE, rd, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_realm_create() - Create a Realm
+ * @rd: PA of the RD
+ * @params_ptr: PA of Realm parameters
+ *
+ * Create a new Realm using the given parameters.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_realm_create(unsigned long rd, unsigned long params_ptr)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_REALM_CREATE, rd, params_ptr, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_realm_destroy() - Destroy a Realm
+ * @rd: PA of the RD
+ *
+ * Destroys a Realm; all objects belonging to the Realm must be destroyed first.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_realm_destroy(unsigned long rd)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_REALM_DESTROY, rd, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_rec_aux_count() - Get number of auxiliary Granules required
+ * @rd: PA of the RD
+ * @aux_count: Number of pages written to this pointer
+ *
+ * A REC may require extra auxiliary pages to be delegated for the RMM to
+ * store metadata (not visible to the normal world). This function provides
+ * the number of pages that are required.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_rec_aux_count(unsigned long rd, unsigned long *aux_count)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_REC_AUX_COUNT, rd, &res);
+
+	*aux_count = res.a1;
+	return res.a0;
+}
+
+/**
+ * rmi_rec_create() - Create a REC
+ * @rd: PA of the RD
+ * @rec: PA of the target REC
+ * @params_ptr: PA of REC parameters
+ *
+ * Create a REC using the parameters specified in the struct rec_params pointed
+ * to by @params_ptr.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_rec_create(unsigned long rd, unsigned long rec,
+				 unsigned long params_ptr)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_REC_CREATE, rd, rec, params_ptr, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_rec_destroy() - Destroy a REC
+ * @rec: PA of the target REC
+ *
+ * Destroys a REC. The REC must not be running.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_rec_destroy(unsigned long rec)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_REC_DESTROY, rec, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_rec_enter() - Enter a REC
+ * @rec: PA of the target REC
+ * @run_ptr: PA of RecRun structure
+ *
+ * Starts (or continues) execution within a REC.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_rec_enter(unsigned long rec, unsigned long run_ptr)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_REC_ENTER, rec, run_ptr, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_rtt_create() - Creates an RTT
+ * @rd: PA of the RD
+ * @rtt: PA of the target RTT
+ * @ipa: Base of the IPA range described by the RTT
+ * @level: Depth of the RTT within the tree
+ *
+ * Creates an RTT (Realm Translation Table) at the specified address and level
+ * within the realm.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_rtt_create(unsigned long rd, unsigned long rtt,
+				 unsigned long ipa, long level)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_RTT_CREATE, rd, rtt, ipa, level, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_rtt_destroy() - Destroy an RTT
+ * @rd: PA of the RD
+ * @ipa: Base of the IPA range described by the RTT
+ * @level: Depth of the RTT within the tree
+ * @out_rtt: Pointer to write the PA of the RTT which was destroyed
+ * @out_top: Pointer to write the top IPA of non-live RTT entries
+ *
+ * Destroys an RTT. The RTT must be empty.
+ *
+ * Return: RMI return code.
+ */
+static inline int rmi_rtt_destroy(unsigned long rd,
+				  unsigned long ipa,
+				  long level,
+				  unsigned long *out_rtt,
+				  unsigned long *out_top)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_RTT_DESTROY, rd, ipa, level, &res);
+
+	*out_rtt = res.a1;
+	*out_top = res.a2;
+
+	return res.a0;
+}
+
+/**
+ * rmi_rtt_fold() - Fold an RTT
+ * @rd: PA of the RD
+ * @ipa: Base of the IPA range described by the RTT
+ * @level: Depth of the RTT within the tree
+ * @out_rtt: Pointer to write the PA of the RTT which was destroyed
+ *
+ * Folds an RTT. If all entries within the RTT are 'homogeneous' the RTT can be
+ * folded into the parent and the RTT destroyed.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_rtt_fold(unsigned long rd, unsigned long ipa,
+			       long level, unsigned long *out_rtt)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_RTT_FOLD, rd, ipa, level, &res);
+
+	*out_rtt = res.a1;
+
+	return res.a0;
+}
+
+/**
+ * rmi_rtt_init_ripas() - Set RIPAS for new Realm
+ * @rd: PA of the RD
+ * @base: Base of target IPA region
+ * @top: Top of target IPA region
+ * @out_top: Top IPA of range whose RIPAS was modified
+ *
+ * Sets the RIPAS of a target IPA range to RAM, for a Realm in the NEW state.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_rtt_init_ripas(unsigned long rd, unsigned long base,
+				     unsigned long top, unsigned long *out_top)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_RTT_INIT_RIPAS, rd, base, top, &res);
+
+	*out_top = res.a1;
+
+	return res.a0;
+}
+
+/**
+ * rmi_rtt_map_unprotected() - Map NS pages into a Realm
+ * @rd: PA of the RD
+ * @ipa: Base IPA of the mapping
+ * @level: Depth within the RTT tree
+ * @desc: RTTE descriptor
+ *
+ * Create a mapping from an Unprotected IPA to a Non-secure PA.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_rtt_map_unprotected(unsigned long rd,
+					  unsigned long ipa,
+					  long level,
+					  unsigned long desc)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_RTT_MAP_UNPROTECTED, rd, ipa, level,
+			     desc, &res);
+
+	return res.a0;
+}
+
+/**
+ * rmi_rtt_read_entry() - Read an RTTE
+ * @rd: PA of the RD
+ * @ipa: IPA for which to read the RTTE
+ * @level: RTT level at which to read the RTTE
+ * @rtt: Output structure describing the RTTE
+ *
+ * Reads an RTTE (Realm Translation Table Entry).
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_rtt_read_entry(unsigned long rd, unsigned long ipa,
+				     long level, struct rtt_entry *rtt)
+{
+	struct arm_smccc_1_2_regs regs = {
+		SMC_RMI_RTT_READ_ENTRY,
+		rd, ipa, level
+	};
+
+	arm_smccc_1_2_smc(&regs, &regs);
+
+	rtt->walk_level = regs.a1;
+	rtt->state = regs.a2 & 0xFF;
+	rtt->desc = regs.a3;
+	rtt->ripas = regs.a4;
+
+	return regs.a0;
+}
+
+/**
+ * rmi_rtt_set_ripas() - Set RIPAS for a running realm
+ * @rd: PA of the RD
+ * @rec: PA of the REC making the request
+ * @base: Base of target IPA region
+ * @top: Top of target IPA region
+ * @out_top: Pointer to write top IPA of range whose RIPAS was modified
+ *
+ * Completes a request made by the Realm to change the RIPAS of a target IPA
+ * range.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_rtt_set_ripas(unsigned long rd, unsigned long rec,
+				    unsigned long base, unsigned long top,
+				    unsigned long *out_top)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_RTT_SET_RIPAS, rd, rec, base, top, &res);
+
+	*out_top = res.a1;
+
+	return res.a0;
+}
+
+/**
+ * rmi_rtt_unmap_unprotected() - Remove a NS mapping
+ * @rd: PA of the RD
+ * @ipa: Base IPA of the mapping
+ * @level: Depth within the RTT tree
+ * @out_top: Pointer to write top IPA of non-live RTT entries
+ *
+ * Removes a mapping at an Unprotected IPA.
+ *
+ * Return: RMI return code
+ */
+static inline int rmi_rtt_unmap_unprotected(unsigned long rd,
+					    unsigned long ipa,
+					    long level,
+					    unsigned long *out_top)
+{
+	struct arm_smccc_res res;
+
+	arm_smccc_1_1_invoke(SMC_RMI_RTT_UNMAP_UNPROTECTED, rd, ipa,
+			     level, &res);
+
+	*out_top = res.a1;
+
+	return res.a0;
+}
+
+/**
+ * rmi_rtt_get_phys() - Get the PA from a RTTE
+ * @rtt: The RTTE
+ *
+ * Return: the physical address from an RTT entry.
+ */
+static inline phys_addr_t rmi_rtt_get_phys(struct rtt_entry *rtt)
+{
+	return rtt->desc & GENMASK(47, 12);
+}
+
+#endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 07/43] arm64: RME: Check for RME support at KVM init
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (5 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 06/43] arm64: RME: Add wrappers for RMI calls Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-16 13:30     ` Suzuki K Poulose
  2024-04-12  8:42   ` [PATCH v2 08/43] arm64: RME: Define the user ABI Steven Price
                     ` (35 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Query the RMI version number and check if it is a compatible version. A
static key is also provided to signal that a supported RMM is available.

Functions are provided to query if a VM or VCPU is a realm (or REC);
these currently always return false.
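
The static key is deliberately left disabled in this patch. As a sketch
of our assumption about how a follow-up patch completes kvm_init_rme()
(the static_branch_enable() call is the assumption; the rest mirrors the
code below):

  int kvm_init_rme(void)
  {
  	if (PAGE_SIZE != SZ_4K)
  		return 0;	/* only 4k host pages are supported */

  	if (rmi_check_version())
  		return 0;	/* continue without realm support */

  	static_branch_enable(&kvm_rme_is_available);
  	return 0;
  }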

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h | 18 +++++++++
 arch/arm64/include/asm/kvm_host.h    |  4 ++
 arch/arm64/include/asm/kvm_rme.h     | 56 ++++++++++++++++++++++++++++
 arch/arm64/include/asm/virt.h        |  1 +
 arch/arm64/kvm/Makefile              |  3 +-
 arch/arm64/kvm/arm.c                 |  9 +++++
 arch/arm64/kvm/rme.c                 | 52 ++++++++++++++++++++++++++
 7 files changed, 142 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/kvm_rme.h
 create mode 100644 arch/arm64/kvm/rme.c

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 975af30af31f..6f08398537e2 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -611,4 +611,22 @@ static __always_inline void kvm_reset_cptr_el2(struct kvm_vcpu *vcpu)
 
 	kvm_write_cptr_el2(val);
 }
+
+static inline bool kvm_is_realm(struct kvm *kvm)
+{
+	if (static_branch_unlikely(&kvm_rme_is_available))
+		return kvm->arch.is_realm;
+	return false;
+}
+
+static inline enum realm_state kvm_realm_state(struct kvm *kvm)
+{
+	return READ_ONCE(kvm->arch.realm.state);
+}
+
+static inline bool vcpu_is_rec(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 9e8a496fb284..63b68b85db3f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -27,6 +27,7 @@
 #include <asm/fpsimd.h>
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_rme.h>
 #include <asm/vncr_mapping.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
@@ -348,6 +349,9 @@ struct kvm_arch {
 	 * the associated pKVM instance in the hypervisor.
 	 */
 	struct kvm_protected_vm pkvm;
+
+	bool is_realm;
+	struct realm realm;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
new file mode 100644
index 000000000000..922da3f47227
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#ifndef __ASM_KVM_RME_H
+#define __ASM_KVM_RME_H
+
+/**
+ * enum realm_state - State of a Realm
+ */
+enum realm_state {
+	/**
+	 * @REALM_STATE_NONE:
+	 *      Realm has not yet been created. rmi_realm_create() may be
+	 *      called to create the realm.
+	 */
+	REALM_STATE_NONE,
+	/**
+	 * @REALM_STATE_NEW:
+	 *      Realm is under construction, not eligible for execution. Pages
+	 *      may be populated with rmi_data_create().
+	 */
+	REALM_STATE_NEW,
+	/**
+	 * @REALM_STATE_ACTIVE:
+	 *      Realm has been created and is eligible for execution with
+	 *      rmi_rec_enter(). Pages may no longer be populated with
+	 *      rmi_data_create().
+	 */
+	REALM_STATE_ACTIVE,
+	/**
+	 * @REALM_STATE_DYING:
+	 *      Realm is in the process of being destroyed or has already been
+	 *      destroyed.
+	 */
+	REALM_STATE_DYING,
+	/**
+	 * @REALM_STATE_DEAD:
+	 *      Realm has been destroyed.
+	 */
+	REALM_STATE_DEAD
+};
+
+/**
+ * struct realm - Additional per VM data for a Realm
+ *
+ * @state: The lifetime state machine for the realm
+ */
+struct realm {
+	enum realm_state state;
+};
+
+int kvm_init_rme(void);
+
+#endif
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 261d6e9df2e1..12cf36c38189 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -81,6 +81,7 @@ void __hyp_reset_vectors(void);
 bool is_kvm_arm_initialised(void);
 
 DECLARE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
+DECLARE_STATIC_KEY_FALSE(kvm_rme_is_available);
 
 /* Reports the availability of HYP mode */
 static inline bool is_hyp_mode_available(void)
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index c0c050e53157..1c1d8cdf381f 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -20,7 +20,8 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
 	 vgic/vgic-mmio.o vgic/vgic-mmio-v2.o \
 	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
-	 vgic/vgic-its.o vgic/vgic-debug.o
+	 vgic/vgic-its.o vgic/vgic-debug.o \
+	 rme.o
 
 kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o pmu.o
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3dee5490eea9..2056c660c5ee 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -38,6 +38,7 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_nested.h>
 #include <asm/kvm_pkvm.h>
+#include <asm/kvm_rme.h>
 #include <asm/kvm_emulate.h>
 #include <asm/sections.h>
 
@@ -47,6 +48,8 @@
 
 static enum kvm_mode kvm_mode = KVM_MODE_DEFAULT;
 
+DEFINE_STATIC_KEY_FALSE(kvm_rme_is_available);
+
 DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
 
 DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
@@ -2562,6 +2565,12 @@ static __init int kvm_arm_init(void)
 
 	in_hyp_mode = is_kernel_in_hyp_mode();
 
+	if (in_hyp_mode) {
+		err = kvm_init_rme();
+		if (err)
+			return err;
+	}
+
 	if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) ||
 	    cpus_have_final_cap(ARM64_WORKAROUND_1508412))
 		kvm_info("Guests without required CPU erratum workarounds can deadlock system!\n" \
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
new file mode 100644
index 000000000000..3dbbf9d046bf
--- /dev/null
+++ b/arch/arm64/kvm/rme.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/rmi_cmds.h>
+#include <asm/virt.h>
+
+static int rmi_check_version(void)
+{
+	struct arm_smccc_res res;
+	int version_major, version_minor;
+	unsigned long host_version = RMI_ABI_VERSION(RMI_ABI_MAJOR_VERSION,
+						     RMI_ABI_MINOR_VERSION);
+
+	arm_smccc_1_1_invoke(SMC_RMI_VERSION, host_version, &res);
+
+	if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
+		return -ENXIO;
+
+	version_major = RMI_ABI_VERSION_GET_MAJOR(res.a1);
+	version_minor = RMI_ABI_VERSION_GET_MINOR(res.a1);
+
+	if (version_major != RMI_ABI_MAJOR_VERSION) {
+		kvm_err("Unsupported RMI ABI (v%d.%d) host supports v%d.%d\n",
+			version_major, version_minor,
+			RMI_ABI_MAJOR_VERSION,
+			RMI_ABI_MINOR_VERSION);
+		return -ENXIO;
+	}
+
+	kvm_info("RMI ABI version %d.%d\n", version_major, version_minor);
+
+	return 0;
+}
+
+int kvm_init_rme(void)
+{
+	if (PAGE_SIZE != SZ_4K)
+		/* Only 4k page size on the host is supported */
+		return 0;
+
+	if (rmi_check_version())
+		/* Continue without realm support */
+		return 0;
+
+	/* Future patch will enable static branch kvm_rme_is_available */
+
+	return 0;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 08/43] arm64: RME: Define the user ABI
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (6 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 07/43] arm64: RME: Check for RME support at KVM init Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 09/43] arm64: RME: ioctls to create and configure realms Steven Price
                     ` (34 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

There is one (multiplexed) CAP which can be used to configure, create,
populate and then activate the realm.
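
For illustration, the CAP is driven through KVM_ENABLE_CAP on the VM fd,
with args[0] selecting the sub-command and args[1] carrying a
sub-command specific value or pointer. A minimal userspace sketch using
only the UAPI below (the vm_fd plumbing is assumed):

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Issue one sub-command of the multiplexed RME CAP. */
  static int rme_vm_cap(int vm_fd, __u64 sub_cmd, __u64 arg)
  {
  	struct kvm_enable_cap cap;

  	memset(&cap, 0, sizeof(cap));
  	cap.cap = KVM_CAP_ARM_RME;
  	cap.args[0] = sub_cmd;	/* e.g. KVM_CAP_ARM_RME_CREATE_RD */
  	cap.args[1] = arg;	/* sub-command specific (often a pointer) */

  	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }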

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 Documentation/virt/kvm/api.rst    |  1 +
 arch/arm64/include/uapi/asm/kvm.h | 49 +++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          | 11 +++++++
 3 files changed, 61 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0b5a33ee71ee..b4bd3d0928a2 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5064,6 +5064,7 @@ Recognised values for feature:
 
   =====      ===========================================
   arm64      KVM_ARM_VCPU_SVE (requires KVM_CAP_ARM_SVE)
+  arm64      KVM_ARM_VCPU_REC (requires KVM_CAP_ARM_RME)
   =====      ===========================================
 
 Finalizes the configuration of the specified vcpu feature.
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 964df31da975..86110a8cc6be 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -108,6 +108,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
 #define KVM_ARM_VCPU_HAS_EL2		7 /* Support nested virtualization */
+#define KVM_ARM_VCPU_REC		8 /* VCPU REC state as part of Realm */
 
 struct kvm_vcpu_init {
 	__u32 target;
@@ -418,6 +419,54 @@ enum {
 #define   KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES	3
 #define   KVM_DEV_ARM_ITS_CTRL_RESET		4
 
+/* KVM_CAP_ARM_RME on VM fd */
+#define KVM_CAP_ARM_RME_CONFIG_REALM		0
+#define KVM_CAP_ARM_RME_CREATE_RD		1
+#define KVM_CAP_ARM_RME_INIT_IPA_REALM		2
+#define KVM_CAP_ARM_RME_POPULATE_REALM		3
+#define KVM_CAP_ARM_RME_ACTIVATE_REALM		4
+
+#define KVM_CAP_ARM_RME_MEASUREMENT_ALGO_SHA256		0
+#define KVM_CAP_ARM_RME_MEASUREMENT_ALGO_SHA512		1
+
+#define KVM_CAP_ARM_RME_RPV_SIZE 64
+
+/* List of configuration items accepted for KVM_CAP_ARM_RME_CONFIG_REALM */
+#define KVM_CAP_ARM_RME_CFG_RPV			0
+#define KVM_CAP_ARM_RME_CFG_HASH_ALGO		1
+
+struct kvm_cap_arm_rme_config_item {
+	__u32 cfg;
+	union {
+		/* cfg == KVM_CAP_ARM_RME_CFG_RPV */
+		struct {
+			__u8	rpv[KVM_CAP_ARM_RME_RPV_SIZE];
+		};
+
+		/* cfg == KVM_CAP_ARM_RME_CFG_HASH_ALGO */
+		struct {
+			__u32	hash_algo;
+		};
+
+		/* Fix the size of the union */
+		__u8	reserved[256];
+	};
+};
+
+#define KVM_ARM_RME_POPULATE_FLAGS_MEASURE	(1U << 0)
+struct kvm_cap_arm_rme_populate_realm_args {
+	__u64 populate_ipa_base;
+	__u64 populate_ipa_size;
+	__u32 flags;
+	__u32 reserved[3];
+};
+
+struct kvm_cap_arm_rme_init_ipa_args {
+	__u64 init_ipa_base;
+	__u64 init_ipa_size;
+	__u32 reserved[4];
+};
+
 /* Device Control API on vcpu fd */
 #define KVM_ARM_VCPU_PMU_V3_CTRL	0
 #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..a1147036d1bd 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -918,6 +918,8 @@ struct kvm_enable_cap {
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
 
+#define KVM_CAP_ARM_RME 300 /* FIXME: Large number to prevent conflicts */
+
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
 	__u32 pin;
@@ -1548,4 +1550,13 @@ struct kvm_create_guest_memfd {
 	__u64 reserved[6];
 };
 
+/* Available with KVM_CAP_ARM_RME, only for VMs with KVM_VM_TYPE_ARM_REALM  */
+struct kvm_arm_rmm_psci_complete {
+	__u64 target_mpidr;
+	__u32 psci_status;
+	__u32 padding[3];
+};
+
+#define KVM_ARM_VCPU_RMM_PSCI_COMPLETE	_IOW(KVMIO, 0xd2, struct kvm_arm_rmm_psci_complete)
+
 #endif /* __LINUX_KVM_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 09/43] arm64: RME: ioctls to create and configure realms
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (7 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 08/43] arm64: RME: Define the user ABI Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-17  9:51     ` Suzuki K Poulose
  2024-04-18 16:04     ` Suzuki K Poulose
  2024-04-12  8:42   ` [PATCH v2 10/43] kvm: arm64: Expose debug HW register numbers for Realm Steven Price
                     ` (33 subsequent siblings)
  42 siblings, 2 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Jean-Philippe Brucker

Add the KVM_CAP_ARM_RME_CREATE_RD ioctl to create a realm. This involves
delegating pages to the RMM to hold the Realm Descriptor (RD) and the
base level of the Realm Translation Tables (RTT). A VMID also needs to
be picked; since the RMM has a separate VMID address space, a dedicated
allocator is added for this purpose.

KVM_CAP_ARM_RME_CONFIG_REALM is provided to allow configuring the realm
before it is created.
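
Putting the two together, the expected flow from the VMM is
configure-then-create; configuration is rejected once the RD exists. A
hedged userspace sketch (error handling trimmed, vm_fd assumed to be a
realm-type VM):

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int realm_configure_and_create(int vm_fd)
  {
  	struct kvm_cap_arm_rme_config_item cfg;
  	struct kvm_enable_cap cap;
  	int ret;

  	/* Select SHA-256 for realm measurements */
  	memset(&cfg, 0, sizeof(cfg));
  	cfg.cfg = KVM_CAP_ARM_RME_CFG_HASH_ALGO;
  	cfg.hash_algo = KVM_CAP_ARM_RME_MEASUREMENT_ALGO_SHA256;

  	memset(&cap, 0, sizeof(cap));
  	cap.cap = KVM_CAP_ARM_RME;
  	cap.args[0] = KVM_CAP_ARM_RME_CONFIG_REALM;
  	cap.args[1] = (__u64)(unsigned long)&cfg;
  	ret = ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  	if (ret)
  		return ret;

  	/* Create the RD; this consumes (and frees) the parameters page */
  	cap.args[0] = KVM_CAP_ARM_RME_CREATE_RD;
  	cap.args[1] = 0;
  	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }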

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/include/asm/kvm_emulate.h |   5 +
 arch/arm64/include/asm/kvm_rme.h     |  19 ++
 arch/arm64/kvm/arm.c                 |  18 ++
 arch/arm64/kvm/mmu.c                 |  15 +-
 arch/arm64/kvm/rme.c                 | 282 +++++++++++++++++++++++++++
 5 files changed, 337 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 6f08398537e2..c606316f4729 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -624,6 +624,11 @@ static inline enum realm_state kvm_realm_state(struct kvm *kvm)
 	return READ_ONCE(kvm->arch.realm.state);
 }
 
+static inline bool kvm_realm_is_created(struct kvm *kvm)
+{
+	return kvm_is_realm(kvm) && kvm_realm_state(kvm) != REALM_STATE_NONE;
+}
+
 static inline bool vcpu_is_rec(struct kvm_vcpu *vcpu)
 {
 	return false;
diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
index 922da3f47227..cf8cc4d30364 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -6,6 +6,8 @@
 #ifndef __ASM_KVM_RME_H
 #define __ASM_KVM_RME_H
 
+#include <uapi/linux/kvm.h>
+
 /**
  * enum realm_state - State of a Realm
  */
@@ -46,11 +48,28 @@ enum realm_state {
  * struct realm - Additional per VM data for a Realm
  *
  * @state: The lifetime state machine for the realm
+ * @rd: Kernel mapping of the Realm Descriptor (RD)
+ * @params: Parameters for the RMI_REALM_CREATE command
+ * @num_aux: The number of auxiliary pages required by the RMM
+ * @vmid: VMID to be used by the RMM for the realm
+ * @ia_bits: Number of valid Input Address bits in the IPA
  */
 struct realm {
 	enum realm_state state;
+
+	void *rd;
+	struct realm_params *params;
+
+	unsigned long num_aux;
+	unsigned int vmid;
+	unsigned int ia_bits;
 };
 
 int kvm_init_rme(void);
+u32 kvm_realm_ipa_limit(void);
+
+int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
+int kvm_init_realm_vm(struct kvm *kvm);
+void kvm_destroy_realm(struct kvm *kvm);
 
 #endif
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2056c660c5ee..5729ea430d6d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -119,6 +119,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->slots_lock);
 		break;
+	case KVM_CAP_ARM_RME:
+		if (!kvm_is_realm(kvm))
+			return -EINVAL;
+		mutex_lock(&kvm->lock);
+		r = kvm_realm_enable_cap(kvm, cap);
+		mutex_unlock(&kvm->lock);
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -179,6 +186,13 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	bitmap_zero(kvm->arch.vcpu_features, KVM_VCPU_MAX_FEATURES);
 
+	/* Initialise the realm bits after the generic bits are enabled */
+	if (kvm_is_realm(kvm)) {
+		ret = kvm_init_realm_vm(kvm);
+		if (ret)
+			goto err_free_cpumask;
+	}
+
 	return 0;
 
 err_free_cpumask:
@@ -219,6 +233,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_unshare_hyp(kvm, kvm + 1);
 
 	kvm_arm_teardown_hypercalls(kvm);
+	kvm_destroy_realm(kvm);
 }
 
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
@@ -328,6 +343,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES:
 		r = BIT(0);
 		break;
+	case KVM_CAP_ARM_RME:
+		r = static_key_enabled(&kvm_rme_is_available);
+		break;
 	default:
 		r = 0;
 	}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 18680771cdb0..aae365647b62 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -872,6 +872,10 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
 	struct kvm_pgtable *pgt;
 	u64 mmfr0, mmfr1;
 	u32 phys_shift;
+	u32 ipa_limit = kvm_ipa_limit;
+
+	if (kvm_is_realm(kvm))
+		ipa_limit = kvm_realm_ipa_limit();
 
 	if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
 		return -EINVAL;
@@ -880,12 +884,12 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
 	if (is_protected_kvm_enabled()) {
 		phys_shift = kvm_ipa_limit;
 	} else if (phys_shift) {
-		if (phys_shift > kvm_ipa_limit ||
+		if (phys_shift > ipa_limit ||
 		    phys_shift < ARM64_MIN_PARANGE_BITS)
 			return -EINVAL;
 	} else {
 		phys_shift = KVM_PHYS_SHIFT;
-		if (phys_shift > kvm_ipa_limit) {
+		if (phys_shift > ipa_limit) {
 			pr_warn_once("%s using unsupported default IPA limit, upgrade your VMM\n",
 				     current->comm);
 			return -EINVAL;
@@ -1014,6 +1018,13 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	struct kvm_pgtable *pgt = NULL;
 
 	write_lock(&kvm->mmu_lock);
+	if (kvm_is_realm(kvm) &&
+	    (kvm_realm_state(kvm) != REALM_STATE_DEAD &&
+	     kvm_realm_state(kvm) != REALM_STATE_NONE)) {
+		/* TODO: teardown rtts */
+		write_unlock(&kvm->mmu_lock);
+		return;
+	}
 	pgt = mmu->pgt;
 	if (pgt) {
 		mmu->pgd_phys = 0;
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 3dbbf9d046bf..658d14e8d87d 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -5,9 +5,20 @@
 
 #include <linux/kvm_host.h>
 
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
 #include <asm/rmi_cmds.h>
 #include <asm/virt.h>
 
+#include <asm/kvm_pgtable.h>
+
+static unsigned long rmm_feat_reg0;
+
+static bool rme_supports(unsigned long feature)
+{
+	return !!u64_get_bits(rmm_feat_reg0, feature);
+}
+
 static int rmi_check_version(void)
 {
 	struct arm_smccc_res res;
@@ -36,8 +47,272 @@ static int rmi_check_version(void)
 	return 0;
 }
 
+u32 kvm_realm_ipa_limit(void)
+{
+	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
+}
+
+static int get_start_level(struct realm *realm)
+{
+	return 4 - stage2_pgtable_levels(realm->ia_bits);
+}
+
+static int realm_create_rd(struct kvm *kvm)
+{
+	struct realm *realm = &kvm->arch.realm;
+	struct realm_params *params = realm->params;
+	void *rd = NULL;
+	phys_addr_t rd_phys, params_phys;
+	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
+	int i, r;
+
+	if (WARN_ON(realm->rd) || WARN_ON(!realm->params))
+		return -EEXIST;
+
+	rd = (void *)__get_free_page(GFP_KERNEL);
+	if (!rd)
+		return -ENOMEM;
+
+	rd_phys = virt_to_phys(rd);
+	if (rmi_granule_delegate(rd_phys)) {
+		r = -ENXIO;
+		goto out;
+	}
+
+	for (i = 0; i < pgt->pgd_pages; i++) {
+		phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;
+
+		if (rmi_granule_delegate(pgd_phys)) {
+			r = -ENXIO;
+			goto out_undelegate_tables;
+		}
+	}
+
+	realm->ia_bits = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
+
+	params->rtt_level_start = get_start_level(realm);
+	params->rtt_num_start = pgt->pgd_pages;
+	params->rtt_base = kvm->arch.mmu.pgd_phys;
+	params->vmid = realm->vmid;
+
+	params_phys = virt_to_phys(params);
+
+	if (rmi_realm_create(rd_phys, params_phys)) {
+		r = -ENXIO;
+		goto out_undelegate_tables;
+	}
+
+	realm->rd = rd;
+
+	if (WARN_ON(rmi_rec_aux_count(rd_phys, &realm->num_aux))) {
+		WARN_ON(rmi_realm_destroy(rd_phys));
+		realm->rd = NULL;
+		r = -ENXIO;
+		goto out_undelegate_tables;
+	}
+
+	return 0;
+
+out_undelegate_tables:
+	while (--i >= 0) {
+		phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;
+
+		WARN_ON(rmi_granule_undelegate(pgd_phys));
+	}
+	WARN_ON(rmi_granule_undelegate(rd_phys));
+out:
+	free_page((unsigned long)rd);
+	return r;
+}
+
+/* Protects access to rme_vmid_bitmap */
+static DEFINE_SPINLOCK(rme_vmid_lock);
+static unsigned long *rme_vmid_bitmap;
+
+static int rme_vmid_init(void)
+{
+	unsigned int vmid_count = 1 << kvm_get_vmid_bits();
+
+	rme_vmid_bitmap = bitmap_zalloc(vmid_count, GFP_KERNEL);
+	if (!rme_vmid_bitmap) {
+		kvm_err("%s: Couldn't allocate rme vmid bitmap\n", __func__);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int rme_vmid_reserve(void)
+{
+	int ret;
+	unsigned int vmid_count = 1 << kvm_get_vmid_bits();
+
+	spin_lock(&rme_vmid_lock);
+	ret = bitmap_find_free_region(rme_vmid_bitmap, vmid_count, 0);
+	spin_unlock(&rme_vmid_lock);
+
+	return ret;
+}
+
+static void rme_vmid_release(unsigned int vmid)
+{
+	spin_lock(&rme_vmid_lock);
+	bitmap_release_region(rme_vmid_bitmap, vmid, 0);
+	spin_unlock(&rme_vmid_lock);
+}
+
+static int kvm_create_realm(struct kvm *kvm)
+{
+	struct realm *realm = &kvm->arch.realm;
+	int ret;
+
+	if (!kvm_is_realm(kvm) || kvm_realm_is_created(kvm))
+		return -EEXIST;
+
+	ret = rme_vmid_reserve();
+	if (ret < 0)
+		return ret;
+	realm->vmid = ret;
+
+	ret = realm_create_rd(kvm);
+	if (ret) {
+		rme_vmid_release(realm->vmid);
+		return ret;
+	}
+
+	WRITE_ONCE(realm->state, REALM_STATE_NEW);
+
+	/* The realm is up, free the parameters.  */
+	free_page((unsigned long)realm->params);
+	realm->params = NULL;
+
+	return 0;
+}
+
+static int config_realm_hash_algo(struct realm *realm,
+				  struct kvm_cap_arm_rme_config_item *cfg)
+{
+	switch (cfg->hash_algo) {
+	case KVM_CAP_ARM_RME_MEASUREMENT_ALGO_SHA256:
+		if (!rme_supports(RMI_FEATURE_REGISTER_0_HASH_SHA_256))
+			return -EINVAL;
+		break;
+	case KVM_CAP_ARM_RME_MEASUREMENT_ALGO_SHA512:
+		if (!rme_supports(RMI_FEATURE_REGISTER_0_HASH_SHA_512))
+			return -EINVAL;
+		break;
+	default:
+		return -EINVAL;
+	}
+	realm->params->hash_algo = cfg->hash_algo;
+	return 0;
+}
+
+static int kvm_rme_config_realm(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+	struct kvm_cap_arm_rme_config_item cfg;
+	struct realm *realm = &kvm->arch.realm;
+	int r = 0;
+
+	if (kvm_realm_is_created(kvm))
+		return -EINVAL;
+
+	if (copy_from_user(&cfg, (void __user *)cap->args[1], sizeof(cfg)))
+		return -EFAULT;
+
+	switch (cfg.cfg) {
+	case KVM_CAP_ARM_RME_CFG_RPV:
+		memcpy(&realm->params->rpv, &cfg.rpv, sizeof(cfg.rpv));
+		break;
+	case KVM_CAP_ARM_RME_CFG_HASH_ALGO:
+		r = config_realm_hash_algo(realm, &cfg);
+		break;
+	default:
+		r = -EINVAL;
+	}
+
+	return r;
+}
+
+int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+	int r = 0;
+
+	if (!kvm_is_realm(kvm))
+		return -EINVAL;
+
+	switch (cap->args[0]) {
+	case KVM_CAP_ARM_RME_CONFIG_REALM:
+		r = kvm_rme_config_realm(kvm, cap);
+		break;
+	case KVM_CAP_ARM_RME_CREATE_RD:
+		r = kvm_create_realm(kvm);
+		break;
+	default:
+		r = -EINVAL;
+		break;
+	}
+
+	return r;
+}
+
+void kvm_destroy_realm(struct kvm *kvm)
+{
+	struct realm *realm = &kvm->arch.realm;
+	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
+	int i;
+
+	if (realm->params) {
+		free_page((unsigned long)realm->params);
+		realm->params = NULL;
+	}
+
+	if (!kvm_realm_is_created(kvm))
+		return;
+
+	WRITE_ONCE(realm->state, REALM_STATE_DYING);
+
+	if (realm->rd) {
+		phys_addr_t rd_phys = virt_to_phys(realm->rd);
+
+		if (WARN_ON(rmi_realm_destroy(rd_phys)))
+			return;
+		if (WARN_ON(rmi_granule_undelegate(rd_phys)))
+			return;
+		free_page((unsigned long)realm->rd);
+		realm->rd = NULL;
+	}
+
+	rme_vmid_release(realm->vmid);
+
+	for (i = 0; i < pgt->pgd_pages; i++) {
+		phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;
+
+		if (WARN_ON(rmi_granule_undelegate(pgd_phys)))
+			return;
+	}
+
+	WRITE_ONCE(realm->state, REALM_STATE_DEAD);
+
+	kvm_free_stage2_pgd(&kvm->arch.mmu);
+}
+
+int kvm_init_realm_vm(struct kvm *kvm)
+{
+	struct realm_params *params;
+
+	params = (struct realm_params *)get_zeroed_page(GFP_KERNEL);
+	if (!params)
+		return -ENOMEM;
+
+	/* Default parameters, not exposed to user space */
+	params->s2sz = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
+	kvm->arch.realm.params = params;
+	return 0;
+}
+
 int kvm_init_rme(void)
 {
+	int ret;
+
 	if (PAGE_SIZE != SZ_4K)
 		/* Only 4k page size on the host is supported */
 		return 0;
@@ -46,6 +321,13 @@ int kvm_init_rme(void)
 		/* Continue without realm support */
 		return 0;
 
+	if (WARN_ON(rmi_features(0, &rmm_feat_reg0)))
+		return 0;
+
+	ret = rme_vmid_init();
+	if (ret)
+		return ret;
+
 	/* Future patch will enable static branch kvm_rme_is_available */
 
 	return 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 10/43] kvm: arm64: Expose debug HW register numbers for Realm
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (8 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 09/43] arm64: RME: ioctls to create and configure realms Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 11/43] arm64: kvm: Allow passing machine type in KVM creation Steven Price
                     ` (32 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Suzuki K Poulose, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Steven Price

From: Suzuki K Poulose <suzuki.poulose@arm.com>

Expose the VM-specific number of debug hardware registers. A realm
guest is not debuggable, so report zero breakpoints and watchpoints for
it.
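
For example, a VMM probing the per-VM capability will now see zero
hardware breakpoints/watchpoints for a realm. A small sketch (vm_fd
assumed):

  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static void report_debug_regs(int vm_fd)
  {
  	/* Both return 0 for a realm VM, the HW counts otherwise */
  	int bps = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_DEBUG_HW_BPS);
  	int wps = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_DEBUG_HW_WPS);

  	printf("HW breakpoints: %d, HW watchpoints: %d\n", bps, wps);
  }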

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/arm.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 5729ea430d6d..22da6493912a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -67,6 +67,22 @@ bool is_kvm_arm_initialised(void)
 	return kvm_arm_initialised;
 }
 
+static u32 kvm_arm_get_num_brps(struct kvm *kvm)
+{
+	if (!kvm || !kvm_is_realm(kvm))
+		return get_num_brps();
+	/* Realm guest is not debuggable. */
+	return 0;
+}
+
+static u32 kvm_arm_get_num_wrps(struct kvm *kvm)
+{
+	if (!kvm || !kvm_is_realm(kvm))
+		return get_num_wrps();
+	/* Realm guest is not debuggable. */
+	return 0;
+}
+
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
 	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
@@ -257,7 +273,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
 	case KVM_CAP_ARM_NISV_TO_USER:
 	case KVM_CAP_ARM_INJECT_EXT_DABT:
-	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
 	case KVM_CAP_ARM_SYSTEM_SUSPEND:
@@ -265,6 +280,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_COUNTER_OFFSET:
 		r = 1;
 		break;
+	case KVM_CAP_SET_GUEST_DEBUG:
+		r = !kvm_is_realm(kvm);
+		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
 		return KVM_GUESTDBG_VALID_MASK;
 	case KVM_CAP_ARM_SET_DEVICE_ADDR:
@@ -310,10 +328,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = cpus_have_final_cap(ARM64_HAS_32BIT_EL1);
 		break;
 	case KVM_CAP_GUEST_DEBUG_HW_BPS:
-		r = get_num_brps();
+		r = kvm_arm_get_num_brps(kvm);
 		break;
 	case KVM_CAP_GUEST_DEBUG_HW_WPS:
-		r = get_num_wrps();
+		r = kvm_arm_get_num_wrps(kvm);
 		break;
 	case KVM_CAP_ARM_PMU_V3:
 		r = kvm_arm_support_pmu_v3();
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 11/43] arm64: kvm: Allow passing machine type in KVM creation
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (9 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 10/43] kvm: arm64: Expose debug HW register numbers for Realm Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-17 10:20     ` Suzuki K Poulose
  2024-04-12  8:42   ` [PATCH v2 12/43] arm64: RME: Keep a spare page delegated to the RMM Steven Price
                     ` (31 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Previously the machine type was used purely for specifying the physical
address size of the guest. Reserve the higher bits to specify an
ARM-specific machine type and declare a new type 'KVM_VM_TYPE_ARM_REALM'
used to create a realm guest.
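
For illustration, the machine type and IPA size fields simply OR
together; a minimal sketch of creating a realm VM with a 40-bit IPA
(kvm_fd is an open /dev/kvm fd):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int create_realm_vm(int kvm_fd)
  {
  	/* Bits [7:0]: log2(IPA size); bits [11:8]: ARM machine type */
  	unsigned long type = KVM_VM_TYPE_ARM_REALM |
  			     KVM_VM_TYPE_ARM_IPA_SIZE(40);

  	return ioctl(kvm_fd, KVM_CREATE_VM, type);
  }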

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/arm.c     | 17 +++++++++++++++++
 arch/arm64/kvm/mmu.c     |  3 ---
 include/uapi/linux/kvm.h | 19 +++++++++++++++----
 3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 22da6493912a..c5a6139d5454 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -173,6 +173,23 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	mutex_unlock(&kvm->lock);
 #endif
 
+	if (type & ~(KVM_VM_TYPE_ARM_MASK | KVM_VM_TYPE_ARM_IPA_SIZE_MASK))
+		return -EINVAL;
+
+	switch (type & KVM_VM_TYPE_ARM_MASK) {
+	case KVM_VM_TYPE_ARM_NORMAL:
+		break;
+	case KVM_VM_TYPE_ARM_REALM:
+		kvm->arch.is_realm = true;
+		if (!kvm_is_realm(kvm)) {
+			/* Realm support unavailable */
+			return -EINVAL;
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+
 	ret = kvm_share_hyp(kvm, kvm + 1);
 	if (ret)
 		return ret;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index aae365647b62..af4564f3add5 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -877,9 +877,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
 	if (kvm_is_realm(kvm))
 		ipa_limit = kvm_realm_ipa_limit();
 
-	if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
-		return -EINVAL;
-
 	phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
 	if (is_protected_kvm_enabled()) {
 		phys_shift = kvm_ipa_limit;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a1147036d1bd..5153c837c8c7 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -635,14 +635,25 @@ struct kvm_enable_cap {
 #define KVM_S390_SIE_PAGE_OFFSET 1
 
 /*
- * On arm64, machine type can be used to request the physical
- * address size for the VM. Bits[7-0] are reserved for the guest
- * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
- * value 0 implies the default IPA size, 40bits.
+ * On arm64, the machine type value encodes both an ARM-specific machine
+ * type and the physical address size for the VM.
+ *
+ * Bits[11-8] are reserved for the ARM specific machine type.
+ *
+ * Bits[7-0] are reserved for the guest PA size shift (i.e, log2(PA_Size)).
+ * For backward compatibility, value 0 implies the default IPA size, 40bits.
  */
+#define KVM_VM_TYPE_ARM_SHIFT		8
+#define KVM_VM_TYPE_ARM_MASK		(0xfULL << KVM_VM_TYPE_ARM_SHIFT)
+#define KVM_VM_TYPE_ARM(_type)		\
+	(((_type) << KVM_VM_TYPE_ARM_SHIFT) & KVM_VM_TYPE_ARM_MASK)
+#define KVM_VM_TYPE_ARM_NORMAL		KVM_VM_TYPE_ARM(0)
+#define KVM_VM_TYPE_ARM_REALM		KVM_VM_TYPE_ARM(1)
+
 #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK	0xffULL
 #define KVM_VM_TYPE_ARM_IPA_SIZE(x)		\
 	((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
+
 /*
  * ioctls for /dev/kvm fds:
  */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 12/43] arm64: RME: Keep a spare page delegated to the RMM
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (10 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 11/43] arm64: kvm: Allow passing machine type in KVM creation Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-17 10:19     ` Suzuki K Poulose
  2024-04-12  8:42   ` [PATCH v2 13/43] arm64: RME: RTT handling Steven Price
                     ` (30 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Pages can only be populated/destroyed by the RMM at the 4KB granule,
which requires creating the full depth of RTTs. However, if the pages
are going to be combined into a 2MB huge page the last level RTT is only
temporarily needed. Similarly, when freeing memory the huge page must be
temporarily split, requiring temporary usage of the full depth of RTTs.
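
For reference, with the RTT geometry macros introduced in the next patch
(RME_PAGE_SHIFT = 12 for a 4K granule), the level-2 block size works out
as:

  RME_RTT_LEVEL_SHIFT(2) = (12 - 3) * (4 - 2) + 3 = 21
  RME_L2_BLOCK_SIZE      = 1 << 21 = 2MB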

To avoid needing to perform a temporary allocation and delegation of a
page for this purpose we keep a spare delegated page around. In
particular this avoids the need for memory allocation while destroying
the realm guest.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_rme.h | 5 +++++
 arch/arm64/kvm/rme.c             | 8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
index cf8cc4d30364..fba85e9ce3ae 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -50,6 +50,9 @@ enum realm_state {
  * @state: The lifetime state machine for the realm
  * @rd: Kernel mapping of the Realm Descriptor (RD)
  * @params: Parameters for the RMI_REALM_CREATE command
+ * @spare_page: A physical page that has been delegated to the Realm world but
+ *              is otherwise free. Used to avoid temporary allocation during
+ *              RTT operations.
  * @num_aux: The number of auxiliary pages required by the RMM
  * @vmid: VMID to be used by the RMM for the realm
  * @ia_bits: Number of valid Input Address bits in the IPA
@@ -60,6 +63,8 @@ struct realm {
 	void *rd;
 	struct realm_params *params;
 
+	phys_addr_t spare_page;
+
 	unsigned long num_aux;
 	unsigned int vmid;
 	unsigned int ia_bits;
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 658d14e8d87d..9652ec6ab2fd 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -103,6 +103,7 @@ static int realm_create_rd(struct kvm *kvm)
 	}
 
 	realm->rd = rd;
+	realm->spare_page = PHYS_ADDR_MAX;
 
 	if (WARN_ON(rmi_rec_aux_count(rd_phys, &realm->num_aux))) {
 		WARN_ON(rmi_realm_destroy(rd_phys));
@@ -283,6 +284,13 @@ void kvm_destroy_realm(struct kvm *kvm)
 
 	rme_vmid_release(realm->vmid);
 
+	if (realm->spare_page != PHYS_ADDR_MAX) {
+		/* Leak the page if the undelegate fails */
+		if (!WARN_ON(rmi_granule_undelegate(realm->spare_page)))
+			free_page((unsigned long)phys_to_virt(realm->spare_page));
+		realm->spare_page = PHYS_ADDR_MAX;
+	}
+
 	for (i = 0; i < pgt->pgd_pages; i++) {
 		phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 13/43] arm64: RME: RTT handling
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (11 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 12/43] arm64: RME: Keep a spare page delegated to the RMM Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-17 13:37     ` Suzuki K Poulose
  2024-04-12  8:42   ` [PATCH v2 14/43] arm64: RME: Allocate/free RECs to match vCPUs Steven Price
                     ` (29 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

The RMM owns the stage 2 page tables for a realm, and KVM must request
that the RMM creates/destroys entries as necessary. The physical pages
to store the page tables are delegated to the realm as required, and can
be undelegated when no longer used.

Creating new RTTs is the easy part; tearing down is a little more
tricky. The result of realm_rtt_destroy() can be used to effectively
walk the tree and destroy the entries (undelegating pages that were
given to the realm).
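
For orientation when reading the walk: an RTT at level L maps the IPA
range covered by a single entry at level L-1, which is why the teardown
loop steps by rme_rtt_level_mapsize(level - 1). With the 4K geometry:

  rme_rtt_level_mapsize(1) = 1 << 30   (1GB: range covered by a level-2 RTT)
  rme_rtt_level_mapsize(2) = 1 << 21   (2MB: range covered by a level-3 RTT)
  rme_rtt_level_mapsize(3) = 1 << 12   (4KB: a single page)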

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_rme.h |  19 ++++
 arch/arm64/kvm/mmu.c             |   6 +-
 arch/arm64/kvm/rme.c             | 171 +++++++++++++++++++++++++++++++
 3 files changed, 193 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
index fba85e9ce3ae..4ab5cb5e91b3 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -76,5 +76,24 @@ u32 kvm_realm_ipa_limit(void);
 int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
 int kvm_init_realm_vm(struct kvm *kvm);
 void kvm_destroy_realm(struct kvm *kvm);
+void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits);
+
+#define RME_RTT_BLOCK_LEVEL	2
+#define RME_RTT_MAX_LEVEL	3
+
+#define RME_PAGE_SHIFT		12
+#define RME_PAGE_SIZE		BIT(RME_PAGE_SHIFT)
+/* See ARM64_HW_PGTABLE_LEVEL_SHIFT() */
+#define RME_RTT_LEVEL_SHIFT(l)	\
+	((RME_PAGE_SHIFT - 3) * (4 - (l)) + 3)
+#define RME_L2_BLOCK_SIZE	BIT(RME_RTT_LEVEL_SHIFT(2))
+
+static inline unsigned long rme_rtt_level_mapsize(int level)
+{
+	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
+		return RME_PAGE_SIZE;
+
+	return (1UL << RME_RTT_LEVEL_SHIFT(level));
+}
 
 #endif
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index af4564f3add5..46f0c4e80ace 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1012,17 +1012,17 @@ void stage2_unmap_vm(struct kvm *kvm)
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 {
 	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
-	struct kvm_pgtable *pgt = NULL;
+	struct kvm_pgtable *pgt;
 
 	write_lock(&kvm->mmu_lock);
+	pgt = mmu->pgt;
 	if (kvm_is_realm(kvm) &&
 	    (kvm_realm_state(kvm) != REALM_STATE_DEAD &&
 	     kvm_realm_state(kvm) != REALM_STATE_NONE)) {
-		/* TODO: teardown rtts */
 		write_unlock(&kvm->mmu_lock);
+		kvm_realm_destroy_rtts(kvm, pgt->ia_bits);
 		return;
 	}
-	pgt = mmu->pgt;
 	if (pgt) {
 		mmu->pgd_phys = 0;
 		mmu->pgt = NULL;
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 9652ec6ab2fd..09b59bcad8b6 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -47,6 +47,53 @@ static int rmi_check_version(void)
 	return 0;
 }
 
+static phys_addr_t __alloc_delegated_page(struct realm *realm,
+					  struct kvm_mmu_memory_cache *mc,
+					  gfp_t flags)
+{
+	phys_addr_t phys = PHYS_ADDR_MAX;
+	void *virt;
+
+	if (realm->spare_page != PHYS_ADDR_MAX) {
+		swap(realm->spare_page, phys);
+		goto out;
+	}
+
+	if (mc)
+		virt = kvm_mmu_memory_cache_alloc(mc);
+	else
+		virt = (void *)__get_free_page(flags);
+
+	if (!virt)
+		goto out;
+
+	phys = virt_to_phys(virt);
+
+	if (rmi_granule_delegate(phys)) {
+		free_page((unsigned long)virt);
+
+		phys = PHYS_ADDR_MAX;
+	}
+
+out:
+	return phys;
+}
+
+static void free_delegated_page(struct realm *realm, phys_addr_t phys)
+{
+	if (realm->spare_page == PHYS_ADDR_MAX) {
+		realm->spare_page = phys;
+		return;
+	}
+
+	if (WARN_ON(rmi_granule_undelegate(phys))) {
+		/* Undelegate failed: leak the page */
+		return;
+	}
+
+	free_page((unsigned long)phys_to_virt(phys));
+}
+
 u32 kvm_realm_ipa_limit(void)
 {
 	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
@@ -124,6 +171,130 @@ static int realm_create_rd(struct kvm *kvm)
 	return r;
 }
 
+static int realm_rtt_destroy(struct realm *realm, unsigned long addr,
+			     int level, phys_addr_t *rtt_granule,
+			     unsigned long *next_addr)
+{
+	unsigned long out_rtt;
+	unsigned long out_top;
+	int ret;
+
+	ret = rmi_rtt_destroy(virt_to_phys(realm->rd), addr, level,
+			      &out_rtt, &out_top);
+
+	if (rtt_granule)
+		*rtt_granule = out_rtt;
+	if (next_addr)
+		*next_addr = out_top;
+
+	return ret;
+}
+
+static int realm_tear_down_rtt_level(struct realm *realm, int level,
+				     unsigned long start, unsigned long end)
+{
+	ssize_t map_size;
+	unsigned long addr, next_addr;
+
+	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
+		return -EINVAL;
+
+	map_size = rme_rtt_level_mapsize(level - 1);
+
+	for (addr = start; addr < end; addr = next_addr) {
+		phys_addr_t rtt_granule;
+		int ret;
+		unsigned long align_addr = ALIGN(addr, map_size);
+
+		next_addr = ALIGN(addr + 1, map_size);
+
+		if (next_addr <= end && align_addr == addr) {
+			ret = realm_rtt_destroy(realm, addr, level,
+						&rtt_granule, &next_addr);
+		} else {
+			/* Recurse a level deeper */
+			ret = realm_tear_down_rtt_level(realm,
+							level + 1,
+							addr,
+							min(next_addr, end));
+			if (ret)
+				return ret;
+			continue;
+		}
+
+		switch (RMI_RETURN_STATUS(ret)) {
+		case RMI_SUCCESS:
+			if (!WARN_ON(rmi_granule_undelegate(rtt_granule)))
+				free_page((unsigned long)phys_to_virt(rtt_granule));
+			break;
+		case RMI_ERROR_RTT:
+			if (next_addr > addr) {
+				/* unassigned or destroyed */
+				break;
+			}
+			if (WARN_ON(RMI_RETURN_INDEX(ret) != level))
+				return -EBUSY;
+			if (WARN_ON(level == RME_RTT_MAX_LEVEL)) {
+				/* Live entry */
+				return -EBUSY;
+			}
+			/* Recurse a level deeper */
+			next_addr = ALIGN(addr + 1, map_size);
+			ret = realm_tear_down_rtt_level(realm,
+							level + 1,
+							addr,
+							next_addr);
+			if (ret)
+				return ret;
+			/* Try again at this level */
+			next_addr = addr;
+			break;
+		default:
+			WARN_ON(1);
+			return -ENXIO;
+		}
+	}
+
+	return 0;
+}
+
+static int realm_tear_down_rtt_range(struct realm *realm,
+				     unsigned long start, unsigned long end)
+{
+	return realm_tear_down_rtt_level(realm, get_start_level(realm) + 1,
+					 start, end);
+}
+
+static void ensure_spare_page(struct realm *realm)
+{
+	phys_addr_t tmp_rtt;
+
+	/*
+	 * Make sure we have a spare delegated page for tearing down the
+	 * block mappings. We do this by allocating then freeing a page.
+	 * We must use atomic allocations as we are called with kvm->mmu_lock
+	 * held.
+	 */
+	tmp_rtt = __alloc_delegated_page(realm, NULL, GFP_ATOMIC);
+
+	/*
+	 * If the allocation failed, continue: there may be no block level
+	 * mapping to split, in which case this isn't fatal. Otherwise free
+	 * the page so that it becomes the spare page.
+	 */
+	if (tmp_rtt != PHYS_ADDR_MAX)
+		free_delegated_page(realm, tmp_rtt);
+}
+
+void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits)
+{
+	struct realm *realm = &kvm->arch.realm;
+
+	ensure_spare_page(realm);
+
+	WARN_ON(realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits)));
+}
+
 /* Protects access to rme_vmid_bitmap */
 static DEFINE_SPINLOCK(rme_vmid_lock);
 static unsigned long *rme_vmid_bitmap;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 14/43] arm64: RME: Allocate/free RECs to match vCPUs
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (12 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 13/43] arm64: RME: RTT handling Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-18  9:23     ` Suzuki K Poulose
  2024-04-12  8:42   ` [PATCH v2 15/43] arm64: RME: Support for the VGIC in realms Steven Price
                     ` (28 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

The RMM maintains a data structure known as the Realm Execution Context
(or REC). It is similar to struct kvm_vcpu and tracks the state of the
virtual CPUs. KVM must delegate memory and request that the structures
are created when vCPUs are created, and suitably tear them down on vCPU
destruction.

See Realm Management Monitor specification (DEN0137) for more information:
https://developer.arm.com/documentation/den0137/
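
From the VMM side the REC is created through the existing vCPU finalize
path. A hedged sketch (assumes vcpu_fd belongs to a realm VM and
KVM_ARM_VCPU_REC was set in the vCPU init features):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Finalize the vCPU as a REC once its MPIDR has been configured */
  static int finalize_rec(int vcpu_fd)
  {
  	int feature = KVM_ARM_VCPU_REC;

  	return ioctl(vcpu_fd, KVM_ARM_VCPU_FINALIZE, &feature);
  }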

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h |   2 +
 arch/arm64/include/asm/kvm_host.h    |   3 +
 arch/arm64/include/asm/kvm_rme.h     |  18 ++++
 arch/arm64/kvm/arm.c                 |   2 +
 arch/arm64/kvm/reset.c               |  11 ++
 arch/arm64/kvm/rme.c                 | 150 +++++++++++++++++++++++++++
 6 files changed, 186 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index c606316f4729..2209a7c6267f 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -631,6 +631,8 @@ static inline bool kvm_realm_is_created(struct kvm *kvm)
 
 static inline bool vcpu_is_rec(struct kvm_vcpu *vcpu)
 {
+	if (static_branch_unlikely(&kvm_rme_is_available))
+		return vcpu->arch.rec.mpidr != INVALID_HWID;
 	return false;
 }
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 63b68b85db3f..f7ac40ce0caf 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -694,6 +694,9 @@ struct kvm_vcpu_arch {
 
 	/* Per-vcpu CCSIDR override or NULL */
 	u32 *ccsidr;
+
+	/* Realm meta data */
+	struct realm_rec rec;
 };
 
 /*
diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
index 4ab5cb5e91b3..915e76068b00 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -6,6 +6,7 @@
 #ifndef __ASM_KVM_RME_H
 #define __ASM_KVM_RME_H
 
+#include <asm/rmi_smc.h>
 #include <uapi/linux/kvm.h>
 
 /**
@@ -70,6 +71,21 @@ struct realm {
 	unsigned int ia_bits;
 };
 
+/**
+ * struct realm_rec - Additional per VCPU data for a Realm
+ *
+ * @mpidr: MPIDR (Multiprocessor Affinity Register) value to identify this VCPU
+ * @rec_page: Kernel VA of the RMM's private page for this REC
+ * @aux_pages: Additional pages private to the RMM for this REC
+ * @run: Kernel VA of the RmiRecRun structure shared with the RMM
+ */
+struct realm_rec {
+	unsigned long mpidr;
+	void *rec_page;
+	struct page *aux_pages[REC_PARAMS_AUX_GRANULES];
+	struct rec_run *run;
+};
+
 int kvm_init_rme(void);
 u32 kvm_realm_ipa_limit(void);
 
@@ -77,6 +93,8 @@ int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
 int kvm_init_realm_vm(struct kvm *kvm);
 void kvm_destroy_realm(struct kvm *kvm);
 void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits);
+int kvm_create_rec(struct kvm_vcpu *vcpu);
+void kvm_destroy_rec(struct kvm_vcpu *vcpu);
 
 #define RME_RTT_BLOCK_LEVEL	2
 #define RME_RTT_MAX_LEVEL	3
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c5a6139d5454..d70c511e16a0 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -432,6 +432,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	/* Force users to call KVM_ARM_VCPU_INIT */
 	vcpu_clear_flag(vcpu, VCPU_INITIALIZED);
 
+	vcpu->arch.rec.mpidr = INVALID_HWID;
+
 	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
 
 	/*
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 68d1d05672bd..6e6eb4a15095 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -134,6 +134,11 @@ int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
 			return -EPERM;
 
 		return kvm_vcpu_finalize_sve(vcpu);
+	case KVM_ARM_VCPU_REC:
+		if (!kvm_is_realm(vcpu->kvm))
+			return -EINVAL;
+
+		return kvm_create_rec(vcpu);
 	}
 
 	return -EINVAL;
@@ -144,6 +149,11 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
 	if (vcpu_has_sve(vcpu) && !kvm_arm_vcpu_sve_finalized(vcpu))
 		return false;
 
+	if (kvm_is_realm(vcpu->kvm) &&
+	    !(vcpu_is_rec(vcpu) &&
+	      READ_ONCE(vcpu->kvm->arch.realm.state) == REALM_STATE_ACTIVE))
+		return false;
+
 	return true;
 }
 
@@ -157,6 +167,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
 		kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
 	kfree(sve_state);
 	kfree(vcpu->arch.ccsidr);
+	kvm_destroy_rec(vcpu);
 }
 
 static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 09b59bcad8b6..629a095bea61 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -474,6 +474,156 @@ void kvm_destroy_realm(struct kvm *kvm)
 	kvm_free_stage2_pgd(&kvm->arch.mmu);
 }
 
+static void free_rec_aux(struct page **aux_pages,
+			 unsigned int num_aux)
+{
+	unsigned int i;
+
+	for (i = 0; i < num_aux; i++) {
+		phys_addr_t aux_page_phys = page_to_phys(aux_pages[i]);
+
+		/* If the undelegate fails then leak the page */
+		if (WARN_ON(rmi_granule_undelegate(aux_page_phys)))
+			continue;
+
+		__free_page(aux_pages[i]);
+	}
+}
+
+static int alloc_rec_aux(struct page **aux_pages,
+			 u64 *aux_phys_pages,
+			 unsigned int num_aux)
+{
+	int ret;
+	unsigned int i;
+
+	for (i = 0; i < num_aux; i++) {
+		struct page *aux_page;
+		phys_addr_t aux_page_phys;
+
+		aux_page = alloc_page(GFP_KERNEL);
+		if (!aux_page) {
+			ret = -ENOMEM;
+			goto out_err;
+		}
+		aux_page_phys = page_to_phys(aux_page);
+		if (rmi_granule_delegate(aux_page_phys)) {
+			__free_page(aux_page);
+			ret = -ENXIO;
+			goto out_err;
+		}
+		aux_pages[i] = aux_page;
+		aux_phys_pages[i] = aux_page_phys;
+	}
+
+	return 0;
+out_err:
+	free_rec_aux(aux_pages, i);
+	return ret;
+}
+
+int kvm_create_rec(struct kvm_vcpu *vcpu)
+{
+	struct user_pt_regs *vcpu_regs = vcpu_gp_regs(vcpu);
+	unsigned long mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
+	struct realm *realm = &vcpu->kvm->arch.realm;
+	struct realm_rec *rec = &vcpu->arch.rec;
+	unsigned long rec_page_phys;
+	struct rec_params *params;
+	int r, i;
+
+	if (kvm_realm_state(vcpu->kvm) != REALM_STATE_NEW)
+		return -ENOENT;
+
+	/*
+	 * The RMM will report PSCI v1.0 to Realms and the KVM_ARM_VCPU_PSCI_0_2
+	 * flag covers v0.2 and onwards.
+	 */
+	if (!vcpu_has_feature(vcpu, KVM_ARM_VCPU_PSCI_0_2))
+		return -EINVAL;
+
+	BUILD_BUG_ON(sizeof(*params) > PAGE_SIZE);
+	BUILD_BUG_ON(sizeof(*rec->run) > PAGE_SIZE);
+
+	params = (struct rec_params *)get_zeroed_page(GFP_KERNEL);
+	rec->rec_page = (void *)__get_free_page(GFP_KERNEL);
+	rec->run = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!params || !rec->rec_page || !rec->run) {
+		r = -ENOMEM;
+		goto out_free_pages;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(params->gprs); i++)
+		params->gprs[i] = vcpu_regs->regs[i];
+
+	params->pc = vcpu_regs->pc;
+
+	if (vcpu->vcpu_id == 0)
+		params->flags |= REC_PARAMS_FLAG_RUNNABLE;
+
+	rec_page_phys = virt_to_phys(rec->rec_page);
+
+	if (rmi_granule_delegate(rec_page_phys)) {
+		r = -ENXIO;
+		goto out_free_pages;
+	}
+
+	r = alloc_rec_aux(rec->aux_pages, params->aux, realm->num_aux);
+	if (r)
+		goto out_undelegate_rmm_rec;
+
+	params->num_rec_aux = realm->num_aux;
+	params->mpidr = mpidr;
+
+	if (rmi_rec_create(virt_to_phys(realm->rd),
+			   rec_page_phys,
+			   virt_to_phys(params))) {
+		r = -ENXIO;
+		goto out_free_rec_aux;
+	}
+
+	rec->mpidr = mpidr;
+
+	free_page((unsigned long)params);
+	return 0;
+
+out_free_rec_aux:
+	free_rec_aux(rec->aux_pages, realm->num_aux);
+out_undelegate_rmm_rec:
+	if (WARN_ON(rmi_granule_undelegate(rec_page_phys)))
+		rec->rec_page = NULL;
+out_free_pages:
+	free_page((unsigned long)rec->run);
+	free_page((unsigned long)rec->rec_page);
+	free_page((unsigned long)params);
+	return r;
+}
+
+void kvm_destroy_rec(struct kvm_vcpu *vcpu)
+{
+	struct realm *realm = &vcpu->kvm->arch.realm;
+	struct realm_rec *rec = &vcpu->arch.rec;
+	unsigned long rec_page_phys;
+
+	if (!vcpu_is_rec(vcpu))
+		return;
+
+	rec_page_phys = virt_to_phys(rec->rec_page);
+
+	/* If the REC destroy fails, leak all pages relating to the REC */
+	if (WARN_ON(rmi_rec_destroy(rec_page_phys)))
+		return;
+
+	free_rec_aux(rec->aux_pages, realm->num_aux);
+
+	/* If the undelegate fails then leak the REC page */
+	if (WARN_ON(rmi_granule_undelegate(rec_page_phys)))
+		return;
+
+	free_page((unsigned long)rec->rec_page);
+	free_page((unsigned long)rec->run);
+}
+
 int kvm_init_realm_vm(struct kvm *kvm)
 {
 	struct realm_params *params;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 15/43] arm64: RME: Support for the VGIC in realms
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (13 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 14/43] arm64: RME: Allocate/free RECs to match vCPUs Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 16/43] KVM: arm64: Support timers in realm RECs Steven Price
                     ` (27 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

The RMM provides emulation of a VGIC to the realm guest but delegates
much of the handling to the host. Implement support in KVM for
saving/restoring state to/from the REC structure.
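
The list registers are shuttled through the RmiRecRun page shared with
the RMM rather than written to the hardware; in outline (condensed
from the vgic.c hunks below):

  /* on exit: the RMM reported the LR state in the shared page */
  cpu_if->vgic_lr[i] = rec->run->exit.gicv3_lrs[i];

  /* before entry: publish the LRs the host wants the RMM to inject */
  rec->run->entry.gicv3_lrs[i] = cpu_if->vgic_lr[i];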

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/arm.c          | 15 +++++++++++---
 arch/arm64/kvm/vgic/vgic-v3.c |  9 +++++++--
 arch/arm64/kvm/vgic/vgic.c    | 37 +++++++++++++++++++++++++++++++++--
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index d70c511e16a0..f8daf478114f 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -534,17 +534,22 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	kvm_timer_vcpu_put(vcpu);
+	kvm_vgic_put(vcpu);
+
+	vcpu->cpu = -1;
+
+	if (vcpu_is_rec(vcpu))
+		return;
+
 	kvm_arch_vcpu_put_debug_state_flags(vcpu);
 	kvm_arch_vcpu_put_fp(vcpu);
 	if (has_vhe())
 		kvm_vcpu_put_vhe(vcpu);
-	kvm_timer_vcpu_put(vcpu);
-	kvm_vgic_put(vcpu);
 	kvm_vcpu_pmu_restore_host(vcpu);
 	kvm_arm_vmid_clear_active();
 
 	vcpu_clear_on_unsupported_cpu(vcpu);
-	vcpu->cpu = -1;
 }
 
 static void __kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
@@ -758,6 +763,10 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 	}
 
 	if (!irqchip_in_kernel(kvm)) {
+		/* Userspace irqchip not yet supported with Realms */
+		if (kvm_is_realm(vcpu->kvm))
+			return -EOPNOTSUPP;
+
 		/*
 		 * Tell the rest of the code that there are userspace irqchip
 		 * VMs in the wild.
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 4ea3340786b9..75ae98b0f700 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -7,9 +7,11 @@
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 #include <kvm/arm_vgic.h>
+#include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_asm.h>
+#include <asm/rmi_smc.h>
 
 #include "vgic.h"
 
@@ -667,7 +669,8 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
 			(unsigned long long)info->vcpu.start);
 	} else if (kvm_get_mode() != KVM_MODE_PROTECTED) {
 		kvm_vgic_global_state.vcpu_base = info->vcpu.start;
-		kvm_vgic_global_state.can_emulate_gicv2 = true;
+		if (!static_branch_unlikely(&kvm_rme_is_available))
+			kvm_vgic_global_state.can_emulate_gicv2 = true;
 		ret = kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V2);
 		if (ret) {
 			kvm_err("Cannot register GICv2 KVM device.\n");
@@ -742,7 +745,9 @@ void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
-	if (likely(cpu_if->vgic_sre))
+	if (vcpu_is_rec(vcpu))
+		cpu_if->vgic_vmcr = vcpu->arch.rec.run->exit.gicv3_vmcr;
+	else if (likely(cpu_if->vgic_sre))
 		cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
 }
 
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index 4ec93587c8cd..02561d4e6447 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -10,7 +10,9 @@
 #include <linux/list_sort.h>
 #include <linux/nospec.h>
 
+#include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/rmi_smc.h>
 
 #include "vgic.h"
 
@@ -845,10 +847,23 @@ static inline bool can_access_vgic_from_kernel(void)
 	return !static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif) || has_vhe();
 }
 
+static inline void vgic_rmm_save_state(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+	int i;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		cpu_if->vgic_lr[i] = vcpu->arch.rec.run->exit.gicv3_lrs[i];
+		vcpu->arch.rec.run->entry.gicv3_lrs[i] = 0;
+	}
+}
+
 static inline void vgic_save_state(struct kvm_vcpu *vcpu)
 {
 	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		vgic_v2_save_state(vcpu);
+	else if (vcpu_is_rec(vcpu))
+		vgic_rmm_save_state(vcpu);
 	else
 		__vgic_v3_save_state(&vcpu->arch.vgic_cpu.vgic_v3);
 }
@@ -875,10 +890,28 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 	vgic_prune_ap_list(vcpu);
 }
 
+static inline void vgic_rmm_restore_state(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+	int i;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		vcpu->arch.rec.run->entry.gicv3_lrs[i] = cpu_if->vgic_lr[i];
+		/*
+		 * Also populate the rec.run->exit copies so that a late
+		 * decision to back out from entering the realm doesn't cause
+		 * the state to be lost
+		 */
+		vcpu->arch.rec.run->exit.gicv3_lrs[i] = cpu_if->vgic_lr[i];
+	}
+}
+
 static inline void vgic_restore_state(struct kvm_vcpu *vcpu)
 {
 	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		vgic_v2_restore_state(vcpu);
+	else if (vcpu_is_rec(vcpu))
+		vgic_rmm_restore_state(vcpu);
 	else
 		__vgic_v3_restore_state(&vcpu->arch.vgic_cpu.vgic_v3);
 }
@@ -919,7 +952,7 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 
 void kvm_vgic_load(struct kvm_vcpu *vcpu)
 {
-	if (unlikely(!vgic_initialized(vcpu->kvm)))
+	if (unlikely(!vgic_initialized(vcpu->kvm)) || vcpu_is_rec(vcpu))
 		return;
 
 	if (kvm_vgic_global_state.type == VGIC_V2)
@@ -930,7 +963,7 @@ void kvm_vgic_load(struct kvm_vcpu *vcpu)
 
 void kvm_vgic_put(struct kvm_vcpu *vcpu)
 {
-	if (unlikely(!vgic_initialized(vcpu->kvm)))
+	if (unlikely(!vgic_initialized(vcpu->kvm)) || vcpu_is_rec(vcpu))
 		return;
 
 	if (kvm_vgic_global_state.type == VGIC_V2)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 16/43] KVM: arm64: Support timers in realm RECs
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (14 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 15/43] arm64: RME: Support for the VGIC in realms Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-18  9:30     ` Suzuki K Poulose
  2024-04-12  8:42   ` [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS Steven Price
                     ` (26 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

The RMM keeps track of the timer while the realm REC is running, but on
exit to the normal world KVM is responsible for handling the timers.
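
Two details are worth noting: realms must run with a zero virtual
counter offset (so timer_set_offset() simply checks that the requested
offset is zero), and mapped timer IRQs cannot be used because the RMI
does not allow KVM to set LR.HW in the VGIC. The offset handling, in
outline (condensed from the hunks below):

  if (kvm_is_realm(vcpu->kvm))
          cntvoff = 0;                /* the RMM mandates a zero offset */
  else
          cntvoff = kvm_phys_timer_read();
  timer_set_offset(vcpu_vtimer(vcpu), cntvoff);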

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/arch_timer.c  | 45 ++++++++++++++++++++++++++++++++----
 include/kvm/arm_arch_timer.h |  2 ++
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 879982b1cc73..0b2be34a9ba3 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -162,6 +162,13 @@ static void timer_set_cval(struct arch_timer_context *ctxt, u64 cval)
 
 static void timer_set_offset(struct arch_timer_context *ctxt, u64 offset)
 {
+	struct kvm_vcpu *vcpu = ctxt->vcpu;
+
+	if (kvm_is_realm(vcpu->kvm)) {
+		WARN_ON(offset);
+		return;
+	}
+
 	if (!ctxt->offset.vm_offset) {
 		WARN(offset, "timer %ld\n", arch_timer_ctx_index(ctxt));
 		return;
@@ -460,6 +467,21 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 	}
 }
 
+void kvm_realm_timers_update(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *arch_timer = &vcpu->arch.timer_cpu;
+	int i;
+
+	for (i = 0; i < NR_KVM_EL0_TIMERS; i++) {
+		struct arch_timer_context *timer = &arch_timer->timers[i];
+		bool status = timer_get_ctl(timer) & ARCH_TIMER_CTRL_IT_STAT;
+		bool level = kvm_timer_irq_can_fire(timer) && status;
+
+		if (level != timer->irq.level)
+			kvm_timer_update_irq(vcpu, level, timer);
+	}
+}
+
 /* Only called for a fully emulated timer */
 static void timer_emulate(struct arch_timer_context *ctx)
 {
@@ -831,6 +853,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	if (unlikely(!timer->enabled))
 		return;
 
+	kvm_timer_unblocking(vcpu);
+
 	get_timer_map(vcpu, &map);
 
 	if (static_branch_likely(&has_gic_active_state)) {
@@ -844,8 +868,6 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 		kvm_timer_vcpu_load_nogic(vcpu);
 	}
 
-	kvm_timer_unblocking(vcpu);
-
 	timer_restore_state(map.direct_vtimer);
 	if (map.direct_ptimer)
 		timer_restore_state(map.direct_ptimer);
@@ -988,7 +1010,9 @@ static void timer_context_init(struct kvm_vcpu *vcpu, int timerid)
 
 	ctxt->vcpu = vcpu;
 
-	if (timerid == TIMER_VTIMER)
+	if (kvm_is_realm(vcpu->kvm))
+		ctxt->offset.vm_offset = NULL;
+	else if (timerid == TIMER_VTIMER)
 		ctxt->offset.vm_offset = &kvm->arch.timer_data.voffset;
 	else
 		ctxt->offset.vm_offset = &kvm->arch.timer_data.poffset;
@@ -1011,13 +1035,19 @@ static void timer_context_init(struct kvm_vcpu *vcpu, int timerid)
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
+	u64 cntvoff;
 
 	for (int i = 0; i < NR_KVM_TIMERS; i++)
 		timer_context_init(vcpu, i);
 
+	if (kvm_is_realm(vcpu->kvm))
+		cntvoff = 0;
+	else
+		cntvoff = kvm_phys_timer_read();
+
 	/* Synchronize offsets across timers of a VM if not already provided */
 	if (!test_bit(KVM_ARCH_FLAG_VM_COUNTER_OFFSET, &vcpu->kvm->arch.flags)) {
-		timer_set_offset(vcpu_vtimer(vcpu), kvm_phys_timer_read());
+		timer_set_offset(vcpu_vtimer(vcpu), cntvoff);
 		timer_set_offset(vcpu_ptimer(vcpu), 0);
 	}
 
@@ -1525,6 +1555,13 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 	}
 
+	/*
+	 * We don't use mapped IRQs for Realms because the RMI doesn't allow
+	 * us setting the LR.HW bit in the VGIC.
+	 */
+	if (vcpu_is_rec(vcpu))
+		return 0;
+
 	get_timer_map(vcpu, &map);
 
 	ret = kvm_vgic_map_phys_irq(vcpu,
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index c819c5d16613..d8ab297560d0 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -112,6 +112,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 
+void kvm_realm_timers_update(struct kvm_vcpu *vcpu);
+
 u64 kvm_phys_timer_read(void);
 
 void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (15 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 16/43] KVM: arm64: Support timers in realm RECs Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-19  9:34     ` Suzuki K Poulose
                       ` (2 more replies)
  2024-04-12  8:42   ` [PATCH v2 18/43] arm64: RME: Handle realm enter/exit Steven Price
                     ` (25 subsequent siblings)
  42 siblings, 3 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Each page within the protected region of the realm guest can be marked
as either RAM or EMPTY. Allow the VMM to control this before the guest
has started and provide the equivalent functions to change this (with
the guest's approval) at runtime.

When transitioning from RIPAS RAM (1) to RIPAS EMPTY (0) the memory is
unmapped from the guest and undelegated, allowing it to be reused by
the host. When transitioning to RIPAS RAM the actual population of the
leaf RTTs is done later, on stage 2 fault; however it may be necessary
to allocate additional RTTs to represent the requested range.

When freeing a block mapping it is necessary to temporarily unfold the
RTT, which requires delegating an extra page to the RMM; this page can
then be recovered once the contents of the block mapping have been
freed. A spare, delegated page (spare_page) is used for this purpose.
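
A rough sketch of the unfold-and-free sequence described above, using
the helpers from this patch (RMI error handling elided):

  /* split the block: the RMM needs a delegated page for the new RTT */
  rtt = alloc_delegated_page(realm, NULL);
  realm_rtt_create(realm, ipa, level + 1, rtt);

  /* destroy and undelegate the granules that backed the block... */
  rmi_data_destroy(rd, ipa, &addr, &next_addr);
  rmi_granule_undelegate(addr);             /* reusable by the host */

  /* ...then fold the now-empty RTT and recover the temporary page */
  realm_rtt_fold(realm, ipa, level + 1, &rtt);
  rmi_granule_undelegate(rtt);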

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_rme.h |  16 ++
 arch/arm64/kvm/mmu.c             |   8 +-
 arch/arm64/kvm/rme.c             | 390 +++++++++++++++++++++++++++++++
 3 files changed, 411 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
index 915e76068b00..cc8f81cfc3c0 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -96,6 +96,14 @@ void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits);
 int kvm_create_rec(struct kvm_vcpu *vcpu);
 void kvm_destroy_rec(struct kvm_vcpu *vcpu);
 
+void kvm_realm_unmap_range(struct kvm *kvm,
+			   unsigned long ipa,
+			   u64 size,
+			   bool unmap_private);
+int realm_set_ipa_state(struct kvm_vcpu *vcpu,
+			unsigned long addr, unsigned long end,
+			unsigned long ripas);
+
 #define RME_RTT_BLOCK_LEVEL	2
 #define RME_RTT_MAX_LEVEL	3
 
@@ -114,4 +122,12 @@ static inline unsigned long rme_rtt_level_mapsize(int level)
 	return (1UL << RME_RTT_LEVEL_SHIFT(level));
 }
 
+static inline bool realm_is_addr_protected(struct realm *realm,
+					   unsigned long addr)
+{
+	unsigned int ia_bits = realm->ia_bits;
+
+	return !(addr & ~(BIT(ia_bits - 1) - 1));
+}
+
 #endif
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 46f0c4e80ace..8a7b5449697f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -310,6 +310,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
  * @start: The intermediate physical base address of the range to unmap
  * @size:  The size of the area to unmap
  * @may_block: Whether or not we are permitted to block
+ * @only_shared: If true then protected mappings should not be unmapped
  *
  * Clear a range of stage-2 mappings, lowering the various ref-counts.  Must
  * be called while holding mmu_lock (unless for freeing the stage2 pgd before
@@ -317,7 +318,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
  * with things behind our backs.
  */
 static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size,
-				 bool may_block)
+				 bool may_block, bool only_shared)
 {
 	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
 	phys_addr_t end = start + size;
@@ -330,7 +331,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
 
 static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
 {
-	__unmap_stage2_range(mmu, start, size, true);
+	__unmap_stage2_range(mmu, start, size, true, false);
 }
 
 static void stage2_flush_memslot(struct kvm *kvm,
@@ -1771,7 +1772,8 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 
 	__unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
 			     (range->end - range->start) << PAGE_SHIFT,
-			     range->may_block);
+			     range->may_block,
+			     range->only_shared);
 
 	return false;
 }
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 629a095bea61..9e5983c51393 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -79,6 +79,12 @@ static phys_addr_t __alloc_delegated_page(struct realm *realm,
 	return phys;
 }
 
+static phys_addr_t alloc_delegated_page(struct realm *realm,
+					struct kvm_mmu_memory_cache *mc)
+{
+	return __alloc_delegated_page(realm, mc, GFP_KERNEL);
+}
+
 static void free_delegated_page(struct realm *realm, phys_addr_t phys)
 {
 	if (realm->spare_page == PHYS_ADDR_MAX) {
@@ -94,6 +100,151 @@ static void free_delegated_page(struct realm *realm, phys_addr_t phys)
 	free_page((unsigned long)phys_to_virt(phys));
 }
 
+static int realm_rtt_create(struct realm *realm,
+			    unsigned long addr,
+			    int level,
+			    phys_addr_t phys)
+{
+	addr = ALIGN_DOWN(addr, rme_rtt_level_mapsize(level - 1));
+	return rmi_rtt_create(virt_to_phys(realm->rd), phys, addr, level);
+}
+
+static int realm_rtt_fold(struct realm *realm,
+			  unsigned long addr,
+			  int level,
+			  phys_addr_t *rtt_granule)
+{
+	unsigned long out_rtt;
+	int ret;
+
+	ret = rmi_rtt_fold(virt_to_phys(realm->rd), addr, level, &out_rtt);
+
+	if (RMI_RETURN_STATUS(ret) == RMI_SUCCESS && rtt_granule)
+		*rtt_granule = out_rtt;
+
+	return ret;
+}
+
+static int realm_destroy_protected(struct realm *realm,
+				   unsigned long ipa,
+				   unsigned long *next_addr)
+{
+	unsigned long rd = virt_to_phys(realm->rd);
+	unsigned long addr;
+	phys_addr_t rtt;
+	int ret;
+
+loop:
+	ret = rmi_data_destroy(rd, ipa, &addr, next_addr);
+	if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
+		if (*next_addr > ipa)
+			return 0; /* UNASSIGNED */
+		rtt = alloc_delegated_page(realm, NULL);
+		if (WARN_ON(rtt == PHYS_ADDR_MAX))
+			return -1;
+		/* ASSIGNED - ipa is mapped as a block, so split */
+		ret = realm_rtt_create(realm, ipa,
+				       RMI_RETURN_INDEX(ret) + 1, rtt);
+		if (WARN_ON(ret)) {
+			free_delegated_page(realm, rtt);
+			return -1;
+		}
+		/* retry */
+		goto loop;
+	} else if (WARN_ON(ret)) {
+		return -1;
+	}
+	ret = rmi_granule_undelegate(addr);
+
+	/*
+	 * If the undelegate fails then something has gone seriously
+	 * wrong: take an extra reference to just leak the page
+	 */
+	if (WARN_ON(ret))
+		get_page(phys_to_page(addr));
+
+	return 0;
+}
+
+static void realm_unmap_range_shared(struct kvm *kvm,
+				     int level,
+				     unsigned long start,
+				     unsigned long end)
+{
+	struct realm *realm = &kvm->arch.realm;
+	unsigned long rd = virt_to_phys(realm->rd);
+	ssize_t map_size = rme_rtt_level_mapsize(level);
+	unsigned long next_addr, addr;
+	unsigned long shared_bit = BIT(realm->ia_bits - 1);
+
+	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
+		return;
+
+	start |= shared_bit;
+	end |= shared_bit;
+
+	for (addr = start; addr < end; addr = next_addr) {
+		unsigned long align_addr = ALIGN(addr, map_size);
+		int ret;
+
+		next_addr = ALIGN(addr + 1, map_size);
+
+		if (align_addr != addr || next_addr > end) {
+			/* Need to recurse deeper */
+			if (addr < align_addr)
+				next_addr = align_addr;
+			realm_unmap_range_shared(kvm, level + 1, addr,
+						 min(next_addr, end));
+			continue;
+		}
+
+		ret = rmi_rtt_unmap_unprotected(rd, addr, level, &next_addr);
+		switch (RMI_RETURN_STATUS(ret)) {
+		case RMI_SUCCESS:
+			break;
+		case RMI_ERROR_RTT:
+			if (next_addr == addr) {
+				next_addr = ALIGN(addr + 1, map_size);
+				realm_unmap_range_shared(kvm, level + 1, addr,
+							 next_addr);
+			}
+			break;
+		default:
+			WARN_ON(1);
+		}
+	}
+}
+
+static void realm_unmap_range_private(struct kvm *kvm,
+				      unsigned long start,
+				      unsigned long end)
+{
+	struct realm *realm = &kvm->arch.realm;
+	ssize_t map_size = RME_PAGE_SIZE;
+	unsigned long next_addr, addr;
+
+	for (addr = start; addr < end; addr = next_addr) {
+		int ret;
+
+		next_addr = ALIGN(addr + 1, map_size);
+
+		ret = realm_destroy_protected(realm, addr, &next_addr);
+
+		if (WARN_ON(ret))
+			break;
+	}
+}
+
+static void realm_unmap_range(struct kvm *kvm,
+			      unsigned long start,
+			      unsigned long end,
+			      bool unmap_private)
+{
+	realm_unmap_range_shared(kvm, RME_RTT_MAX_LEVEL - 1, start, end);
+	if (unmap_private)
+		realm_unmap_range_private(kvm, start, end);
+}
+
 u32 kvm_realm_ipa_limit(void)
 {
 	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
@@ -190,6 +341,30 @@ static int realm_rtt_destroy(struct realm *realm, unsigned long addr,
 	return ret;
 }
 
+static int realm_create_rtt_levels(struct realm *realm,
+				   unsigned long ipa,
+				   int level,
+				   int max_level,
+				   struct kvm_mmu_memory_cache *mc)
+{
+	if (WARN_ON(level == max_level))
+		return 0;
+
+	while (level++ < max_level) {
+		phys_addr_t rtt = alloc_delegated_page(realm, mc);
+
+		if (rtt == PHYS_ADDR_MAX)
+			return -ENOMEM;
+
+		if (realm_rtt_create(realm, ipa, level, rtt)) {
+			free_delegated_page(realm, rtt);
+			return -ENXIO;
+		}
+	}
+
+	return 0;
+}
+
 static int realm_tear_down_rtt_level(struct realm *realm, int level,
 				     unsigned long start, unsigned long end)
 {
@@ -265,6 +440,68 @@ static int realm_tear_down_rtt_range(struct realm *realm,
 					 start, end);
 }
 
+/*
+ * Returns 0 on successful fold, a negative value on error, a positive value if
+ * we were not able to fold all tables at this level.
+ */
+static int realm_fold_rtt_level(struct realm *realm, int level,
+				unsigned long start, unsigned long end)
+{
+	int not_folded = 0;
+	ssize_t map_size;
+	unsigned long addr, next_addr;
+
+	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
+		return -EINVAL;
+
+	map_size = rme_rtt_level_mapsize(level - 1);
+
+	for (addr = start; addr < end; addr = next_addr) {
+		phys_addr_t rtt_granule;
+		int ret;
+		unsigned long align_addr = ALIGN(addr, map_size);
+
+		next_addr = ALIGN(addr + 1, map_size);
+
+		ret = realm_rtt_fold(realm, align_addr, level, &rtt_granule);
+
+		switch (RMI_RETURN_STATUS(ret)) {
+		case RMI_SUCCESS:
+			if (!WARN_ON(rmi_granule_undelegate(rtt_granule)))
+				free_page((unsigned long)phys_to_virt(rtt_granule));
+			break;
+		case RMI_ERROR_RTT:
+			if (level == RME_RTT_MAX_LEVEL ||
+			    RMI_RETURN_INDEX(ret) < level) {
+				not_folded++;
+				break;
+			}
+			/* Recurse a level deeper */
+			ret = realm_fold_rtt_level(realm,
+						   level + 1,
+						   addr,
+						   next_addr);
+			if (ret < 0)
+				return ret;
+			else if (ret == 0)
+				/* Try again at this level */
+				next_addr = addr;
+			break;
+		default:
+			return -ENXIO;
+		}
+	}
+
+	return not_folded;
+}
+
+static int realm_fold_rtt_range(struct realm *realm,
+				unsigned long start, unsigned long end)
+{
+	return realm_fold_rtt_level(realm, get_start_level(realm) + 1,
+				    start, end);
+}
+
 static void ensure_spare_page(struct realm *realm)
 {
 	phys_addr_t tmp_rtt;
@@ -295,6 +532,147 @@ void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits)
 	WARN_ON(realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits)));
 }
 
+void kvm_realm_unmap_range(struct kvm *kvm, unsigned long ipa, u64 size,
+			   bool unmap_private)
+{
+	unsigned long end = ipa + size;
+	struct realm *realm = &kvm->arch.realm;
+
+	end = min(BIT(realm->ia_bits - 1), end);
+
+	ensure_spare_page(realm);
+
+	realm_unmap_range(kvm, ipa, end, unmap_private);
+
+	realm_fold_rtt_range(realm, ipa, end);
+}
+
+static int find_map_level(struct realm *realm,
+			  unsigned long start,
+			  unsigned long end)
+{
+	int level = RME_RTT_MAX_LEVEL;
+
+	while (level > get_start_level(realm)) {
+		unsigned long map_size = rme_rtt_level_mapsize(level - 1);
+
+		if (!IS_ALIGNED(start, map_size) ||
+		    (start + map_size) > end)
+			break;
+
+		level--;
+	}
+
+	return level;
+}
+
+int realm_set_ipa_state(struct kvm_vcpu *vcpu,
+			unsigned long start,
+			unsigned long end,
+			unsigned long ripas)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct realm *realm = &kvm->arch.realm;
+	struct realm_rec *rec = &vcpu->arch.rec;
+	phys_addr_t rd_phys = virt_to_phys(realm->rd);
+	phys_addr_t rec_phys = virt_to_phys(rec->rec_page);
+	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+	unsigned long ipa = start;
+	int ret = 0;
+
+	while (ipa < end) {
+		unsigned long next;
+
+		ret = rmi_rtt_set_ripas(rd_phys, rec_phys, ipa, end, &next);
+
+		if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
+			int walk_level = RMI_RETURN_INDEX(ret);
+			int level = find_map_level(realm, ipa, end);
+
+			if (walk_level < level) {
+				ret = realm_create_rtt_levels(realm, ipa,
+							      walk_level,
+							      level,
+							      memcache);
+				if (!ret)
+					continue;
+			} else {
+				ret = -EINVAL;
+			}
+
+			break;
+		} else if (RMI_RETURN_STATUS(ret) != RMI_SUCCESS) {
+			WARN(1, "Unexpected error in %s: %#x\n", __func__,
+			     ret);
+			ret = -EINVAL;
+			break;
+		}
+		ipa = next;
+	}
+
+	if (ripas == RMI_EMPTY && ipa != start)
+		kvm_realm_unmap_range(kvm, start, ipa - start, true);
+
+	return ret;
+}
+
+static int realm_init_ipa_state(struct realm *realm,
+				unsigned long ipa,
+				unsigned long end)
+{
+	phys_addr_t rd_phys = virt_to_phys(realm->rd);
+	int ret;
+
+	while (ipa < end) {
+		unsigned long next;
+
+		ret = rmi_rtt_init_ripas(rd_phys, ipa, end, &next);
+
+		if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
+			int err_level = RMI_RETURN_INDEX(ret);
+			int level = find_map_level(realm, ipa, end);
+
+			if (WARN_ON(err_level >= level))
+				return -ENXIO;
+
+			ret = realm_create_rtt_levels(realm, ipa,
+						      err_level,
+						      level, NULL);
+			if (ret)
+				return ret;
+			/* Retry with the RTT levels in place */
+			continue;
+		} else if (WARN_ON(ret)) {
+			return -ENXIO;
+		}
+
+		ipa = next;
+	}
+
+	return 0;
+}
+
+static int kvm_init_ipa_range_realm(struct kvm *kvm,
+				    struct kvm_cap_arm_rme_init_ipa_args *args)
+{
+	int ret = 0;
+	gpa_t addr, end;
+	struct realm *realm = &kvm->arch.realm;
+
+	addr = args->init_ipa_base;
+	end = addr + args->init_ipa_size;
+
+	if (end < addr)
+		return -EINVAL;
+
+	if (kvm_realm_state(kvm) != REALM_STATE_NEW)
+		return -EINVAL;
+
+	ret = realm_init_ipa_state(realm, addr, end);
+
+	return ret;
+}
+
 /* Protects access to rme_vmid_bitmap */
 static DEFINE_SPINLOCK(rme_vmid_lock);
 static unsigned long *rme_vmid_bitmap;
@@ -418,6 +796,18 @@ int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 	case KVM_CAP_ARM_RME_CREATE_RD:
 		r = kvm_create_realm(kvm);
 		break;
+	case KVM_CAP_ARM_RME_INIT_IPA_REALM: {
+		struct kvm_cap_arm_rme_init_ipa_args args;
+		void __user *argp = u64_to_user_ptr(cap->args[1]);
+
+		if (copy_from_user(&args, argp, sizeof(args))) {
+			r = -EFAULT;
+			break;
+		}
+
+		r = kvm_init_ipa_range_realm(kvm, &args);
+		break;
+	}
 	default:
 		r = -EINVAL;
 		break;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 18/43] arm64: RME: Handle realm enter/exit
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (16 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-19 13:00     ` Suzuki K Poulose
  2024-04-12  8:42   ` [PATCH v2 19/43] KVM: arm64: Handle realm MMIO emulation Steven Price
                     ` (24 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Entering a realm is done using an SMC call to the RMM. On exit the
exit codes need to be handled slightly differently from the normal KVM
path, so define our own functions for realm enter/exit and hook them
in if the guest is a realm guest.
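
The shape of the change in kvm_arch_vcpu_ioctl_run() is (condensed
from the hunks below):

  if (vcpu_is_rec(vcpu))
          ret = kvm_rec_enter(vcpu);        /* SMC RMI_REC_ENTER */
  else
          ret = kvm_arm_vcpu_enter_exit(vcpu);
  ...
  if (vcpu_is_rec(vcpu))
          ret = handle_rme_exit(vcpu, ret); /* decode rec->run->exit */
  else
          ret = handle_exit(vcpu, ret);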

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_rme.h |   3 +
 arch/arm64/kvm/Makefile          |   2 +-
 arch/arm64/kvm/arm.c             |  19 +++-
 arch/arm64/kvm/rme-exit.c        | 180 +++++++++++++++++++++++++++++++
 arch/arm64/kvm/rme.c             |  11 ++
 5 files changed, 209 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm64/kvm/rme-exit.c

diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
index cc8f81cfc3c0..749f2eb97bd4 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -96,6 +96,9 @@ void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits);
 int kvm_create_rec(struct kvm_vcpu *vcpu);
 void kvm_destroy_rec(struct kvm_vcpu *vcpu);
 
+int kvm_rec_enter(struct kvm_vcpu *vcpu);
+int handle_rme_exit(struct kvm_vcpu *vcpu, int rec_run_status);
+
 void kvm_realm_unmap_range(struct kvm *kvm,
 			   unsigned long ipa,
 			   u64 size,
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 1c1d8cdf381f..2eec980cbe5c 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -21,7 +21,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 vgic/vgic-mmio.o vgic/vgic-mmio-v2.o \
 	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
 	 vgic/vgic-its.o vgic/vgic-debug.o \
-	 rme.o
+	 rme.o rme-exit.o
 
 kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o pmu.o
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f8daf478114f..e92afb3835f6 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1130,7 +1130,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		trace_kvm_entry(*vcpu_pc(vcpu));
 		guest_timing_enter_irqoff();
 
-		ret = kvm_arm_vcpu_enter_exit(vcpu);
+		if (vcpu_is_rec(vcpu))
+			ret = kvm_rec_enter(vcpu);
+		else
+			ret = kvm_arm_vcpu_enter_exit(vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		vcpu->stat.exits++;
@@ -1184,10 +1187,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
 		local_irq_enable();
 
-		trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
-
 		/* Exit types that need handling before we can be preempted */
-		handle_exit_early(vcpu, ret);
+		if (!vcpu_is_rec(vcpu)) {
+			trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu),
+				       *vcpu_pc(vcpu));
+
+			handle_exit_early(vcpu, ret);
+		}
 
 		preempt_enable();
 
@@ -1210,7 +1216,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 			ret = ARM_EXCEPTION_IL;
 		}
 
-		ret = handle_exit(vcpu, ret);
+		if (vcpu_is_rec(vcpu))
+			ret = handle_rme_exit(vcpu, ret);
+		else
+			ret = handle_exit(vcpu, ret);
 	}
 
 	/* Tell userspace about in-kernel device output levels */
diff --git a/arch/arm64/kvm/rme-exit.c b/arch/arm64/kvm/rme-exit.c
new file mode 100644
index 000000000000..5bf58c9b42b7
--- /dev/null
+++ b/arch/arm64/kvm/rme-exit.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#include <linux/kvm_host.h>
+#include <kvm/arm_hypercalls.h>
+#include <kvm/arm_psci.h>
+
+#include <asm/rmi_smc.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_rme.h>
+#include <asm/kvm_mmu.h>
+
+typedef int (*exit_handler_fn)(struct kvm_vcpu *vcpu);
+
+static int rec_exit_reason_notimpl(struct kvm_vcpu *vcpu)
+{
+	struct realm_rec *rec = &vcpu->arch.rec;
+
+	pr_err("[vcpu %d] Unhandled exit reason from realm (ESR: %#llx)\n",
+	       vcpu->vcpu_id, rec->run->exit.esr);
+	return -ENXIO;
+}
+
+static int rec_exit_sync_dabt(struct kvm_vcpu *vcpu)
+{
+	return kvm_handle_guest_abort(vcpu);
+}
+
+static int rec_exit_sync_iabt(struct kvm_vcpu *vcpu)
+{
+	struct realm_rec *rec = &vcpu->arch.rec;
+
+	pr_err("[vcpu %d] Unhandled instruction abort (ESR: %#llx).\n",
+	       vcpu->vcpu_id, rec->run->exit.esr);
+	return -ENXIO;
+}
+
+static int rec_exit_sys_reg(struct kvm_vcpu *vcpu)
+{
+	struct realm_rec *rec = &vcpu->arch.rec;
+	unsigned long esr = kvm_vcpu_get_esr(vcpu);
+	int rt = kvm_vcpu_sys_get_rt(vcpu);
+	bool is_write = !(esr & 1);
+	int ret;
+
+	if (is_write)
+		vcpu_set_reg(vcpu, rt, rec->run->exit.gprs[0]);
+
+	ret = kvm_handle_sys_reg(vcpu);
+
+	if (ret >= 0 && !is_write)
+		rec->run->entry.gprs[0] = vcpu_get_reg(vcpu, rt);
+
+	return ret;
+}
+
+static exit_handler_fn rec_exit_handlers[] = {
+	[0 ... ESR_ELx_EC_MAX]	= rec_exit_reason_notimpl,
+	[ESR_ELx_EC_SYS64]	= rec_exit_sys_reg,
+	[ESR_ELx_EC_DABT_LOW]	= rec_exit_sync_dabt,
+	[ESR_ELx_EC_IABT_LOW]	= rec_exit_sync_iabt
+};
+
+static int rec_exit_psci(struct kvm_vcpu *vcpu)
+{
+	struct realm_rec *rec = &vcpu->arch.rec;
+	int i;
+	int ret;
+
+	for (i = 0; i < REC_RUN_GPRS; i++)
+		vcpu_set_reg(vcpu, i, rec->run->exit.gprs[i]);
+
+	ret = kvm_smccc_call_handler(vcpu);
+
+	for (i = 0; i < REC_RUN_GPRS; i++)
+		rec->run->entry.gprs[i] = vcpu_get_reg(vcpu, i);
+
+	return ret;
+}
+
+static int rec_exit_ripas_change(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct realm *realm = &kvm->arch.realm;
+	struct realm_rec *rec = &vcpu->arch.rec;
+	unsigned long base = rec->run->exit.ripas_base;
+	unsigned long top = rec->run->exit.ripas_top;
+	unsigned long ripas = rec->run->exit.ripas_value & 1;
+	int ret = -EINVAL;
+
+	if (realm_is_addr_protected(realm, base) &&
+	    realm_is_addr_protected(realm, top - 1)) {
+		kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
+					   kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
+		write_lock(&kvm->mmu_lock);
+		ret = realm_set_ipa_state(vcpu, base, top, ripas);
+		write_unlock(&kvm->mmu_lock);
+	}
+
+	WARN(ret && ret != -ENOMEM,
+	     "Unable to satisfy SET_IPAS for %#lx - %#lx, ripas: %#lx\n",
+	     base, top, ripas);
+
+	/* Exit to VMM to complete the change */
+	kvm_prepare_memory_fault_exit(vcpu, base, top - base, false, false,
+				      ripas == 1);
+
+	return 0;
+}
+
+static void update_arch_timer_irq_lines(struct kvm_vcpu *vcpu)
+{
+	struct realm_rec *rec = &vcpu->arch.rec;
+
+	__vcpu_sys_reg(vcpu, CNTV_CTL_EL0) = rec->run->exit.cntv_ctl;
+	__vcpu_sys_reg(vcpu, CNTV_CVAL_EL0) = rec->run->exit.cntv_cval;
+	__vcpu_sys_reg(vcpu, CNTP_CTL_EL0) = rec->run->exit.cntp_ctl;
+	__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = rec->run->exit.cntp_cval;
+
+	kvm_realm_timers_update(vcpu);
+}
+
+/*
+ * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
+ * proper exit to userspace.
+ */
+int handle_rme_exit(struct kvm_vcpu *vcpu, int rec_run_ret)
+{
+	struct realm_rec *rec = &vcpu->arch.rec;
+	u8 esr_ec = ESR_ELx_EC(rec->run->exit.esr);
+	unsigned long status, index;
+
+	status = RMI_RETURN_STATUS(rec_run_ret);
+	index = RMI_RETURN_INDEX(rec_run_ret);
+
+	/*
+	 * If a PSCI_SYSTEM_OFF request raced with a vcpu executing, we might
+	 * see the following status code and index indicating an attempt to run
+	 * a REC when the RD state is SYSTEM_OFF.  In this case, we just need to
+	 * return to user space which can deal with the system event or will try
+	 * to run the KVM VCPU again, at which point we will no longer attempt
+	 * to enter the Realm because we will have a sleep request pending on
+	 * the VCPU as a result of KVM's PSCI handling.
+	 */
+	if (status == RMI_ERROR_REALM && index == 1) {
+		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+		return 0;
+	}
+
+	if (rec_run_ret)
+		return -ENXIO;
+
+	vcpu->arch.fault.esr_el2 = rec->run->exit.esr;
+	vcpu->arch.fault.far_el2 = rec->run->exit.far;
+	vcpu->arch.fault.hpfar_el2 = rec->run->exit.hpfar;
+
+	update_arch_timer_irq_lines(vcpu);
+
+	/* Reset the emulation flags for the next run of the REC */
+	rec->run->entry.flags = 0;
+
+	switch (rec->run->exit.exit_reason) {
+	case RMI_EXIT_SYNC:
+		return rec_exit_handlers[esr_ec](vcpu);
+	case RMI_EXIT_IRQ:
+	case RMI_EXIT_FIQ:
+		return 1;
+	case RMI_EXIT_PSCI:
+		return rec_exit_psci(vcpu);
+	case RMI_EXIT_RIPAS_CHANGE:
+		return rec_exit_ripas_change(vcpu);
+	}
+
+	kvm_pr_unimpl("Unsupported exit reason: %u\n",
+		      rec->run->exit.exit_reason);
+	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+	return 0;
+}
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 9e5983c51393..0a3f823b2446 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -864,6 +864,17 @@ void kvm_destroy_realm(struct kvm *kvm)
 	kvm_free_stage2_pgd(&kvm->arch.mmu);
 }
 
+int kvm_rec_enter(struct kvm_vcpu *vcpu)
+{
+	struct realm_rec *rec = &vcpu->arch.rec;
+
+	if (kvm_realm_state(vcpu->kvm) != REALM_STATE_ACTIVE)
+		return -EINVAL;
+
+	return rmi_rec_enter(virt_to_phys(rec->rec_page),
+			     virt_to_phys(rec->run));
+}
+
 static void free_rec_aux(struct page **aux_pages,
 			 unsigned int num_aux)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 19/43] KVM: arm64: Handle realm MMIO emulation
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (17 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 18/43] arm64: RME: Handle realm enter/exit Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 20/43] arm64: RME: Allow populating initial contents Steven Price
                     ` (23 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

MMIO emulation for a realm cannot be done directly with the VM's
registers as they are protected from the host. However, for emulatable
data aborts, the RMM uses GPRS[0] to provide the read/written value.
We can transfer this from/to the equivalent VCPU's register entry and
then depend on the generic MMIO handling code in KVM.

For an MMIO read, the value is placed in the shared RecEntry structure
during kvm_handle_mmio_return() rather than in the VCPU's register
entry.
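
Condensed from the hunks below, the two directions look like:

  /* exit (write): the RMM supplied the value in the RecExit GPRS */
  if (kvm_vcpu_dabt_iswrite(vcpu) && kvm_vcpu_dabt_isvalid(vcpu))
          vcpu_set_reg(vcpu, kvm_vcpu_dabt_get_rd(vcpu),
                       rec->run->exit.gprs[0]);

  /* completion (read): the value goes back via the RecEntry GPRS */
  if (vcpu_is_rec(vcpu))
          vcpu->arch.rec.run->entry.gprs[0] = data;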

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/mmio.c     | 10 +++++++++-
 arch/arm64/kvm/rme-exit.c |  6 ++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
index 200c8019a82a..589a1a7601ad 100644
--- a/arch/arm64/kvm/mmio.c
+++ b/arch/arm64/kvm/mmio.c
@@ -6,6 +6,7 @@
 
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <asm/rmi_smc.h>
 #include <trace/events/kvm.h>
 
 #include "trace.h"
@@ -90,6 +91,9 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
 
 	vcpu->mmio_needed = 0;
 
+	if (vcpu_is_rec(vcpu))
+		vcpu->arch.rec.run->entry.flags |= RMI_EMULATED_MMIO;
+
 	if (!kvm_vcpu_dabt_iswrite(vcpu)) {
 		struct kvm_run *run = vcpu->run;
 
@@ -108,7 +112,11 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
 		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
 			       &data);
 		data = vcpu_data_host_to_guest(vcpu, data, len);
-		vcpu_set_reg(vcpu, kvm_vcpu_dabt_get_rd(vcpu), data);
+
+		if (vcpu_is_rec(vcpu))
+			vcpu->arch.rec.run->entry.gprs[0] = data;
+		else
+			vcpu_set_reg(vcpu, kvm_vcpu_dabt_get_rd(vcpu), data);
 	}
 
 	/*
diff --git a/arch/arm64/kvm/rme-exit.c b/arch/arm64/kvm/rme-exit.c
index 5bf58c9b42b7..2ddaec09cc98 100644
--- a/arch/arm64/kvm/rme-exit.c
+++ b/arch/arm64/kvm/rme-exit.c
@@ -25,6 +25,12 @@ static int rec_exit_reason_notimpl(struct kvm_vcpu *vcpu)
 
 static int rec_exit_sync_dabt(struct kvm_vcpu *vcpu)
 {
+	struct realm_rec *rec = &vcpu->arch.rec;
+
+	if (kvm_vcpu_dabt_iswrite(vcpu) && kvm_vcpu_dabt_isvalid(vcpu))
+		vcpu_set_reg(vcpu, kvm_vcpu_dabt_get_rd(vcpu),
+			     rec->run->exit.gprs[0]);
+
 	return kvm_handle_guest_abort(vcpu);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 20/43] arm64: RME: Allow populating initial contents
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (18 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 19/43] KVM: arm64: Handle realm MMIO emulation Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-19 13:17     ` Suzuki K Poulose
  2024-04-12  8:42   ` [PATCH v2 21/43] arm64: RME: Runtime faulting of memory Steven Price
                     ` (22 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

The VMM needs to populate the realm with some data before starting (e.g.
a kernel and initrd). This is measured by the RMM and used as part of
the attestation later on.
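
From the VMM side this is driven through KVM_ENABLE_CAP. A hedged
userspace sketch, assuming the KVM_CAP_ARM_RME plumbing from earlier
in the series (the load address, size and vm_fd are placeholders):

  struct kvm_cap_arm_rme_populate_realm_args args = {
          .populate_ipa_base = kernel_ipa,   /* must be page aligned */
          .populate_ipa_size = kernel_size,  /* must be page aligned */
          .flags = KVM_ARM_RME_POPULATE_FLAGS_MEASURE,
  };
  struct kvm_enable_cap cap = {
          .cap = KVM_CAP_ARM_RME,
          .args = { KVM_CAP_ARM_RME_POPULATE_REALM,
                    (__u64)(unsigned long)&args },
  };
  ioctl(vm_fd, KVM_ENABLE_CAP, &cap);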

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/rme.c | 234 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 234 insertions(+)

diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 0a3f823b2446..4aab507f896e 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/kvm_host.h>
+#include <linux/hugetlb.h>
 
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
@@ -547,6 +548,227 @@ void kvm_realm_unmap_range(struct kvm *kvm, unsigned long ipa, u64 size,
 	realm_fold_rtt_range(realm, ipa, end);
 }
 
+static int realm_create_protected_data_page(struct realm *realm,
+					    unsigned long ipa,
+					    struct page *dst_page,
+					    struct page *src_page,
+					    unsigned long flags)
+{
+	phys_addr_t dst_phys, src_phys;
+	int ret;
+
+	dst_phys = page_to_phys(dst_page);
+	src_phys = page_to_phys(src_page);
+
+	if (rmi_granule_delegate(dst_phys))
+		return -ENXIO;
+
+	ret = rmi_data_create(virt_to_phys(realm->rd), dst_phys, ipa, src_phys,
+			      flags);
+
+	if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
+		/* Create missing RTTs and retry */
+		int level = RMI_RETURN_INDEX(ret);
+
+		ret = realm_create_rtt_levels(realm, ipa, level,
+					      RME_RTT_MAX_LEVEL, NULL);
+		if (ret)
+			goto err;
+
+		ret = rmi_data_create(virt_to_phys(realm->rd), dst_phys, ipa,
+				      src_phys, flags);
+	}
+
+	if (ret)
+		goto err;
+
+	return 0;
+
+err:
+	if (WARN_ON(rmi_granule_undelegate(dst_phys))) {
+		/* Page can't be returned to NS world so is lost */
+		get_page(dst_page);
+	}
+	return -ENXIO;
+}
+
+static int fold_rtt(struct realm *realm, unsigned long addr, int level)
+{
+	phys_addr_t rtt_addr;
+	int ret;
+
+	ret = realm_rtt_fold(realm, addr, level + 1, &rtt_addr);
+	if (ret)
+		return ret;
+
+	free_delegated_page(realm, rtt_addr);
+
+	return 0;
+}
+
+static int populate_par_region(struct kvm *kvm,
+			       phys_addr_t ipa_base,
+			       phys_addr_t ipa_end,
+			       u32 flags)
+{
+	struct realm *realm = &kvm->arch.realm;
+	struct kvm_memory_slot *memslot;
+	gfn_t base_gfn, end_gfn;
+	int idx;
+	phys_addr_t ipa;
+	int ret = 0;
+	struct page *tmp_page;
+	unsigned long data_flags = 0;
+
+	base_gfn = gpa_to_gfn(ipa_base);
+	end_gfn = gpa_to_gfn(ipa_end);
+
+	if (flags & KVM_ARM_RME_POPULATE_FLAGS_MEASURE)
+		data_flags = RMI_MEASURE_CONTENT;
+
+	idx = srcu_read_lock(&kvm->srcu);
+	memslot = gfn_to_memslot(kvm, base_gfn);
+	if (!memslot) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	/* We require the region to be contained within a single memslot */
+	if (memslot->base_gfn + memslot->npages < end_gfn) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	tmp_page = alloc_page(GFP_KERNEL);
+	if (!tmp_page) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	mmap_read_lock(current->mm);
+
+	ipa = ipa_base;
+	while (ipa < ipa_end) {
+		struct vm_area_struct *vma;
+		unsigned long map_size;
+		unsigned int vma_shift;
+		unsigned long offset;
+		unsigned long hva;
+		struct page *page;
+		kvm_pfn_t pfn;
+		int level;
+
+		hva = gfn_to_hva_memslot(memslot, gpa_to_gfn(ipa));
+		vma = vma_lookup(current->mm, hva);
+		if (!vma) {
+			ret = -EFAULT;
+			break;
+		}
+
+		if (is_vm_hugetlb_page(vma))
+			vma_shift = huge_page_shift(hstate_vma(vma));
+		else
+			vma_shift = PAGE_SHIFT;
+
+		map_size = 1 << vma_shift;
+
+		/*
+		 * FIXME: This causes over-mapping, but there's no good
+		 * solution here with the ABI as it stands
+		 */
+		ipa = ALIGN_DOWN(ipa, map_size);
+
+		switch (map_size) {
+		case RME_L2_BLOCK_SIZE:
+			level = 2;
+			break;
+		case PAGE_SIZE:
+			level = 3;
+			break;
+		default:
+			WARN_ONCE(1, "Unsupported vma_shift %d", vma_shift);
+			ret = -EFAULT;
+			break;
+		}
+
+		pfn = gfn_to_pfn_memslot(memslot, gpa_to_gfn(ipa));
+
+		if (is_error_pfn(pfn)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		if (level < RME_RTT_MAX_LEVEL) {
+			/*
+			 * A temporary RTT is needed during the map, precreate
+			 * it, however if there is an error (e.g. missing
+			 * parent tables) this will be handled in the
+			 * realm_create_protected_data_page() call.
+			 */
+			realm_create_rtt_levels(realm, ipa, level,
+						RME_RTT_MAX_LEVEL, NULL);
+		}
+
+		page = pfn_to_page(pfn);
+
+		for (offset = 0; offset < map_size && !ret;
+		     offset += PAGE_SIZE, page++) {
+			phys_addr_t page_ipa = ipa + offset;
+
+			ret = realm_create_protected_data_page(realm, page_ipa,
+							       page, tmp_page,
+							       data_flags);
+		}
+		if (ret)
+			goto err_release_pfn;
+
+		if (level == 2) {
+			ret = fold_rtt(realm, ipa, level);
+			if (ret)
+				goto err_release_pfn;
+		}
+
+		ipa += map_size;
+		kvm_release_pfn_dirty(pfn);
+err_release_pfn:
+		if (ret) {
+			kvm_release_pfn_clean(pfn);
+			break;
+		}
+	}
+
+	mmap_read_unlock(current->mm);
+	__free_page(tmp_page);
+
+out:
+	srcu_read_unlock(&kvm->srcu, idx);
+	return ret;
+}
+
+static int kvm_populate_realm(struct kvm *kvm,
+			      struct kvm_cap_arm_rme_populate_realm_args *args)
+{
+	phys_addr_t ipa_base, ipa_end;
+
+	if (kvm_realm_state(kvm) != REALM_STATE_NEW)
+		return -EINVAL;
+
+	if (!IS_ALIGNED(args->populate_ipa_base, PAGE_SIZE) ||
+	    !IS_ALIGNED(args->populate_ipa_size, PAGE_SIZE))
+		return -EINVAL;
+
+	if (args->flags & ~RMI_MEASURE_CONTENT)
+		return -EINVAL;
+
+	ipa_base = args->populate_ipa_base;
+	ipa_end = ipa_base + args->populate_ipa_size;
+
+	if (ipa_end < ipa_base)
+		return -EINVAL;
+
+	return populate_par_region(kvm, ipa_base, ipa_end, args->flags);
+}
+
 static int find_map_level(struct realm *realm,
 			  unsigned long start,
 			  unsigned long end)
@@ -808,6 +1030,18 @@ int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 		r = kvm_init_ipa_range_realm(kvm, &args);
 		break;
 	}
+	case KVM_CAP_ARM_RME_POPULATE_REALM: {
+		struct kvm_cap_arm_rme_populate_realm_args args;
+		void __user *argp = u64_to_user_ptr(cap->args[1]);
+
+		if (copy_from_user(&args, argp, sizeof(args))) {
+			r = -EFAULT;
+			break;
+		}
+
+		r = kvm_populate_realm(kvm, &args);
+		break;
+	}
 	default:
 		r = -EINVAL;
 		break;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 21/43] arm64: RME: Runtime faulting of memory
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (19 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 20/43] arm64: RME: Allow populating initial contents Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-25 10:43     ` Fuad Tabba
  2024-04-12  8:42   ` [PATCH v2 22/43] KVM: arm64: Handle realm VCPU load Steven Price
                     ` (21 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

At runtime, if the realm guest accesses memory which hasn't yet been
mapped, KVM needs to either populate the region or fault the guest.

For memory in the lower (protected) region of IPA a fresh page is
provided to the RMM, which will zero the contents. For memory in the
upper (shared) region of IPA, the memory from the memslot is mapped
into the realm VM as non-secure.
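
The top bit of the IPA space selects the path; realm_map_ipa() in the
mmu.c hunk below routes a fault roughly as follows (simplified):

  if (realm_is_addr_protected(realm, ipa))
          /* lower half: delegate a fresh page which the RMM zeroes */
          ret = realm_map_protected(realm, ipa, page, map_size, memcache);
  else
          /* upper half: map the memslot page into the realm non-secure */
          ret = realm_map_non_secure(realm, ipa, page, map_size, memcache);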

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h |  10 ++
 arch/arm64/include/asm/kvm_rme.h     |  10 ++
 arch/arm64/kvm/mmu.c                 | 119 +++++++++++++++-
 arch/arm64/kvm/rme.c                 | 199 ++++++++++++++++++++++++---
 4 files changed, 316 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 2209a7c6267f..d40d998d9be2 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -629,6 +629,16 @@ static inline bool kvm_realm_is_created(struct kvm *kvm)
 	return kvm_is_realm(kvm) && kvm_realm_state(kvm) != REALM_STATE_NONE;
 }
 
+static inline gpa_t kvm_gpa_stolen_bits(struct kvm *kvm)
+{
+	if (kvm_is_realm(kvm)) {
+		struct realm *realm = &kvm->arch.realm;
+
+		return BIT(realm->ia_bits - 1);
+	}
+	return 0;
+}
+
 static inline bool vcpu_is_rec(struct kvm_vcpu *vcpu)
 {
 	if (static_branch_unlikely(&kvm_rme_is_available))
diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
index 749f2eb97bd4..48c7766fadeb 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -103,6 +103,16 @@ void kvm_realm_unmap_range(struct kvm *kvm,
 			   unsigned long ipa,
 			   u64 size,
 			   bool unmap_private);
+int realm_map_protected(struct realm *realm,
+			unsigned long base_ipa,
+			struct page *dst_page,
+			unsigned long map_size,
+			struct kvm_mmu_memory_cache *memcache);
+int realm_map_non_secure(struct realm *realm,
+			 unsigned long ipa,
+			 struct page *page,
+			 unsigned long map_size,
+			 struct kvm_mmu_memory_cache *memcache);
 int realm_set_ipa_state(struct kvm_vcpu *vcpu,
 			unsigned long addr, unsigned long end,
 			unsigned long ripas);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8a7b5449697f..50a49e4e2020 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -325,8 +325,13 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
 	WARN_ON(size & ~PAGE_MASK);
-	WARN_ON(stage2_apply_range(mmu, start, end, kvm_pgtable_stage2_unmap,
-				   may_block));
+
+	if (kvm_is_realm(kvm))
+		kvm_realm_unmap_range(kvm, start, size, !only_shared);
+	else
+		WARN_ON(stage2_apply_range(mmu, start, end,
+					   kvm_pgtable_stage2_unmap,
+					   may_block));
 }
 
 static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
@@ -340,7 +345,11 @@ static void stage2_flush_memslot(struct kvm *kvm,
 	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
 	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
 
-	stage2_apply_range_resched(&kvm->arch.mmu, addr, end, kvm_pgtable_stage2_flush);
+	if (kvm_is_realm(kvm))
+		kvm_realm_unmap_range(kvm, addr, end - addr, false);
+	else
+		stage2_apply_range_resched(&kvm->arch.mmu, addr, end,
+					   kvm_pgtable_stage2_flush);
 }
 
 /**
@@ -997,6 +1006,10 @@ void stage2_unmap_vm(struct kvm *kvm)
 	struct kvm_memory_slot *memslot;
 	int idx, bkt;
 
+	/* For realms this is handled by the RMM so nothing to do here */
+	if (kvm_is_realm(kvm))
+		return;
+
 	idx = srcu_read_lock(&kvm->srcu);
 	mmap_read_lock(current->mm);
 	write_lock(&kvm->mmu_lock);
@@ -1020,6 +1033,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	if (kvm_is_realm(kvm) &&
 	    (kvm_realm_state(kvm) != REALM_STATE_DEAD &&
 	     kvm_realm_state(kvm) != REALM_STATE_NONE)) {
+		unmap_stage2_range(mmu, 0, (~0ULL) & PAGE_MASK);
 		write_unlock(&kvm->mmu_lock);
 		kvm_realm_destroy_rtts(kvm, pgt->ia_bits);
 		return;
@@ -1383,6 +1397,69 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+static int realm_map_ipa(struct kvm *kvm, phys_addr_t ipa,
+			 kvm_pfn_t pfn, unsigned long map_size,
+			 enum kvm_pgtable_prot prot,
+			 struct kvm_mmu_memory_cache *memcache)
+{
+	struct realm *realm = &kvm->arch.realm;
+	struct page *page = pfn_to_page(pfn);
+
+	if (WARN_ON(!(prot & KVM_PGTABLE_PROT_W)))
+		return -EFAULT;
+
+	if (!realm_is_addr_protected(realm, ipa))
+		return realm_map_non_secure(realm, ipa, page, map_size,
+					    memcache);
+
+	return realm_map_protected(realm, ipa, page, map_size, memcache);
+}
+
+static int private_memslot_fault(struct kvm_vcpu *vcpu,
+				 phys_addr_t fault_ipa,
+				 struct kvm_memory_slot *memslot)
+{
+	struct kvm *kvm = vcpu->kvm;
+	gpa_t gpa_stolen_mask = kvm_gpa_stolen_bits(kvm);
+	gfn_t gfn = (fault_ipa & ~gpa_stolen_mask) >> PAGE_SHIFT;
+	bool is_priv_gfn = !((fault_ipa & gpa_stolen_mask) == gpa_stolen_mask);
+	bool priv_exists = kvm_mem_is_private(kvm, gfn);
+	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+	int order;
+	kvm_pfn_t pfn;
+	int ret;
+
+	if (priv_exists != is_priv_gfn) {
+		kvm_prepare_memory_fault_exit(vcpu,
+					      fault_ipa & ~gpa_stolen_mask,
+					      PAGE_SIZE,
+					      kvm_is_write_fault(vcpu),
+					      false, is_priv_gfn);
+
+		return 0;
+	}
+
+	if (!is_priv_gfn) {
+		/* Not a private mapping, handle normally */
+		return -EAGAIN;
+	}
+
+	if (kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &order))
+		return 1; /* Retry */
+
+	ret = kvm_mmu_topup_memory_cache(memcache,
+					 kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
+	if (ret)
+		return ret;
+
+	/* FIXME: Should be able to use bigger than PAGE_SIZE mappings */
+	ret = realm_map_ipa(kvm, fault_ipa, pfn, PAGE_SIZE, KVM_PGTABLE_PROT_W,
+			     memcache);
+	if (!ret)
+		return 1; /* Handled */
+	return ret;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  bool fault_is_perm)
@@ -1402,10 +1479,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
+	gpa_t gpa_stolen_mask = kvm_gpa_stolen_bits(vcpu->kvm);
 
 	if (fault_is_perm)
 		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
 	write_fault = kvm_is_write_fault(vcpu);
+
+	/*
+	 * Realms cannot map protected pages read-only
+	 * FIXME: It should be possible to map unprotected pages read-only
+	 */
+	if (vcpu_is_rec(vcpu))
+		write_fault = true;
+
 	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
 	VM_BUG_ON(write_fault && exec_fault);
 
@@ -1478,7 +1564,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
 		fault_ipa &= ~(vma_pagesize - 1);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	gfn = (fault_ipa & ~gpa_stolen_mask) >> PAGE_SHIFT;
 	mte_allowed = kvm_vma_mte_allowed(vma);
 
 	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
@@ -1538,7 +1624,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * If we are not forced to use page mapping, check if we are
 	 * backed by a THP and thus use block mapping if possible.
 	 */
-	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
+	/* FIXME: We shouldn't need to disable this for realms */
+	if (vma_pagesize == PAGE_SIZE && !(force_pte || device || kvm_is_realm(kvm))) {
 		if (fault_is_perm && fault_granule > PAGE_SIZE)
 			vma_pagesize = fault_granule;
 		else
@@ -1584,6 +1671,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	if (fault_is_perm && vma_pagesize == fault_granule)
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
+	else if (kvm_is_realm(kvm))
+		ret = realm_map_ipa(kvm, fault_ipa, pfn, vma_pagesize,
+				    prot, memcache);
 	else
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
 					     __pfn_to_phys(pfn), prot,
@@ -1638,6 +1728,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	struct kvm_memory_slot *memslot;
 	unsigned long hva;
 	bool is_iabt, write_fault, writable;
+	gpa_t gpa_stolen_mask = kvm_gpa_stolen_bits(vcpu->kvm);
 	gfn_t gfn;
 	int ret, idx;
 
@@ -1693,8 +1784,15 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	gfn = (fault_ipa & ~gpa_stolen_mask) >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+
+	if (kvm_slot_can_be_private(memslot)) {
+		ret = private_memslot_fault(vcpu, fault_ipa, memslot);
+		if (ret != -EAGAIN)
+			goto out;
+	}
+
 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
 	write_fault = kvm_is_write_fault(vcpu);
 	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
@@ -1738,6 +1836,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		 * of the page size.
 		 */
 		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
+		fault_ipa &= ~gpa_stolen_mask;
 		ret = io_mem_abort(vcpu, fault_ipa);
 		goto out_unlock;
 	}
@@ -1819,6 +1918,10 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	if (!kvm->arch.mmu.pgt)
 		return false;
 
+	/* We don't support aging for Realms */
+	if (kvm_is_realm(kvm))
+		return true;
+
 	return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
 						   range->start << PAGE_SHIFT,
 						   size, true);
@@ -1831,6 +1934,10 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	if (!kvm->arch.mmu.pgt)
 		return false;
 
+	/* We don't support aging for Realms */
+	if (kvm_is_realm(kvm))
+		return true;
+
 	return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
 						   range->start << PAGE_SHIFT,
 						   size, false);
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 4aab507f896e..72f6f5f542c4 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -606,6 +606,170 @@ static int fold_rtt(struct realm *realm, unsigned long addr, int level)
 	return 0;
 }
 
+int realm_map_protected(struct realm *realm,
+			unsigned long base_ipa,
+			struct page *dst_page,
+			unsigned long map_size,
+			struct kvm_mmu_memory_cache *memcache)
+{
+	phys_addr_t dst_phys = page_to_phys(dst_page);
+	phys_addr_t rd = virt_to_phys(realm->rd);
+	unsigned long phys = dst_phys;
+	unsigned long ipa = base_ipa;
+	unsigned long size;
+	int map_level;
+	int ret = 0;
+
+	if (WARN_ON(!IS_ALIGNED(ipa, map_size)))
+		return -EINVAL;
+
+	switch (map_size) {
+	case PAGE_SIZE:
+		map_level = 3;
+		break;
+	case RME_L2_BLOCK_SIZE:
+		map_level = 2;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (map_level < RME_RTT_MAX_LEVEL) {
+		/*
+		 * A temporary RTT is needed during the map, so pre-create
+		 * it; if there is an error (e.g. missing parent tables) it
+		 * will be handled below.
+		 */
+		realm_create_rtt_levels(realm, ipa, map_level,
+					RME_RTT_MAX_LEVEL, memcache);
+	}
+
+	for (size = 0; size < map_size; size += PAGE_SIZE) {
+		if (rmi_granule_delegate(phys)) {
+			struct rtt_entry rtt;
+
+			/*
+			 * It's possible we raced with another VCPU on the same
+			 * fault. If the entry exists and matches then exit
+			 * early and assume the other VCPU will handle the
+			 * mapping.
+			 */
+			if (rmi_rtt_read_entry(rd, ipa, RME_RTT_MAX_LEVEL, &rtt))
+				goto err;
+
+			// FIXME: For a block mapping this could race at level
+			// 2 or 3...
+			if (WARN_ON((rtt.walk_level != RME_RTT_MAX_LEVEL ||
+				     rtt.state != RMI_ASSIGNED ||
+				     rtt.desc != phys))) {
+				goto err;
+			}
+
+			return 0;
+		}
+
+		ret = rmi_data_create_unknown(rd, phys, ipa);
+
+		if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
+			/* Create missing RTTs and retry */
+			int level = RMI_RETURN_INDEX(ret);
+
+			ret = realm_create_rtt_levels(realm, ipa, level,
+						      RME_RTT_MAX_LEVEL,
+						      memcache);
+			WARN_ON(ret);
+			if (ret)
+				goto err_undelegate;
+
+			ret = rmi_data_create_unknown(rd, phys, ipa);
+		}
+		WARN_ON(ret);
+
+		if (ret)
+			goto err_undelegate;
+
+		phys += PAGE_SIZE;
+		ipa += PAGE_SIZE;
+	}
+
+	if (map_size == RME_L2_BLOCK_SIZE)
+		ret = fold_rtt(realm, base_ipa, map_level);
+	if (WARN_ON(ret))
+		goto err;
+
+	return 0;
+
+err_undelegate:
+	if (WARN_ON(rmi_granule_undelegate(phys))) {
+		/* Page can't be returned to NS world so is lost */
+		get_page(phys_to_page(phys));
+	}
+err:
+	while (size > 0) {
+		unsigned long data, top;
+
+		phys -= PAGE_SIZE;
+		size -= PAGE_SIZE;
+		ipa -= PAGE_SIZE;
+
+		WARN_ON(rmi_data_destroy(rd, ipa, &data, &top));
+
+		if (WARN_ON(rmi_granule_undelegate(phys))) {
+			/* Page can't be returned to NS world so is lost */
+			get_page(phys_to_page(phys));
+		}
+	}
+	return -ENXIO;
+}
+
+int realm_map_non_secure(struct realm *realm,
+			 unsigned long ipa,
+			 struct page *page,
+			 unsigned long map_size,
+			 struct kvm_mmu_memory_cache *memcache)
+{
+	phys_addr_t rd = virt_to_phys(realm->rd);
+	int map_level;
+	int ret = 0;
+	unsigned long desc = page_to_phys(page) |
+			     PTE_S2_MEMATTR(MT_S2_FWB_NORMAL) |
+			     /* FIXME: Read+Write permissions for now */
+			     (3 << 6) |
+			     PTE_SHARED;
+
+	if (WARN_ON(!IS_ALIGNED(ipa, map_size)))
+		return -EINVAL;
+
+	switch (map_size) {
+	case PAGE_SIZE:
+		map_level = 3;
+		break;
+	case RME_L2_BLOCK_SIZE:
+		map_level = 2;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ret = rmi_rtt_map_unprotected(rd, ipa, map_level, desc);
+
+	if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
+		/* Create missing RTTs and retry */
+		int level = RMI_RETURN_INDEX(ret);
+
+		ret = realm_create_rtt_levels(realm, ipa, level, map_level,
+					      memcache);
+		if (WARN_ON(ret))
+			return -ENXIO;
+
+		ret = rmi_rtt_map_unprotected(rd, ipa, map_level, desc);
+	}
+	if (WARN_ON(ret))
+		return -ENXIO;
+
+	return 0;
+}
+
 static int populate_par_region(struct kvm *kvm,
 			       phys_addr_t ipa_base,
 			       phys_addr_t ipa_end,
@@ -617,7 +781,6 @@ static int populate_par_region(struct kvm *kvm,
 	int idx;
 	phys_addr_t ipa;
 	int ret = 0;
-	struct page *tmp_page;
 	unsigned long data_flags = 0;
 
 	base_gfn = gpa_to_gfn(ipa_base);
@@ -639,9 +802,8 @@ static int populate_par_region(struct kvm *kvm,
 		goto out;
 	}
 
-	tmp_page = alloc_page(GFP_KERNEL);
-	if (!tmp_page) {
-		ret = -ENOMEM;
+	if (!kvm_slot_can_be_private(memslot)) {
+		ret = -EINVAL;
 		goto out;
 	}
 
@@ -714,31 +876,36 @@ static int populate_par_region(struct kvm *kvm,
 		for (offset = 0; offset < map_size && !ret;
 		     offset += PAGE_SIZE, page++) {
 			phys_addr_t page_ipa = ipa + offset;
+			kvm_pfn_t priv_pfn;
+			int order;
 
-			ret = realm_create_protected_data_page(realm, page_ipa,
-							       page, tmp_page,
-							       data_flags);
+			ret = kvm_gmem_get_pfn(kvm, memslot,
+					       page_ipa >> PAGE_SHIFT,
+					       &priv_pfn, &order);
+			if (ret)
+				break;
+
+			ret = realm_create_protected_data_page(
+					realm, page_ipa,
+					pfn_to_page(priv_pfn),
+					page, data_flags);
 		}
+
+		kvm_release_pfn_clean(pfn);
+
 		if (ret)
-			goto err_release_pfn;
+			break;
 
 		if (level == 2) {
 			ret = fold_rtt(realm, ipa, level);
 			if (ret)
-				goto err_release_pfn;
+				break;
 		}
 
 		ipa += map_size;
-		kvm_release_pfn_dirty(pfn);
-err_release_pfn:
-		if (ret) {
-			kvm_release_pfn_clean(pfn);
-			break;
-		}
 	}
 
 	mmap_read_unlock(current->mm);
-	__free_page(tmp_page);
 
 out:
 	srcu_read_unlock(&kvm->srcu, idx);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 22/43] KVM: arm64: Handle realm VCPU load
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (20 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 21/43] arm64: RME: Runtime faulting of memory Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 23/43] KVM: arm64: Validate register access for a Realm VM Steven Price
                     ` (20 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

When loading a realm VCPU, much of the work is handled by the RMM, so
only some of the actions are required. Rearrange kvm_arch_vcpu_load()
slightly so we can bail out early for a realm guest.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/arm.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e92afb3835f6..4fd58a1d3351 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -512,10 +512,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	kvm_vgic_load(vcpu);
 	kvm_timer_vcpu_load(vcpu);
-	if (has_vhe())
-		kvm_vcpu_load_vhe(vcpu);
-	kvm_arch_vcpu_load_fp(vcpu);
-	kvm_vcpu_pmu_restore_guest(vcpu);
 	if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
 		kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
 
@@ -524,6 +520,15 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	else
 		vcpu_set_wfx_traps(vcpu);
 
+	/* No additional state needs to be loaded on Realm VMs */
+	if (vcpu_is_rec(vcpu))
+		return;
+
+	if (has_vhe())
+		kvm_vcpu_load_vhe(vcpu);
+	kvm_arch_vcpu_load_fp(vcpu);
+	kvm_vcpu_pmu_restore_guest(vcpu);
+
 	if (vcpu_has_ptrauth(vcpu))
 		vcpu_ptrauth_disable(vcpu);
 	kvm_arch_vcpu_load_debug_state_flags(vcpu);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 23/43] KVM: arm64: Validate register access for a Realm VM
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (21 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 22/43] KVM: arm64: Handle realm VCPU load Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 24/43] KVM: arm64: Handle Realm PSCI requests Steven Price
                     ` (19 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

The RMM only allows setting the lower GPRs (x0-x7) and PC for a realm
guest. Check this in kvm_arm_set_reg() so that the VMM can receive a
suitable error return if other registers are accessed.
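
For illustration, a VMM would set one of the permitted registers along
these lines (a hedged sketch against the standard KVM uAPI on an arm64
host; set_realm_x0() and vcpu_fd are made up for the example):

  #include <stddef.h>
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int set_realm_x0(int vcpu_fd, uint64_t value)
  {
          struct kvm_one_reg reg = {
                  /* Core register id for regs.regs[0], i.e. x0 */
                  .id = KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE |
                        (offsetof(struct kvm_regs, regs.regs[0]) /
                         sizeof(uint32_t)),
                  .addr = (uintptr_t)&value,
          };

          /* For a realm, anything outside x0-x7 and PC now fails */
          return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
  }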

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/guest.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index e2f762d959bb..a035689dc39b 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -782,12 +782,38 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	return kvm_arm_sys_reg_get_reg(vcpu, reg);
 }
 
+/*
+ * The RMI ABI only enables setting the lower GPRs (x0-x7) and PC.
+ * All other registers are reset to architectural or otherwise defined reset
+ * values by the RMM, except for a few configuration fields that correspond to
+ * Realm parameters.
+ */
+static bool validate_realm_set_reg(struct kvm_vcpu *vcpu,
+				   const struct kvm_one_reg *reg)
+{
+	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE) {
+		u64 off = core_reg_offset_from_id(reg->id);
+
+		switch (off) {
+		case KVM_REG_ARM_CORE_REG(regs.regs[0]) ...
+		     KVM_REG_ARM_CORE_REG(regs.regs[7]):
+		case KVM_REG_ARM_CORE_REG(regs.pc):
+			return true;
+		}
+	}
+
+	return false;
+}
+
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 {
 	/* We currently use nothing arch-specific in upper 32 bits */
 	if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM64 >> 32)
 		return -EINVAL;
 
+	if (kvm_is_realm(vcpu->kvm) && !validate_realm_set_reg(vcpu, reg))
+		return -EINVAL;
+
 	switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
 	case KVM_REG_ARM_CORE:	return set_core_reg(vcpu, reg);
 	case KVM_REG_ARM_FW:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 24/43] KVM: arm64: Handle Realm PSCI requests
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (22 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 23/43] KVM: arm64: Validate register access for a Realm VM Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 25/43] KVM: arm64: WARN on injected undef exceptions Steven Price
                     ` (18 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

The RMM needs to be informed of the target REC when a PSCI call is made
with an MPIDR argument. Expose an ioctl to userspace so that the
request can be completed when the PSCI call is handled there.
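
A hedged sketch of the userspace side (the ioctl and struct come from
this series' uAPI additions, so the field names may differ from what
eventually lands; the wrapper function is made up):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int rmm_psci_complete(int calling_vcpu_fd, uint64_t target_mpidr,
                               uint32_t psci_status)
  {
          struct kvm_arm_rmm_psci_complete req = {
                  .target_mpidr = target_mpidr,
                  /* PSCI_RET_SUCCESS or PSCI_RET_DENIED for RMM v1.0 */
                  .psci_status = psci_status,
          };

          /* Issued on the calling VCPU's fd, naming the target by MPIDR */
          return ioctl(calling_vcpu_fd, KVM_ARM_VCPU_RMM_PSCI_COMPLETE, &req);
  }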

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_rme.h |  3 +++
 arch/arm64/kvm/arm.c             | 25 +++++++++++++++++++++++++
 arch/arm64/kvm/psci.c            | 29 +++++++++++++++++++++++++++++
 arch/arm64/kvm/rme.c             | 15 +++++++++++++++
 4 files changed, 72 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
index 48c7766fadeb..fef04706ff9d 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -116,6 +116,9 @@ int realm_map_non_secure(struct realm *realm,
 int realm_set_ipa_state(struct kvm_vcpu *vcpu,
 			unsigned long addr, unsigned long end,
 			unsigned long ripas);
+int realm_psci_complete(struct kvm_vcpu *calling,
+			struct kvm_vcpu *target,
+			unsigned long status);
 
 #define RME_RTT_BLOCK_LEVEL	2
 #define RME_RTT_MAX_LEVEL	3
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 4fd58a1d3351..a0dcae0391d0 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1605,6 +1605,22 @@ static int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 	return __kvm_arm_vcpu_set_events(vcpu, events);
 }
 
+static int kvm_arm_vcpu_rmm_psci_complete(struct kvm_vcpu *vcpu,
+					  struct kvm_arm_rmm_psci_complete *arg)
+{
+	struct kvm_vcpu *target = kvm_mpidr_to_vcpu(vcpu->kvm, arg->target_mpidr);
+
+	if (!target)
+		return -EINVAL;
+
+	/*
+	 * RMM v1.0 only supports PSCI_RET_SUCCESS or PSCI_RET_DENIED
+	 * for the status. But let us leave the filtering to the RMM,
+	 * to keep this future proof.
+	 */
+	return realm_psci_complete(vcpu, target, arg->psci_status);
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
@@ -1727,6 +1743,15 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
 		return kvm_arm_vcpu_finalize(vcpu, what);
 	}
+	case KVM_ARM_VCPU_RMM_PSCI_COMPLETE: {
+		struct kvm_arm_rmm_psci_complete req;
+
+		if (!kvm_is_realm(vcpu->kvm))
+			return -EINVAL;
+		if (copy_from_user(&req, argp, sizeof(req)))
+			return -EFAULT;
+		return kvm_arm_vcpu_rmm_psci_complete(vcpu, &req);
+	}
 	default:
 		r = -EINVAL;
 	}
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 1f69b667332b..f9abab5d50d7 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -103,6 +103,12 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 
 	reset_state->reset = true;
 	kvm_make_request(KVM_REQ_VCPU_RESET, vcpu);
+	/*
+	 * Make sure we issue PSCI_COMPLETE before the VCPU can be
+	 * scheduled.
+	 */
+	if (vcpu_is_rec(vcpu))
+		realm_psci_complete(source_vcpu, vcpu, PSCI_RET_SUCCESS);
 
 	/*
 	 * Make sure the reset request is observed if the RUNNABLE mp_state is
@@ -115,6 +121,10 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 
 out_unlock:
 	spin_unlock(&vcpu->arch.mp_state_lock);
+	if (vcpu_is_rec(vcpu) && ret != PSCI_RET_SUCCESS)
+		realm_psci_complete(source_vcpu, vcpu,
+				    ret == PSCI_RET_ALREADY_ON ?
+				    PSCI_RET_SUCCESS : PSCI_RET_DENIED);
 	return ret;
 }
 
@@ -142,6 +152,25 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
 	/* Ignore other bits of target affinity */
 	target_affinity &= target_affinity_mask;
 
+	if (vcpu_is_rec(vcpu)) {
+		struct kvm_vcpu *target_vcpu;
+
+		/* RMM supports only zero affinity level */
+		if (lowest_affinity_level != 0)
+			return PSCI_RET_INVALID_PARAMS;
+
+		target_vcpu = kvm_mpidr_to_vcpu(kvm, target_affinity);
+		if (!target_vcpu)
+			return PSCI_RET_INVALID_PARAMS;
+
+		/*
+		 * Provide the references of running and target RECs to the RMM
+		 * so that the RMM can complete the PSCI request.
+		 */
+		realm_psci_complete(vcpu, target_vcpu, PSCI_RET_SUCCESS);
+		return PSCI_RET_SUCCESS;
+	}
+
 	/*
 	 * If one or more VCPU matching target affinity are running
 	 * then ON else OFF
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 72f6f5f542c4..9544653accb6 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -101,6 +101,21 @@ static void free_delegated_page(struct realm *realm, phys_addr_t phys)
 	free_page((unsigned long)phys_to_virt(phys));
 }
 
+int realm_psci_complete(struct kvm_vcpu *calling, struct kvm_vcpu *target,
+			unsigned long status)
+{
+	int ret;
+
+	ret = rmi_psci_complete(virt_to_phys(calling->arch.rec.rec_page),
+				virt_to_phys(target->arch.rec.rec_page),
+				status);
+
+	if (ret)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int realm_rtt_create(struct realm *realm,
 			    unsigned long addr,
 			    int level,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 25/43] KVM: arm64: WARN on injected undef exceptions
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (23 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 24/43] KVM: arm64: Handle Realm PSCI requests Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 26/43] arm64: Don't expose stolen time for realm guests Steven Price
                     ` (17 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

The RMM doesn't allow injection of an undefined exception into a realm
guest. Add a WARN to catch if this ever happens.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/inject_fault.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index a640e839848e..44ce1c9bdc2e 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -224,6 +224,8 @@ void kvm_inject_size_fault(struct kvm_vcpu *vcpu)
  */
 void kvm_inject_undefined(struct kvm_vcpu *vcpu)
 {
+	if (vcpu_is_rec(vcpu))
+		WARN(1, "Cannot inject undefined exception into REC. Continuing with unknown behaviour");
 	if (vcpu_el1_is_32bit(vcpu))
 		inject_undef32(vcpu);
 	else
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 26/43] arm64: Don't expose stolen time for realm guests
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (24 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 25/43] KVM: arm64: WARN on injected undef exceptions Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 27/43] arm64: rme: allow userspace to inject aborts Steven Price
                     ` (16 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Stolen time doesn't make much sense for realm guests, and with the ABI
as it is it's a footgun for the VMM: it makes fatal granule protection
faults easy to trigger.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/arm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a0dcae0391d0..f2279ab45add 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -339,7 +339,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = system_supports_mte();
 		break;
 	case KVM_CAP_STEAL_TIME:
-		r = kvm_arm_pvtime_supported();
+		if (kvm && kvm_is_realm(kvm))
+			r = 0;
+		else
+			r = kvm_arm_pvtime_supported();
 		break;
 	case KVM_CAP_ARM_EL1_32BIT:
 		r = cpus_have_final_cap(ARM64_HAS_32BIT_EL1);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 27/43] arm64: rme: allow userspace to inject aborts
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (25 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 26/43] arm64: Don't expose stolen time for realm guests Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 28/43] arm64: rme: support RSI_HOST_CALL Steven Price
                     ` (15 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Joey Gouly, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Steven Price

From: Joey Gouly <joey.gouly@arm.com>

Extend KVM_SET_VCPU_EVENTS to support realms, where KVM cannot set the
system registers directly and the RMM must perform the injection on the
next REC entry.
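
As a usage sketch (standard KVM uAPI; only the realm semantics are new
here, and inject_realm_sea() is a made-up helper), a VMM that cannot
emulate an MMIO access would inject the abort like so:

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int inject_realm_sea(int vcpu_fd)
  {
          struct kvm_vcpu_events events;

          memset(&events, 0, sizeof(events));
          /*
           * Only valid if the previous exit was a data abort on an
           * unprotected IPA; the RMM injects the SEA on next REC entry.
           */
          events.exception.ext_dabt_pending = 1;

          return ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, &events);
  }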

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 Documentation/virt/kvm/api.rst |  2 ++
 arch/arm64/kvm/guest.c         | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index b4bd3d0928a2..0b2386ee4f15 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1278,6 +1278,8 @@ User space may need to inject several types of events to the guest.
 Set the pending SError exception state for this VCPU. It is not possible to
 'cancel' an Serror that has been made pending.
 
+User space cannot inject SErrors into Realms.
+
 If the guest performed an access to I/O memory which could not be handled by
 userspace, for example because of missing instruction syndrome decode
 information or because there is no device mapped at the accessed IPA, then
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index a035689dc39b..5223a828a344 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -865,6 +865,30 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 	bool has_esr = events->exception.serror_has_esr;
 	bool ext_dabt_pending = events->exception.ext_dabt_pending;
 
+	if (vcpu_is_rec(vcpu)) {
+		/* Cannot inject SError into a Realm. */
+		if (serror_pending)
+			return -EINVAL;
+
+		/*
+		 * If a data abort is pending, set the flag and let the RMM
+		 * inject an SEA when the REC is scheduled to be run.
+		 */
+		if (ext_dabt_pending) {
+			/*
+			 * Can only inject SEA into a Realm if the previous exit
+			 * was due to a data abort of an Unprotected IPA.
+			 */
+			if (!(vcpu->arch.rec.run->entry.flags & RMI_EMULATED_MMIO))
+				return -EINVAL;
+
+			vcpu->arch.rec.run->entry.flags &= ~RMI_EMULATED_MMIO;
+			vcpu->arch.rec.run->entry.flags |= RMI_INJECT_SEA;
+		}
+
+		return 0;
+	}
+
 	if (serror_pending && has_esr) {
 		if (!cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
 			return -EINVAL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 28/43] arm64: rme: support RSI_HOST_CALL
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (26 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 27/43] arm64: rme: allow userspace to inject aborts Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 29/43] arm64: rme: Allow checking SVE on VM instance Steven Price
                     ` (14 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Joey Gouly, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Steven Price

From: Joey Gouly <joey.gouly@arm.com>

Forward RSI_HOST_CALL exits to KVM's HVC handler.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/rme-exit.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/arch/arm64/kvm/rme-exit.c b/arch/arm64/kvm/rme-exit.c
index 2ddaec09cc98..b442e9a00d07 100644
--- a/arch/arm64/kvm/rme-exit.c
+++ b/arch/arm64/kvm/rme-exit.c
@@ -116,6 +116,29 @@ static int rec_exit_ripas_change(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int rec_exit_host_call(struct kvm_vcpu *vcpu)
+{
+	int ret, i;
+	struct realm_rec *rec = &vcpu->arch.rec;
+
+	vcpu->stat.hvc_exit_stat++;
+
+	for (i = 0; i < REC_RUN_GPRS; i++)
+		vcpu_set_reg(vcpu, i, rec->run->exit.gprs[i]);
+
+	ret = kvm_smccc_call_handler(vcpu);
+
+	if (ret < 0) {
+		vcpu_set_reg(vcpu, 0, ~0UL);
+		ret = 1;
+	}
+
+	for (i = 0; i < REC_RUN_GPRS; i++)
+		rec->run->entry.gprs[i] = vcpu_get_reg(vcpu, i);
+
+	return ret;
+}
+
 static void update_arch_timer_irq_lines(struct kvm_vcpu *vcpu)
 {
 	struct realm_rec *rec = &vcpu->arch.rec;
@@ -177,6 +200,8 @@ int handle_rme_exit(struct kvm_vcpu *vcpu, int rec_run_ret)
 		return rec_exit_psci(vcpu);
 	case RMI_EXIT_RIPAS_CHANGE:
 		return rec_exit_ripas_change(vcpu);
+	case RMI_EXIT_HOST_CALL:
+		return rec_exit_host_call(vcpu);
 	}
 
 	kvm_pr_unimpl("Unsupported exit reason: %u\n",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 29/43] arm64: rme: Allow checking SVE on VM instance
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (27 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 28/43] arm64: rme: support RSI_HOST_CALL Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 30/43] arm64: RME: Always use 4k pages for realms Steven Price
                     ` (13 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Suzuki K Poulose, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Steven Price

From: Suzuki K Poulose <suzuki.poulose@arm.com>

Given that different types of VM are supported, check SVE support for
the given instance of the VM so that the status is reported accurately.
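
A VMM can now query the capability on the VM fd rather than globally; a
minimal sketch (vm_supports_sve() is a made-up helper):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Non-zero if this particular VM instance supports SVE */
  static int vm_supports_sve(int vm_fd)
  {
          return ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_SVE);
  }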

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_rme.h | 2 ++
 arch/arm64/kvm/arm.c             | 5 ++++-
 arch/arm64/kvm/rme.c             | 5 +++++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
index fef04706ff9d..62d1fa82ac92 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -89,6 +89,8 @@ struct realm_rec {
 int kvm_init_rme(void);
 u32 kvm_realm_ipa_limit(void);
 
+bool kvm_rme_supports_sve(void);
+
 int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
 int kvm_init_realm_vm(struct kvm *kvm);
 void kvm_destroy_realm(struct kvm *kvm);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f2279ab45add..dcd9089877f3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -363,7 +363,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = get_kvm_ipa_limit();
 		break;
 	case KVM_CAP_ARM_SVE:
-		r = system_supports_sve();
+		if (kvm && kvm_is_realm(kvm))
+			r = kvm_rme_supports_sve();
+		else
+			r = system_supports_sve();
 		break;
 	case KVM_CAP_ARM_PTRAUTH_ADDRESS:
 	case KVM_CAP_ARM_PTRAUTH_GENERIC:
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 9544653accb6..9593e8e35913 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -20,6 +20,11 @@ static bool rme_supports(unsigned long feature)
 	return !!u64_get_bits(rmm_feat_reg0, feature);
 }
 
+bool kvm_rme_supports_sve(void)
+{
+	return rme_supports(RMI_FEATURE_REGISTER_0_SVE_EN);
+}
+
 static int rmi_check_version(void)
 {
 	struct arm_smccc_res res;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 30/43] arm64: RME: Always use 4k pages for realms
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (28 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 29/43] arm64: rme: Allow checking SVE on VM instance Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 31/43] arm64: rme: Prevent Device mappings for Realms Steven Price
                     ` (12 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Always split up huge pages to avoid the problems of managing them.
There are currently two issues:

1. The uABI for the VMM allows populating memory on 4k boundaries even
   if the underlying allocator (e.g. hugetlbfs) is using a larger page
   size. Using a memfd for private allocations will push this issue onto
   the VMM as it will need to respect the granularity of the allocator.

2. The guest is able to request arbitrary ranges to be remapped as
   shared. Again with a memfd approach it will be up to the VMM to deal
   with the complexity and either overmap (need the huge mapping and add
   an additional 'overlapping' shared mapping) or reject the request as
   invalid due to the use of a huge page allocator.

For now, just break everything down to 4k pages in the RMM-controlled
stage 2.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/mmu.c | 4 ++++
 arch/arm64/kvm/rme.c | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 50a49e4e2020..d891fc3c1a10 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1532,6 +1532,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (logging_active) {
 		force_pte = true;
 		vma_shift = PAGE_SHIFT;
+	} else if (kvm_is_realm(kvm)) {
+		// Force PTE level mappings for realms
+		force_pte = true;
+		vma_shift = PAGE_SHIFT;
 	} else {
 		vma_shift = get_vma_page_shift(vma, hva);
 	}
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 9593e8e35913..ae9fd12c4e7d 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -847,7 +847,9 @@ static int populate_par_region(struct kvm *kvm,
 			break;
 		}
 
-		if (is_vm_hugetlb_page(vma))
+		// FIXME: To avoid the overmapping issue (see below comment)
+		// force the use of 4k pages
+		if (is_vm_hugetlb_page(vma) && 0)
 			vma_shift = huge_page_shift(hstate_vma(vma));
 		else
 			vma_shift = PAGE_SHIFT;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 31/43] arm64: rme: Prevent Device mappings for Realms
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (29 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 30/43] arm64: RME: Always use 4k pages for realms Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 32/43] arm_pmu: Provide a mechanism for disabling the physical IRQ Steven Price
                     ` (11 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Physical device assignment is not yet supported by the RMM, so it
doesn't make much sense to allow device mappings within the realm.
Prevent them when the guest is a realm.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/mmu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d891fc3c1a10..48c957e21c83 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1101,6 +1101,10 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 	if (is_protected_kvm_enabled())
 		return -EPERM;
 
+	/* We don't support mapping special pages into a Realm */
+	if (kvm_is_realm(kvm))
+		return -EINVAL;
+
 	size += offset_in_page(guest_ipa);
 	guest_ipa &= PAGE_MASK;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 32/43] arm_pmu: Provide a mechanism for disabling the physical IRQ
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (30 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 31/43] arm64: rme: Prevent Device mappings for Realms Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-12  8:42   ` [PATCH v2 33/43] arm64: rme: Enable PMU support with a realm guest Steven Price
                     ` (10 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Arm CCA assigns the physical PMU device to the guest running in realm
world; however, the IRQs are routed via the host. To enter a realm
guest while a PMU IRQ is pending, it is necessary to block the physical
IRQ to prevent an immediate exit. Provide a mechanism in the PMU driver
for KVM to control the physical IRQ.
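
A minimal sketch of the intended calling pattern (this mirrors what a
later patch wires into the KVM run loop; realm_run_once() and
run_realm_guest() are made up for the example):

  #include <stdbool.h>

  void arm_pmu_set_phys_irq(bool enable);      /* added by this patch */

  static void realm_run_once(bool pmu_irq_pending,
                             void (*run_realm_guest)(void))
  {
          bool pmu_stopped = false;

          if (pmu_irq_pending) {
                  pmu_stopped = true;
                  arm_pmu_set_phys_irq(false); /* mask before entry */
          }

          run_realm_guest();

          if (pmu_stopped)
                  arm_pmu_set_phys_irq(true);  /* unmask back in the host */
  }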

Signed-off-by: Steven Price <steven.price@arm.com>
---
 drivers/perf/arm_pmu.c       | 15 +++++++++++++++
 include/linux/perf/arm_pmu.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 8458fe2cebb4..1e024acc98bb 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -735,6 +735,21 @@ static int arm_perf_teardown_cpu(unsigned int cpu, struct hlist_node *node)
 	return 0;
 }
 
+void arm_pmu_set_phys_irq(bool enable)
+{
+	int cpu = get_cpu();
+	struct arm_pmu *pmu = per_cpu(cpu_armpmu, cpu);
+	int irq;
+
+	irq = armpmu_get_cpu_irq(pmu, cpu);
+	if (irq && !enable)
+		per_cpu(cpu_irq_ops, cpu)->disable_pmuirq(irq);
+	else if (irq && enable)
+		per_cpu(cpu_irq_ops, cpu)->enable_pmuirq(irq);
+
+	put_cpu();
+}
+
 #ifdef CONFIG_CPU_PM
 static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd)
 {
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index b3b34f6670cf..fd5e2e63b7fb 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -168,6 +168,7 @@ void kvm_host_pmu_init(struct arm_pmu *pmu);
 #endif
 
 bool arm_pmu_irq_is_nmi(void);
+void arm_pmu_set_phys_irq(bool enable);
 
 /* Internal functions only for core arm_pmu code */
 struct arm_pmu *armpmu_alloc(void);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 33/43] arm64: rme: Enable PMU support with a realm guest
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (31 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 32/43] arm_pmu: Provide a mechanism for disabling the physical IRQ Steven Price
@ 2024-04-12  8:42   ` Steven Price
  2024-04-13 23:44     ` kernel test robot
  2024-04-12  8:43   ` [PATCH v2 34/43] kvm: rme: Hide KVM_CAP_READONLY_MEM for realm guests Steven Price
                     ` (9 subsequent siblings)
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:42 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Jean-Philippe Brucker

Use the PMU registers from the RmiRecExit structure to identify when an
overflow interrupt is due and inject it into the guest. Also hook up the
configuration option for enabling the PMU within the guest.

When entering a realm guest with a PMU interrupt pending, it is
necessary to disable the physical interrupt. Otherwise, when the RMM
restores the PMU state, the physical interrupt will trigger, causing an
immediate exit back to the host. The guest is expected to acknowledge
the interrupt, causing a host exit (to update the GIC state), which
gives the opportunity to re-enable the physical interrupt before the
next PMU event.

The number of PMU counters is configured by the VMM by writing PMCR.N.
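
A hedged sketch of that VMM write (the register id matches the
KVM_REG_ARM_PMCR_EL0 definition later in this patch; the
read-modify-write and the helper name are assumptions):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  #define KVM_REG_ARM_PMCR_EL0    ARM64_SYS_REG(3, 3, 9, 12, 0)
  #define PMCR_EL0_N_SHIFT        11
  #define PMCR_EL0_N_MASK         (0x1fULL << PMCR_EL0_N_SHIFT)

  static int set_realm_pmcr_n(int vcpu_fd, unsigned int n)
  {
          uint64_t pmcr;
          struct kvm_one_reg reg = {
                  .id = KVM_REG_ARM_PMCR_EL0,
                  .addr = (uintptr_t)&pmcr,
          };

          if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))
                  return -1;

          pmcr &= ~PMCR_EL0_N_MASK;
          pmcr |= (uint64_t)n << PMCR_EL0_N_SHIFT;

          return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
  }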

Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/arm.c      | 15 +++++++++++++++
 arch/arm64/kvm/guest.c    |  7 +++++++
 arch/arm64/kvm/pmu-emul.c |  4 +++-
 arch/arm64/kvm/rme.c      |  8 ++++++++
 arch/arm64/kvm/sys_regs.c |  2 +-
 5 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index dcd9089877f3..2aad83053b62 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -15,6 +15,7 @@
 #include <linux/vmalloc.h>
 #include <linux/fs.h>
 #include <linux/mman.h>
+#include <linux/perf/arm_pmu.h>
 #include <linux/sched.h>
 #include <linux/kvm.h>
 #include <linux/kvm_irqfd.h>
@@ -1075,6 +1076,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	run->exit_reason = KVM_EXIT_UNKNOWN;
 	run->flags = 0;
 	while (ret > 0) {
+		bool pmu_stopped = false;
+
 		/*
 		 * Check conditions before entering the guest
 		 */
@@ -1106,6 +1109,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
 		kvm_pmu_flush_hwstate(vcpu);
 
+		if (vcpu_is_rec(vcpu)) {
+			struct kvm_pmu *pmu = &vcpu->arch.pmu;
+
+			if (pmu->irq_level) {
+				pmu_stopped = true;
+				arm_pmu_set_phys_irq(false);
+			}
+		}
+
 		local_irq_disable();
 
 		kvm_vgic_flush_hwstate(vcpu);
@@ -1208,6 +1220,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
 		preempt_enable();
 
+		if (pmu_stopped)
+			arm_pmu_set_phys_irq(true);
+
 		/*
 		 * The ARMv8 architecture doesn't give the hypervisor
 		 * a mechanism to prevent a guest from dropping to AArch32 EL0
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5223a828a344..d35367cf527d 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -782,6 +782,8 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	return kvm_arm_sys_reg_get_reg(vcpu, reg);
 }
 
+#define KVM_REG_ARM_PMCR_EL0		ARM64_SYS_REG(3, 3, 9, 12, 0)
+
 /*
  * The RMI ABI only enables setting the lower GPRs (x0-x7) and PC.
  * All other registers are reset to architectural or otherwise defined reset
@@ -800,6 +802,11 @@ static bool validate_realm_set_reg(struct kvm_vcpu *vcpu,
 		case KVM_REG_ARM_CORE_REG(regs.pc):
 			return true;
 		}
+	} else {
+		switch (reg->id) {
+		case KVM_REG_ARM_PMCR_EL0:
+			return true;
+		}
 	}
 
 	return false;
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index a35ce10e0a9f..ce7c8e55d904 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -341,7 +341,9 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
 {
 	u64 reg = 0;
 
-	if ((kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E)) {
+	if (vcpu_is_rec(vcpu)) {
+		reg = vcpu->arch.rec.run->exit.pmu_ovf_status;
+	} else if ((kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E)) {
 		reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
 		reg &= __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
 		reg &= __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index ae9fd12c4e7d..e60a1196a2fe 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -314,6 +314,11 @@ static int realm_create_rd(struct kvm *kvm)
 	params->rtt_base = kvm->arch.mmu.pgd_phys;
 	params->vmid = realm->vmid;
 
+	if (kvm->arch.arm_pmu) {
+		params->pmu_num_ctrs = kvm->arch.pmcr_n;
+		params->flags |= RMI_REALM_PARAM_FLAG_PMU;
+	}
+
 	params_phys = virt_to_phys(params);
 
 	if (rmi_realm_create(rd_phys, params_phys)) {
@@ -1366,6 +1371,9 @@ int kvm_create_rec(struct kvm_vcpu *vcpu)
 	if (!vcpu_has_feature(vcpu, KVM_ARM_VCPU_PSCI_0_2))
 		return -EINVAL;
 
+	if (vcpu->kvm->arch.arm_pmu && !kvm_vcpu_has_pmu(vcpu))
+		return -EINVAL;
+
 	BUILD_BUG_ON(sizeof(*params) > PAGE_SIZE);
 	BUILD_BUG_ON(sizeof(*rec->run) > PAGE_SIZE);
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index c9f4f387155f..60452c6519a4 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1279,7 +1279,7 @@ static int set_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
 	 * implements. Ignore this error to maintain compatibility
 	 * with the existing KVM behavior.
 	 */
-	if (!kvm_vm_has_ran_once(kvm) &&
+	if (!kvm_vm_has_ran_once(kvm) && !kvm_realm_is_created(kvm) &&
 	    new_n <= kvm_arm_pmu_get_max_counters(kvm))
 		kvm->arch.pmcr_n = new_n;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 34/43] kvm: rme: Hide KVM_CAP_READONLY_MEM for realm guests
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (32 preceding siblings ...)
  2024-04-12  8:42   ` [PATCH v2 33/43] arm64: rme: Enable PMU support with a realm guest Steven Price
@ 2024-04-12  8:43   ` Steven Price
  2024-04-12  8:43   ` [PATCH v2 35/43] arm64: RME: Propagate number of breakpoints and watchpoints to userspace Steven Price
                     ` (8 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:43 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Read-only isn't supported for protected memory. While it may be
possible to support read-only for unprotected memory, this isn't
supported at the present time.
present time.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2aad83053b62..69d29797c2ed 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -284,7 +284,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ONE_REG:
 	case KVM_CAP_ARM_PSCI:
 	case KVM_CAP_ARM_PSCI_0_2:
-	case KVM_CAP_READONLY_MEM:
 	case KVM_CAP_MP_STATE:
 	case KVM_CAP_IMMEDIATE_EXIT:
 	case KVM_CAP_VCPU_EVENTS:
@@ -298,6 +297,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_COUNTER_OFFSET:
 		r = 1;
 		break;
+	case KVM_CAP_READONLY_MEM:
 	case KVM_CAP_SET_GUEST_DEBUG:
 		r = !kvm_is_realm(kvm);
 		break;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 35/43] arm64: RME: Propagate number of breakpoints and watchpoints to userspace
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (33 preceding siblings ...)
  2024-04-12  8:43   ` [PATCH v2 34/43] kvm: rme: Hide KVM_CAP_READONLY_MEM for realm guests Steven Price
@ 2024-04-12  8:43   ` Steven Price
  2024-04-12  8:43   ` [PATCH v2 36/43] arm64: RME: Set breakpoint parameters through SET_ONE_REG Steven Price
                     ` (7 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:43 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Jean-Philippe Brucker, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Suzuki K Poulose,
	Zenghui Yu, linux-arm-kernel, linux-kernel, Joey Gouly,
	Alexandru Elisei, Christoffer Dall, Fuad Tabba, linux-coco,
	Ganapatrao Kulkarni, Steven Price

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

The RMM describes the maximum number of BPs/WPs available to the guest
in the Feature Register 0. Propagate those numbers into ID_AA64DFR0_EL1,
which is visible to userspace. A VMM needs this information in order to
set up realm parameters.
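
For instance, a VMM could read the propagated limits back like this
(the field positions follow the architecture; the helper is made up):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int get_realm_bps_wps(int vcpu_fd, unsigned int *bps,
                               unsigned int *wps)
  {
          uint64_t dfr0;
          struct kvm_one_reg reg = {
                  .id = ARM64_SYS_REG(3, 0, 0, 5, 0), /* ID_AA64DFR0_EL1 */
                  .addr = (uintptr_t)&dfr0,
          };

          if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))
                  return -1;

          *bps = ((dfr0 >> 12) & 0xf) + 1;    /* BRPs, bits [15:12] */
          *wps = ((dfr0 >> 20) & 0xf) + 1;    /* WRPs, bits [23:20] */
          return 0;
  }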

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_rme.h |  1 +
 arch/arm64/kvm/rme.c             | 22 ++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c        |  2 +-
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
index 62d1fa82ac92..44aa95befc14 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -88,6 +88,7 @@ struct realm_rec {
 
 int kvm_init_rme(void);
 u32 kvm_realm_ipa_limit(void);
+u64 kvm_realm_reset_id_aa64dfr0_el1(struct kvm_vcpu *vcpu, u64 val);
 
 bool kvm_rme_supports_sve(void);
 
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index e60a1196a2fe..4edc8d98e1e6 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -271,6 +271,28 @@ u32 kvm_realm_ipa_limit(void)
 	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
 }
 
+u64 kvm_realm_reset_id_aa64dfr0_el1(struct kvm_vcpu *vcpu, u64 val)
+{
+	u32 bps = u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_NUM_BPS);
+	u32 wps = u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_NUM_WPS);
+	u32 ctx_cmps;
+
+	if (!kvm_is_realm(vcpu->kvm))
+		return val;
+
+	/* Ensure CTX_CMPs is still valid */
+	ctx_cmps = FIELD_GET(ID_AA64DFR0_EL1_CTX_CMPs_MASK, val) + 1;
+	ctx_cmps = min(bps, ctx_cmps);
+
+	val &= ~(ID_AA64DFR0_EL1_BRPs_MASK | ID_AA64DFR0_EL1_WRPs_MASK |
+		 ID_AA64DFR0_EL1_CTX_CMPs_MASK);
+	val |= FIELD_PREP(ID_AA64DFR0_EL1_BRPs_MASK, bps - 1) |
+	       FIELD_PREP(ID_AA64DFR0_EL1_WRPs_MASK, wps - 1) |
+	       FIELD_PREP(ID_AA64DFR0_EL1_CTX_CMPs_MASK, ctx_cmps - 1);
+
+	return val;
+}
+
 static int get_start_level(struct realm *realm)
 {
 	return 4 - stage2_pgtable_levels(realm->ia_bits);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 60452c6519a4..cb3dc640a4a5 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1708,7 +1708,7 @@ static u64 read_sanitised_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
 	/* Hide SPE from guests */
 	val &= ~ID_AA64DFR0_EL1_PMSVer_MASK;
 
-	return val;
+	return kvm_realm_reset_id_aa64dfr0_el1(vcpu, val);
 }
 
 static int set_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 36/43] arm64: RME: Set breakpoint parameters through SET_ONE_REG
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (34 preceding siblings ...)
  2024-04-12  8:43   ` [PATCH v2 35/43] arm64: RME: Propagate number of breakpoints and watchpoints to userspace Steven Price
@ 2024-04-12  8:43   ` Steven Price
  2024-04-12  8:43   ` [PATCH v2 37/43] arm64: RME: Initialize PMCR.N with number counter supported by RMM Steven Price
                     ` (6 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:43 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Jean-Philippe Brucker, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Suzuki K Poulose,
	Zenghui Yu, linux-arm-kernel, linux-kernel, Joey Gouly,
	Alexandru Elisei, Christoffer Dall, Fuad Tabba, linux-coco,
	Ganapatrao Kulkarni, Steven Price

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

Allow userspace to configure the number of breakpoints and watchpoints
of a Realm VM through KVM_SET_ONE_REG ID_AA64DFR0_EL1.

The KVM sys_reg handler checks the user value against the maximum value
given by RMM (arm64_check_features() gets it from the
read_sanitised_id_aa64dfr0_el1() reset handler).

Userspace discovers that it can write these fields by issuing a
KVM_ARM_GET_REG_WRITABLE_MASKS ioctl.
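
A hedged sketch of the write path, complementing the read shown in the
previous patch (the clamping logic and helper name are assumptions; a
real VMM would first consult KVM_ARM_GET_REG_WRITABLE_MASKS):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  #define KVM_REG_ARM_ID_AA64DFR0_EL1     ARM64_SYS_REG(3, 0, 0, 5, 0)

  /* Limit the realm to 'bps' breakpoints; values above the RMM maximum
   * are rejected by arm64_check_features(). */
  static int limit_realm_bps(int vcpu_fd, unsigned int bps)
  {
          uint64_t val;
          struct kvm_one_reg reg = {
                  .id = KVM_REG_ARM_ID_AA64DFR0_EL1,
                  .addr = (uintptr_t)&val,
          };

          if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))
                  return -1;

          val &= ~(0xfULL << 12);             /* BRPs, bits [15:12] */
          val |= (uint64_t)(bps - 1) << 12;

          return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
  }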

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/guest.c    |  2 ++
 arch/arm64/kvm/rme.c      |  3 +++
 arch/arm64/kvm/sys_regs.c | 21 ++++++++++++++-------
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index d35367cf527d..f9a47ce71a26 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -783,6 +783,7 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 }
 
 #define KVM_REG_ARM_PMCR_EL0		ARM64_SYS_REG(3, 3, 9, 12, 0)
+#define KVM_REG_ARM_ID_AA64DFR0_EL1	ARM64_SYS_REG(3, 0, 0, 5, 0)
 
 /*
  * The RMI ABI only enables setting the lower GPRs (x0-x7) and PC.
@@ -805,6 +806,7 @@ static bool validate_realm_set_reg(struct kvm_vcpu *vcpu,
 	} else {
 		switch (reg->id) {
 		case KVM_REG_ARM_PMCR_EL0:
+		case KVM_REG_ARM_ID_AA64DFR0_EL1:
 			return true;
 		}
 	}
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 4edc8d98e1e6..51ac8c3462ea 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -305,6 +305,7 @@ static int realm_create_rd(struct kvm *kvm)
 	void *rd = NULL;
 	phys_addr_t rd_phys, params_phys;
 	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
+	u64 dfr0 = IDREG(kvm, SYS_ID_AA64DFR0_EL1);
 	int i, r;
 
 	if (WARN_ON(realm->rd) || WARN_ON(!realm->params))
@@ -335,6 +336,8 @@ static int realm_create_rd(struct kvm *kvm)
 	params->rtt_num_start = pgt->pgd_pages;
 	params->rtt_base = kvm->arch.mmu.pgd_phys;
 	params->vmid = realm->vmid;
+	params->num_bps = SYS_FIELD_GET(ID_AA64DFR0_EL1, BRPs, dfr0) + 1;
+	params->num_wps = SYS_FIELD_GET(ID_AA64DFR0_EL1, WRPs, dfr0) + 1;
 
 	if (kvm->arch.arm_pmu) {
 		params->pmu_num_ctrs = kvm->arch.pmcr_n;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index cb3dc640a4a5..5703b63186fd 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1717,6 +1717,9 @@ static int set_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
 {
 	u8 debugver = SYS_FIELD_GET(ID_AA64DFR0_EL1, DebugVer, val);
 	u8 pmuver = SYS_FIELD_GET(ID_AA64DFR0_EL1, PMUVer, val);
+	u8 bps = SYS_FIELD_GET(ID_AA64DFR0_EL1, BRPs, val);
+	u8 wps = SYS_FIELD_GET(ID_AA64DFR0_EL1, WRPs, val);
+	u8 ctx_cmps = SYS_FIELD_GET(ID_AA64DFR0_EL1, CTX_CMPs, val);
 
 	/*
 	 * Prior to commit 3d0dba5764b9 ("KVM: arm64: PMU: Move the
@@ -1736,10 +1739,11 @@ static int set_id_aa64dfr0_el1(struct kvm_vcpu *vcpu,
 		val &= ~ID_AA64DFR0_EL1_PMUVer_MASK;
 
 	/*
-	 * ID_AA64DFR0_EL1.DebugVer is one of those awkward fields with a
-	 * nonzero minimum safe value.
+	 * ID_AA64DFR0_EL1.DebugVer, BRPs and WRPs all have to be greater than
+	 * zero. CTX_CMPs is never greater than BRPs.
 	 */
-	if (debugver < ID_AA64DFR0_EL1_DebugVer_IMP)
+	if (debugver < ID_AA64DFR0_EL1_DebugVer_IMP || !bps || !wps ||
+	    ctx_cmps > bps)
 		return -EINVAL;
 
 	return set_id_reg(vcpu, rd, val);
@@ -1822,10 +1826,11 @@ static int set_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
 	mutex_lock(&vcpu->kvm->arch.config_lock);
 
 	/*
-	 * Once the VM has started the ID registers are immutable. Reject any
-	 * write that does not match the final register value.
+	 * Once the VM has started or the Realm descriptor is created, the ID
+	 * registers are immutable. Reject any write that does not match the
+	 * final register value.
 	 */
-	if (kvm_vm_has_ran_once(vcpu->kvm)) {
+	if (kvm_vm_has_ran_once(vcpu->kvm) || kvm_realm_is_created(vcpu->kvm)) {
 		if (val != read_id_reg(vcpu, rd))
 			ret = -EBUSY;
 		else
@@ -2307,7 +2312,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	  .set_user = set_id_aa64dfr0_el1,
 	  .reset = read_sanitised_id_aa64dfr0_el1,
 	  .val = ID_AA64DFR0_EL1_PMUVer_MASK |
-		 ID_AA64DFR0_EL1_DebugVer_MASK, },
+		 ID_AA64DFR0_EL1_DebugVer_MASK |
+		 ID_AA64DFR0_EL1_BRPs_MASK |
+		 ID_AA64DFR0_EL1_WRPs_MASK, },
 	ID_SANITISED(ID_AA64DFR1_EL1),
 	ID_UNALLOCATED(5,2),
 	ID_UNALLOCATED(5,3),
-- 
2.34.1



* [PATCH v2 37/43] arm64: RME: Initialize PMCR.N with number of counters supported by RMM
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (35 preceding siblings ...)
  2024-04-12  8:43   ` [PATCH v2 36/43] arm64: RME: Set breakpoint parameters through SET_ONE_REG Steven Price
@ 2024-04-12  8:43   ` Steven Price
  2024-04-12  8:43   ` [PATCH v2 38/43] arm64: RME: Propagate max SVE vector length from RMM Steven Price
                     ` (5 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:43 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Jean-Philippe Brucker, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Suzuki K Poulose,
	Zenghui Yu, linux-arm-kernel, linux-kernel, Joey Gouly,
	Alexandru Elisei, Christoffer Dall, Fuad Tabba, linux-coco,
	Ganapatrao Kulkarni, Steven Price

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

Provide an accurate number of available PMU counters to userspace when
setting up a Realm.
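
As a sketch (headers as in the earlier example), userspace can read the
resulting value back through PMCR_EL0, whose N field sits in bits
[15:11]:

  uint64_t pmcr;
  struct kvm_one_reg reg = {
      .id   = ARM64_SYS_REG(3, 3, 9, 12, 0),   /* PMCR_EL0 */
      .addr = (uint64_t)&pmcr,
  };

  ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
  unsigned int num_ctrs = (pmcr >> 11) & 0x1f; /* PMCR_EL0.N */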

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_rme.h | 1 +
 arch/arm64/kvm/pmu-emul.c        | 3 +++
 arch/arm64/kvm/rme.c             | 5 +++++
 3 files changed, 9 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
index 44aa95befc14..9c00bcb018f8 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -88,6 +88,7 @@ struct realm_rec {
 
 int kvm_init_rme(void);
 u32 kvm_realm_ipa_limit(void);
+u8 kvm_realm_max_pmu_counters(void);
 u64 kvm_realm_reset_id_aa64dfr0_el1(struct kvm_vcpu *vcpu, u64 val);
 
 bool kvm_rme_supports_sve(void);
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index ce7c8e55d904..39790736f1e2 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -912,6 +912,9 @@ u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm)
 {
 	struct arm_pmu *arm_pmu = kvm->arch.arm_pmu;
 
+	if (kvm_is_realm(kvm))
+		return kvm_realm_max_pmu_counters();
+
 	/*
 	 * The arm_pmu->num_events considers the cycle counter as well.
 	 * Ignore that and return only the general-purpose counters.
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 51ac8c3462ea..1bd97e206846 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -271,6 +271,11 @@ u32 kvm_realm_ipa_limit(void)
 	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
 }
 
+u8 kvm_realm_max_pmu_counters(void)
+{
+	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_PMU_NUM_CTRS);
+}
+
 u64 kvm_realm_reset_id_aa64dfr0_el1(struct kvm_vcpu *vcpu, u64 val)
 {
 	u32 bps = u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_NUM_BPS);
-- 
2.34.1



* [PATCH v2 38/43] arm64: RME: Propagate max SVE vector length from RMM
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (36 preceding siblings ...)
  2024-04-12  8:43   ` [PATCH v2 37/43] arm64: RME: Initialize PMCR.N with number of counters supported by RMM Steven Price
@ 2024-04-12  8:43   ` Steven Price
  2024-04-12  8:43   ` [PATCH v2 39/43] arm64: RME: Configure max SVE vector length for a Realm Steven Price
                     ` (4 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:43 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Jean-Philippe Brucker, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Suzuki K Poulose,
	Zenghui Yu, linux-arm-kernel, linux-kernel, Joey Gouly,
	Alexandru Elisei, Christoffer Dall, Fuad Tabba, linux-coco,
	Ganapatrao Kulkarni, Steven Price

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

RMM provides the maximum vector length it supports for a guest in its
feature register. Make it visible to the rest of KVM and to userspace
via KVM_REG_ARM64_SVE_VLS.
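
For reference, RMI_FEATURE_REGISTER_0_SVE_VL encodes the vector length
in quadwords minus one; the conversion helpers used in the patch boil
down to the following sketch (one quadword is 128 bits, i.e. 16 bytes):

  #define SVE_VQ_BYTES 16

  static inline unsigned int sve_vl_from_vq(unsigned int vq)
  {
      return vq * SVE_VQ_BYTES;
  }

  static inline unsigned int sve_vq_from_vl(unsigned int vl)
  {
      return vl / SVE_VQ_BYTES;
  }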

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  2 +-
 arch/arm64/include/asm/kvm_rme.h  |  1 +
 arch/arm64/kvm/guest.c            |  2 +-
 arch/arm64/kvm/reset.c            | 12 ++++++++++--
 arch/arm64/kvm/rme.c              |  6 ++++++
 5 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f7ac40ce0caf..902923402f6e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -76,8 +76,8 @@ static inline enum kvm_mode kvm_get_mode(void) { return KVM_MODE_NONE; };
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
-extern unsigned int __ro_after_init kvm_sve_max_vl;
 int __init kvm_arm_init_sve(void);
+unsigned int kvm_sve_get_max_vl(struct kvm *kvm);
 
 u32 __attribute_const__ kvm_target_cpu(void);
 void kvm_reset_vcpu(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
index 9c00bcb018f8..be24be001aaa 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -89,6 +89,7 @@ struct realm_rec {
 int kvm_init_rme(void);
 u32 kvm_realm_ipa_limit(void);
 u8 kvm_realm_max_pmu_counters(void);
+unsigned int kvm_realm_sve_max_vl(void);
 u64 kvm_realm_reset_id_aa64dfr0_el1(struct kvm_vcpu *vcpu, u64 val);
 
 bool kvm_rme_supports_sve(void);
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index f9a47ce71a26..c62fda66cdc5 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -355,7 +355,7 @@ static int set_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 		if (vq_present(vqs, vq))
 			max_vq = vq;
 
-	if (max_vq > sve_vq_from_vl(kvm_sve_max_vl))
+	if (max_vq > sve_vq_from_vl(kvm_sve_get_max_vl(vcpu->kvm)))
 		return -EINVAL;
 
 	/*
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 6e6eb4a15095..a90b7c2d35bb 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -45,7 +45,7 @@ static u32 __ro_after_init kvm_ipa_limit;
 #define VCPU_RESET_PSTATE_SVC	(PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
 				 PSR_AA32_I_BIT | PSR_AA32_F_BIT)
 
-unsigned int __ro_after_init kvm_sve_max_vl;
+static unsigned int __ro_after_init kvm_sve_max_vl;
 
 int __init kvm_arm_init_sve(void)
 {
@@ -73,9 +73,17 @@ int __init kvm_arm_init_sve(void)
 	return 0;
 }
 
+unsigned int kvm_sve_get_max_vl(struct kvm *kvm)
+{
+	if (kvm_is_realm(kvm))
+		return kvm_realm_sve_max_vl();
+	else
+		return kvm_sve_max_vl;
+}
+
 static void kvm_vcpu_enable_sve(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.sve_max_vl = kvm_sve_max_vl;
+	vcpu->arch.sve_max_vl = kvm_sve_get_max_vl(vcpu->kvm);
 
 	/*
 	 * Userspace can still customize the vector lengths by writing
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 1bd97e206846..cd5b74aac092 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -276,6 +276,12 @@ u8 kvm_realm_max_pmu_counters(void)
 	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_PMU_NUM_CTRS);
 }
 
+unsigned int kvm_realm_sve_max_vl(void)
+{
+	return sve_vl_from_vq(u64_get_bits(rmm_feat_reg0,
+					   RMI_FEATURE_REGISTER_0_SVE_VL) + 1);
+}
+
 u64 kvm_realm_reset_id_aa64dfr0_el1(struct kvm_vcpu *vcpu, u64 val)
 {
 	u32 bps = u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_NUM_BPS);
-- 
2.34.1



* [PATCH v2 39/43] arm64: RME: Configure max SVE vector length for a Realm
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (37 preceding siblings ...)
  2024-04-12  8:43   ` [PATCH v2 38/43] arm64: RME: Propagate max SVE vector length from RMM Steven Price
@ 2024-04-12  8:43   ` Steven Price
  2024-04-12  8:43   ` [PATCH v2 40/43] arm64: RME: Provide register list for unfinalized RME RECs Steven Price
                     ` (3 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:43 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Jean-Philippe Brucker, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Suzuki K Poulose,
	Zenghui Yu, linux-arm-kernel, linux-kernel, Joey Gouly,
	Alexandru Elisei, Christoffer Dall, Fuad Tabba, linux-coco,
	Ganapatrao Kulkarni, Steven Price

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

Obtain the max vector length configured by userspace on the vCPUs, and
write it into the Realm parameters. By default the vCPU is configured
with the max vector length reported by RMM, and userspace can reduce it
with a write to KVM_REG_ARM64_SVE_VLS.
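
As an illustration, a VMM that wants to cap a Realm at a 256-bit vector
length could write the pseudo-register like this before finalizing the
vCPU (a sketch following the documented KVM_REG_ARM64_SVE_VLS bitmap
layout; error handling elided):

  uint64_t vqs[KVM_ARM64_SVE_VLS_WORDS] = { 0 };
  unsigned int vq, max_vq = 256 / 128;   /* VQ = length in bits / 128 */

  for (vq = 1; vq <= max_vq; vq++)
      vqs[(vq - 1) / 64] |= 1ULL << ((vq - 1) % 64);

  struct kvm_one_reg reg = {
      .id   = KVM_REG_ARM64_SVE_VLS,
      .addr = (uint64_t)vqs,
  };

  ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);  /* then KVM_ARM_VCPU_FINALIZE */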

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/guest.c |  3 ++-
 arch/arm64/kvm/rme.c   | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index c62fda66cdc5..d72e59e79185 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -341,7 +341,7 @@ static int set_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	if (!vcpu_has_sve(vcpu))
 		return -ENOENT;
 
-	if (kvm_arm_vcpu_sve_finalized(vcpu))
+	if (kvm_arm_vcpu_sve_finalized(vcpu) || kvm_realm_is_created(vcpu->kvm))
 		return -EPERM; /* too late! */
 
 	if (WARN_ON(vcpu->arch.sve_state))
@@ -807,6 +807,7 @@ static bool validate_realm_set_reg(struct kvm_vcpu *vcpu,
 		switch (reg->id) {
 		case KVM_REG_ARM_PMCR_EL0:
 		case KVM_REG_ARM_ID_AA64DFR0_EL1:
+		case KVM_REG_ARM64_SVE_VLS:
 			return true;
 		}
 	}
diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index cd5b74aac092..93aab6caddf5 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -309,6 +309,44 @@ static int get_start_level(struct realm *realm)
 	return 4 - stage2_pgtable_levels(realm->ia_bits);
 }
 
+static int realm_init_sve_param(struct kvm *kvm, struct realm_params *params)
+{
+	int ret = 0;
+	unsigned long i;
+	struct kvm_vcpu *vcpu;
+	int max_vl, realm_max_vl = -1;
+
+	/*
+	 * Get the preferred SVE configuration, set by userspace with the
+	 * KVM_ARM_VCPU_SVE feature and KVM_REG_ARM64_SVE_VLS pseudo-register.
+	 */
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		mutex_lock(&vcpu->mutex);
+		if (vcpu_has_sve(vcpu)) {
+			if (!kvm_arm_vcpu_sve_finalized(vcpu))
+				ret = -EINVAL;
+			max_vl = vcpu->arch.sve_max_vl;
+		} else {
+			max_vl = 0;
+		}
+		mutex_unlock(&vcpu->mutex);
+		if (ret)
+			return ret;
+
+		/* We need all vCPUs to have the same SVE config */
+		if (realm_max_vl >= 0 && realm_max_vl != max_vl)
+			return -EINVAL;
+
+		realm_max_vl = max_vl;
+	}
+
+	if (realm_max_vl > 0) {
+		params->sve_vl = sve_vq_from_vl(realm_max_vl) - 1;
+		params->flags |= RMI_REALM_PARAM_FLAG_SVE;
+	}
+	return 0;
+}
+
 static int realm_create_rd(struct kvm *kvm)
 {
 	struct realm *realm = &kvm->arch.realm;
@@ -355,6 +393,10 @@ static int realm_create_rd(struct kvm *kvm)
 		params->flags |= RMI_REALM_PARAM_FLAG_PMU;
 	}
 
+	r = realm_init_sve_param(kvm, params);
+	if (r)
+		goto out_undelegate_tables;
+
 	params_phys = virt_to_phys(params);
 
 	if (rmi_realm_create(rd_phys, params_phys)) {
-- 
2.34.1



* [PATCH v2 40/43] arm64: RME: Provide register list for unfinalized RME RECs
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (38 preceding siblings ...)
  2024-04-12  8:43   ` [PATCH v2 39/43] arm64: RME: Configure max SVE vector length for a Realm Steven Price
@ 2024-04-12  8:43   ` Steven Price
  2024-04-12  8:43   ` [PATCH v2 41/43] arm64: RME: Provide accurate register list Steven Price
                     ` (2 subsequent siblings)
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:43 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Jean-Philippe Brucker, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Suzuki K Poulose,
	Zenghui Yu, linux-arm-kernel, linux-kernel, Joey Gouly,
	Alexandru Elisei, Christoffer Dall, Fuad Tabba, linux-coco,
	Ganapatrao Kulkarni, Steven Price

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

KVM_GET_REG_LIST should not be called before SVE is finalized. The ioctl
handler currently returns -EPERM in this case. But because it uses
kvm_arm_vcpu_is_finalized(), it now also rejects the call for an
unfinalized REC, even though finalizing the REC can only be done late,
after Realm descriptor creation.

Move the check to copy_sve_reg_indices(). One adverse side effect of
this change is that a KVM_GET_REG_LIST call that only probes for the
array size will now succeed even if SVE is not finalized, but that seems
harmless since the following KVM_GET_REG_LIST with the full array will
fail.
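
For context, the usual two-step pattern is sketched below (includes and
error handling elided); after this change only the second call fails
when SVE is not finalized:

  struct kvm_reg_list probe = { .n = 0 };
  ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe);  /* fails with E2BIG, sets n */

  struct kvm_reg_list *list =
      calloc(1, sizeof(*list) + probe.n * sizeof(uint64_t));
  list->n = probe.n;
  ioctl(vcpu_fd, KVM_GET_REG_LIST, list);    /* EPERM if SVE unfinalized */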

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/arm.c   | 4 ----
 arch/arm64/kvm/guest.c | 9 +++------
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 69d29797c2ed..2dd014d3c366 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1696,10 +1696,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		if (unlikely(!kvm_vcpu_initialized(vcpu)))
 			break;
 
-		r = -EPERM;
-		if (!kvm_arm_vcpu_is_finalized(vcpu))
-			break;
-
 		r = -EFAULT;
 		if (copy_from_user(&reg_list, user_list, sizeof(reg_list)))
 			break;
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index d72e59e79185..f1fe51775649 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -652,12 +652,9 @@ static unsigned long num_sve_regs(const struct kvm_vcpu *vcpu)
 {
 	const unsigned int slices = vcpu_sve_slices(vcpu);
 
-	if (!vcpu_has_sve(vcpu))
+	if (!vcpu_has_sve(vcpu) || !kvm_arm_vcpu_sve_finalized(vcpu))
 		return 0;
 
-	/* Policed by KVM_GET_REG_LIST: */
-	WARN_ON(!kvm_arm_vcpu_sve_finalized(vcpu));
-
 	return slices * (SVE_NUM_PREGS + SVE_NUM_ZREGS + 1 /* FFR */)
 		+ 1; /* KVM_REG_ARM64_SVE_VLS */
 }
@@ -673,8 +670,8 @@ static int copy_sve_reg_indices(const struct kvm_vcpu *vcpu,
 	if (!vcpu_has_sve(vcpu))
 		return 0;
 
-	/* Policed by KVM_GET_REG_LIST: */
-	WARN_ON(!kvm_arm_vcpu_sve_finalized(vcpu));
+	if (!kvm_arm_vcpu_sve_finalized(vcpu))
+		return -EPERM;
 
 	/*
 	 * Enumerate this first, so that userspace can save/restore in
-- 
2.34.1



* [PATCH v2 41/43] arm64: RME: Provide accurate register list
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (39 preceding siblings ...)
  2024-04-12  8:43   ` [PATCH v2 40/43] arm64: RME: Provide register list for unfinalized RME RECs Steven Price
@ 2024-04-12  8:43   ` Steven Price
  2024-04-12  8:43   ` [PATCH v2 42/43] arm64: kvm: Expose support for private memory Steven Price
  2024-04-12  8:43   ` [PATCH v2 43/43] KVM: arm64: Allow activating realms Steven Price
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:43 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Jean-Philippe Brucker, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Suzuki K Poulose,
	Zenghui Yu, linux-arm-kernel, linux-kernel, Joey Gouly,
	Alexandru Elisei, Christoffer Dall, Fuad Tabba, linux-coco,
	Ganapatrao Kulkarni, Steven Price

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

Userspace can set only a few registers with KVM_SET_ONE_REG (the core
registers x0-x7 and PC at runtime, and three registers during
initialization). Update the register list returned by KVM_GET_REG_LIST
accordingly.
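
As a sketch, a core-register write that remains valid for a Realm looks
like this ('entry_point' is a hypothetical value):

  uint64_t entry_point = 0x80080000;  /* hypothetical */
  struct kvm_one_reg reg = {
      .id   = KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE |
              KVM_REG_ARM_CORE_REG(regs.pc),
      .addr = (uint64_t)&entry_point,
  };

  ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);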

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/guest.c      | 40 ++++++++++++++++++-------
 arch/arm64/kvm/hypercalls.c |  4 +--
 arch/arm64/kvm/sys_regs.c   | 58 ++++++++++++++++++++++++++++---------
 3 files changed, 75 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index f1fe51775649..b6d64a03024c 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -73,6 +73,17 @@ static u64 core_reg_offset_from_id(u64 id)
 	return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
 }
 
+static bool kvm_realm_validate_core_reg(u64 off)
+{
+	switch (off) {
+	case KVM_REG_ARM_CORE_REG(regs.regs[0]) ...
+	     KVM_REG_ARM_CORE_REG(regs.regs[7]):
+	case KVM_REG_ARM_CORE_REG(regs.pc):
+		return true;
+	}
+	return false;
+}
+
 static int core_reg_size_from_offset(const struct kvm_vcpu *vcpu, u64 off)
 {
 	int size;
@@ -115,6 +126,9 @@ static int core_reg_size_from_offset(const struct kvm_vcpu *vcpu, u64 off)
 	if (vcpu_has_sve(vcpu) && core_reg_offset_is_vreg(off))
 		return -EINVAL;
 
+	if (kvm_is_realm(vcpu->kvm) && !kvm_realm_validate_core_reg(off))
+		return -EPERM;
+
 	return size;
 }
 
@@ -599,8 +613,6 @@ static const u64 timer_reg_list[] = {
 	KVM_REG_ARM_PTIMER_CVAL,
 };
 
-#define NUM_TIMER_REGS ARRAY_SIZE(timer_reg_list)
-
 static bool is_timer_reg(u64 index)
 {
 	switch (index) {
@@ -615,9 +627,14 @@ static bool is_timer_reg(u64 index)
 	return false;
 }
 
+static unsigned long num_timer_regs(struct kvm_vcpu *vcpu)
+{
+	return kvm_is_realm(vcpu->kvm) ? 0 : ARRAY_SIZE(timer_reg_list);
+}
+
 static int copy_timer_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 {
-	for (int i = 0; i < NUM_TIMER_REGS; i++) {
+	for (int i = 0; i < num_timer_regs(vcpu); i++) {
 		if (put_user(timer_reg_list[i], uindices))
 			return -EFAULT;
 		uindices++;
@@ -655,6 +672,9 @@ static unsigned long num_sve_regs(const struct kvm_vcpu *vcpu)
 	if (!vcpu_has_sve(vcpu) || !kvm_arm_vcpu_sve_finalized(vcpu))
 		return 0;
 
+	if (kvm_is_realm(vcpu->kvm))
+		return 1; /* KVM_REG_ARM64_SVE_VLS */
+
 	return slices * (SVE_NUM_PREGS + SVE_NUM_ZREGS + 1 /* FFR */)
 		+ 1; /* KVM_REG_ARM64_SVE_VLS */
 }
@@ -682,6 +702,9 @@ static int copy_sve_reg_indices(const struct kvm_vcpu *vcpu,
 		return -EFAULT;
 	++num_regs;
 
+	if (kvm_is_realm(vcpu->kvm))
+		return num_regs;
+
 	for (i = 0; i < slices; i++) {
 		for (n = 0; n < SVE_NUM_ZREGS; n++) {
 			reg = KVM_REG_ARM64_SVE_ZREG(n, i);
@@ -720,7 +743,7 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
 	res += num_sve_regs(vcpu);
 	res += kvm_arm_num_sys_reg_descs(vcpu);
 	res += kvm_arm_get_fw_num_regs(vcpu);
-	res += NUM_TIMER_REGS;
+	res += num_timer_regs(vcpu);
 
 	return res;
 }
@@ -754,7 +777,7 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 	ret = copy_timer_indices(vcpu, uindices);
 	if (ret < 0)
 		return ret;
-	uindices += NUM_TIMER_REGS;
+	uindices += num_timer_regs(vcpu);
 
 	return kvm_arm_copy_sys_reg_indices(vcpu, uindices);
 }
@@ -794,12 +817,7 @@ static bool validate_realm_set_reg(struct kvm_vcpu *vcpu,
 	if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE) {
 		u64 off = core_reg_offset_from_id(reg->id);
 
-		switch (off) {
-		case KVM_REG_ARM_CORE_REG(regs.regs[0]) ...
-		     KVM_REG_ARM_CORE_REG(regs.regs[7]):
-		case KVM_REG_ARM_CORE_REG(regs.pc):
-			return true;
-		}
+		return kvm_realm_validate_core_reg(off);
 	} else {
 		switch (reg->id) {
 		case KVM_REG_ARM_PMCR_EL0:
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 5763d979d8ca..28b4166cf234 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -407,14 +407,14 @@ void kvm_arm_teardown_hypercalls(struct kvm *kvm)
 
 int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
 {
-	return ARRAY_SIZE(kvm_arm_fw_reg_ids);
+	return kvm_is_realm(vcpu->kvm) ? 0 : ARRAY_SIZE(kvm_arm_fw_reg_ids);
 }
 
 int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 {
 	int i;
 
-	for (i = 0; i < ARRAY_SIZE(kvm_arm_fw_reg_ids); i++) {
+	for (i = 0; i < kvm_arm_get_fw_num_regs(vcpu); i++) {
 		if (put_user(kvm_arm_fw_reg_ids[i], uindices++))
 			return -EFAULT;
 	}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 5703b63186fd..f1239a8c041b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -3869,18 +3869,18 @@ int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
 				    sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
 }
 
-static unsigned int num_demux_regs(void)
+static unsigned int num_demux_regs(struct kvm_vcpu *vcpu)
 {
-	return CSSELR_MAX;
+	return kvm_is_realm(vcpu->kvm) ? 0 : CSSELR_MAX;
 }
 
-static int write_demux_regids(u64 __user *uindices)
+static int write_demux_regids(struct kvm_vcpu *vcpu, u64 __user *uindices)
 {
 	u64 val = KVM_REG_ARM64 | KVM_REG_SIZE_U32 | KVM_REG_ARM_DEMUX;
 	unsigned int i;
 
 	val |= KVM_REG_ARM_DEMUX_ID_CCSIDR;
-	for (i = 0; i < CSSELR_MAX; i++) {
+	for (i = 0; i < num_demux_regs(vcpu); i++) {
 		if (put_user(val | i, uindices))
 			return -EFAULT;
 		uindices++;
@@ -3888,6 +3888,23 @@ static int write_demux_regids(u64 __user *uindices)
 	return 0;
 }
 
+static unsigned int num_invariant_regs(struct kvm_vcpu *vcpu)
+{
+	return kvm_is_realm(vcpu->kvm) ? 0 : ARRAY_SIZE(invariant_sys_regs);
+}
+
+static int write_invariant_regids(struct kvm_vcpu *vcpu, u64 __user *uindices)
+{
+	unsigned int i;
+
+	for (i = 0; i < num_invariant_regs(vcpu); i++) {
+		if (put_user(sys_reg_to_index(&invariant_sys_regs[i]), uindices))
+			return -EFAULT;
+		uindices++;
+	}
+	return 0;
+}
+
 static u64 sys_reg_to_index(const struct sys_reg_desc *reg)
 {
 	return (KVM_REG_ARM64 | KVM_REG_SIZE_U64 |
@@ -3911,11 +3928,27 @@ static bool copy_reg_to_user(const struct sys_reg_desc *reg, u64 __user **uind)
 	return true;
 }
 
+static bool kvm_realm_sys_reg_hidden_user(const struct kvm_vcpu *vcpu, u64 reg)
+{
+	if (!kvm_is_realm(vcpu->kvm))
+		return false;
+
+	switch (reg) {
+	case SYS_ID_AA64DFR0_EL1:
+	case SYS_PMCR_EL0:
+		return false;
+	}
+	return true;
+}
+
 static int walk_one_sys_reg(const struct kvm_vcpu *vcpu,
 			    const struct sys_reg_desc *rd,
 			    u64 __user **uind,
 			    unsigned int *total)
 {
+	if (kvm_realm_sys_reg_hidden_user(vcpu, reg_to_encoding(rd)))
+		return 0;
+
 	/*
 	 * Ignore registers we trap but don't save,
 	 * and for which no custom user accessor is provided.
@@ -3953,29 +3986,26 @@ static int walk_sys_regs(struct kvm_vcpu *vcpu, u64 __user *uind)
 
 unsigned long kvm_arm_num_sys_reg_descs(struct kvm_vcpu *vcpu)
 {
-	return ARRAY_SIZE(invariant_sys_regs)
-		+ num_demux_regs()
+	return num_invariant_regs(vcpu)
+		+ num_demux_regs(vcpu)
 		+ walk_sys_regs(vcpu, (u64 __user *)NULL);
 }
 
 int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 {
-	unsigned int i;
 	int err;
 
-	/* Then give them all the invariant registers' indices. */
-	for (i = 0; i < ARRAY_SIZE(invariant_sys_regs); i++) {
-		if (put_user(sys_reg_to_index(&invariant_sys_regs[i]), uindices))
-			return -EFAULT;
-		uindices++;
-	}
+	err = write_invariant_regids(vcpu, uindices);
+	if (err)
+		return err;
+	uindices += num_invariant_regs(vcpu);
 
 	err = walk_sys_regs(vcpu, uindices);
 	if (err < 0)
 		return err;
 	uindices += err;
 
-	return write_demux_regids(uindices);
+	return write_demux_regids(vcpu, uindices);
 }
 
 #define KVM_ARM_FEATURE_ID_RANGE_INDEX(r)			\
-- 
2.34.1



* [PATCH v2 42/43] arm64: kvm: Expose support for private memory
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (40 preceding siblings ...)
  2024-04-12  8:43   ` [PATCH v2 41/43] arm64: RME: Provide accurate register list Steven Price
@ 2024-04-12  8:43   ` Steven Price
  2024-04-25 14:44     ` Fuad Tabba
  2024-04-12  8:43   ` [PATCH v2 43/43] KVM: arm64: Allow activating realms Steven Price
  42 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:43 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Select KVM_GENERIC_PRIVATE_MEM and provide the necessary support
functions.
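
For illustration, a VMM would then back the Realm's RAM with a
guest_memfd slot roughly as follows (a sketch using the generic
private-memory UABI; 'ram_size' and 'shared_va' are hypothetical, error
handling elided):

  struct kvm_create_guest_memfd gmem = { .size = ram_size };
  int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

  struct kvm_userspace_memory_region2 region = {
      .slot            = 0,
      .flags           = KVM_MEM_GUEST_MEMFD,
      .guest_phys_addr = 0x80000000,           /* hypothetical IPA base */
      .memory_size     = ram_size,
      .userspace_addr  = (uint64_t)shared_va,  /* shared-memory alias */
      .guest_memfd     = gmem_fd,
  };

  ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);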

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  4 ++++
 arch/arm64/kvm/Kconfig            |  1 +
 arch/arm64/kvm/arm.c              |  5 +++++
 arch/arm64/kvm/mmu.c              | 19 +++++++++++++++++++
 4 files changed, 29 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 902923402f6e..93de7f5009fe 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1259,6 +1259,10 @@ static inline bool kvm_vm_is_protected(struct kvm *kvm)
 	return false;
 }
 
+#ifdef CONFIG_KVM_PRIVATE_MEM
+bool kvm_arch_has_private_mem(struct kvm *kvm);
+#endif
+
 int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature);
 bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 58f09370d17e..8da57e74c86a 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -37,6 +37,7 @@ menuconfig KVM
 	select HAVE_KVM_VCPU_RUN_PID_CHANGE
 	select SCHED_INFO
 	select GUEST_PERF_EVENTS if PERF_EVENTS
+	select KVM_GENERIC_PRIVATE_MEM
 	help
 	  Support hosting virtualized guest machines.
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2dd014d3c366..a66d0a6eb4fa 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -89,6 +89,11 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
 }
 
+bool kvm_arch_has_private_mem(struct kvm *kvm)
+{
+	return kvm_is_realm(kvm);
+}
+
 int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			    struct kvm_enable_cap *cap)
 {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 48c957e21c83..808bceebad4d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2171,6 +2171,25 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	return ret;
 }
 
+bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
+					struct kvm_gfn_range *range)
+{
+	WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm));
+	return false;
+}
+
+bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
+					 struct kvm_gfn_range *range)
+{
+	WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm));
+
+	if (range->arg.attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)
+		range->only_shared = true;
+	kvm_unmap_gfn_range(kvm, range);
+
+	return false;
+}
+
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
 }
-- 
2.34.1



* [PATCH v2 43/43] KVM: arm64: Allow activating realms
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
                     ` (41 preceding siblings ...)
  2024-04-12  8:43   ` [PATCH v2 42/43] arm64: kvm: Expose support for private memory Steven Price
@ 2024-04-12  8:43   ` Steven Price
  42 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-12  8:43 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Add the ioctl to activate a realm, and set the static branch that
enables access to the realm functionality once the RMM is detected.
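
The VMM-side call is sketched below, assuming the KVM_CAP_ARM_RME
enable-cap interface from earlier in this series, where args[0] selects
the RME action:

  struct kvm_enable_cap cap = {
      .cap  = KVM_CAP_ARM_RME,
      .args = { KVM_CAP_ARM_RME_ACTIVATE_REALM },
  };

  ioctl(vm_fd, KVM_ENABLE_CAP, &cap);  /* after populating the realm */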

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/rme.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
index 93aab6caddf5..5901d57ca9d0 100644
--- a/arch/arm64/kvm/rme.c
+++ b/arch/arm64/kvm/rme.c
@@ -1167,6 +1167,20 @@ static int kvm_init_ipa_range_realm(struct kvm *kvm,
 	return ret;
 }
 
+static int kvm_activate_realm(struct kvm *kvm)
+{
+	struct realm *realm = &kvm->arch.realm;
+
+	if (kvm_realm_state(kvm) != REALM_STATE_NEW)
+		return -EINVAL;
+
+	if (rmi_realm_activate(virt_to_phys(realm->rd)))
+		return -ENXIO;
+
+	WRITE_ONCE(realm->state, REALM_STATE_ACTIVE);
+	return 0;
+}
+
 /* Protects access to rme_vmid_bitmap */
 static DEFINE_SPINLOCK(rme_vmid_lock);
 static unsigned long *rme_vmid_bitmap;
@@ -1314,6 +1328,9 @@ int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 		r = kvm_populate_realm(kvm, &args);
 		break;
 	}
+	case KVM_CAP_ARM_RME_ACTIVATE_REALM:
+		r = kvm_activate_realm(kvm);
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -1567,7 +1584,7 @@ int kvm_init_rme(void)
 	if (ret)
 		return ret;
 
-	/* Future patch will enable static branch kvm_rme_is_available */
+	static_branch_enable(&kvm_rme_is_available);
 
 	return 0;
 }
-- 
2.34.1



* Re: [v2] Support for Arm CCA VMs on Linux
  2024-04-12  8:40 [v2] Support for Arm CCA VMs on Linux Steven Price
                   ` (2 preceding siblings ...)
  2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
@ 2024-04-12 16:52 ` Jean-Philippe Brucker
  3 siblings, 0 replies; 104+ messages in thread
From: Jean-Philippe Brucker @ 2024-04-12 16:52 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Mathieu Poirier, Thomas Fossati, Kevin Zhao,
	Leonardo Augusto Guimarães Garcia

On Fri, Apr 12, 2024 at 09:40:56AM +0100, Steven Price wrote:
> We are happy to announce the second version of the Arm Confidential
> Compute Architecture (CCA) support for the Linux stack. The intention is
> to seek early feedback in the following areas:
>  * KVM integration of the Arm CCA;
>  * KVM UABI for managing the Realms, seeking to generalise the
>    operations where possible with other Confidential Compute solutions;
>  * Linux Guest support for Realms.
> 
> See the previous RFC[1] for a more detailed overview of Arm's CCA
> solution, or visit the Arm CCA Landing page[2].
> 
> This series is based on the final RMM v1.0 (EAC5) specification[3].

Instructions for building and running the CCA stack on QEMU, both as
system emulation and VMM, are available here:
https://linaro.atlassian.net/wiki/spaces/QEMU/pages/29051027459/Building+an+RME+stack+for+QEMU

I'll send out the QEMU VMM patches shortly:
https://git.codelinaro.org/linaro/dcap/qemu.git branch cca/v2

Thanks,
Jean

> [1] Previous RFC
>     https://lore.kernel.org/r/20230127112248.136810-1-suzuki.poulose%40arm.com
> [2] Arm CCA Landing page (See Key Resources section for various documentation)
>     https://www.arm.com/architecture/security-features/arm-confidential-compute-architecture
> [3] RMM v1.0-EAC5 specification
>     https://developer.arm.com/documentation/den0137/1-0eac5/
> [4] Shrinkwrap
>     https://git.gitlab.arm.com/tooling/shrinkwrap
> [5] Linux support for Arm CCA RMM v1.0-EAC5
>     https://lore.kernel.org/r/fb259449-026e-4083-a02b-f8a4ebea1f87%40arm.com




* Re: [PATCH v2 33/43] arm64: rme: Enable PMU support with a realm guest
  2024-04-12  8:42   ` [PATCH v2 33/43] arm64: rme: Enable PMU support with a realm guest Steven Price
@ 2024-04-13 23:44     ` kernel test robot
  2024-04-18 16:06       ` Suzuki K Poulose
  0 siblings, 1 reply; 104+ messages in thread
From: kernel test robot @ 2024-04-13 23:44 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: oe-kbuild-all, Steven Price, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Suzuki K Poulose,
	Zenghui Yu, linux-arm-kernel, linux-kernel, Joey Gouly,
	Alexandru Elisei, Christoffer Dall, Fuad Tabba, linux-coco,
	Ganapatrao Kulkarni, Jean-Philippe Brucker

Hi Steven,

kernel test robot noticed the following build errors:

[auto build test ERROR on kvmarm/next]
[also build test ERROR on kvm/queue arm64/for-next/core linus/master v6.9-rc3 next-20240412]
[cannot apply to kvm/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Steven-Price/KVM-Prepare-for-handling-only-shared-mappings-in-mmu_notifier-events/20240412-170311
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next
patch link:    https://lore.kernel.org/r/20240412084309.1733783-34-steven.price%40arm.com
patch subject: [PATCH v2 33/43] arm64: rme: Enable PMU support with a realm guest
config: arm64-randconfig-r064-20240414 (https://download.01.org/0day-ci/archive/20240414/202404140723.GKwnJxeZ-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 8b3b4a92adee40483c27f26c478a384cd69c6f05)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240414/202404140723.GKwnJxeZ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404140723.GKwnJxeZ-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from arch/arm64/kvm/arm.c:9:
   In file included from include/linux/entry-kvm.h:6:
   In file included from include/linux/resume_user_mode.h:8:
   In file included from include/linux/memcontrol.h:21:
   In file included from include/linux/mm.h:2208:
   include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     508 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     509 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     515 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     516 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     527 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     528 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     536 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     537 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> arch/arm64/kvm/arm.c:1115:13: error: no member named 'irq_level' in 'struct kvm_pmu'
    1115 |                         if (pmu->irq_level) {
         |                             ~~~  ^
>> arch/arm64/kvm/arm.c:1117:5: error: call to undeclared function 'arm_pmu_set_phys_irq'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    1117 |                                 arm_pmu_set_phys_irq(false);
         |                                 ^
   arch/arm64/kvm/arm.c:1224:4: error: call to undeclared function 'arm_pmu_set_phys_irq'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    1224 |                         arm_pmu_set_phys_irq(true);
         |                         ^
   5 warnings and 3 errors generated.


vim +1115 arch/arm64/kvm/arm.c

  1044	
  1045	/**
  1046	 * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
  1047	 * @vcpu:	The VCPU pointer
  1048	 *
  1049	 * This function is called through the VCPU_RUN ioctl called from user space. It
  1050	 * will execute VM code in a loop until the time slice for the process is used
  1051	 * or some emulation is needed from user space in which case the function will
  1052	 * return with return value 0 and with the kvm_run structure filled in with the
  1053	 * required data for the requested emulation.
  1054	 */
  1055	int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
  1056	{
  1057		struct kvm_run *run = vcpu->run;
  1058		int ret;
  1059	
  1060		if (run->exit_reason == KVM_EXIT_MMIO) {
  1061			ret = kvm_handle_mmio_return(vcpu);
  1062			if (ret)
  1063				return ret;
  1064		}
  1065	
  1066		vcpu_load(vcpu);
  1067	
  1068		if (run->immediate_exit) {
  1069			ret = -EINTR;
  1070			goto out;
  1071		}
  1072	
  1073		kvm_sigset_activate(vcpu);
  1074	
  1075		ret = 1;
  1076		run->exit_reason = KVM_EXIT_UNKNOWN;
  1077		run->flags = 0;
  1078		while (ret > 0) {
  1079			bool pmu_stopped = false;
  1080	
  1081			/*
  1082			 * Check conditions before entering the guest
  1083			 */
  1084			ret = xfer_to_guest_mode_handle_work(vcpu);
  1085			if (!ret)
  1086				ret = 1;
  1087	
  1088			if (ret > 0)
  1089				ret = check_vcpu_requests(vcpu);
  1090	
  1091			/*
  1092			 * Preparing the interrupts to be injected also
  1093			 * involves poking the GIC, which must be done in a
  1094			 * non-preemptible context.
  1095			 */
  1096			preempt_disable();
  1097	
  1098			/*
  1099			 * The VMID allocator only tracks active VMIDs per
  1100			 * physical CPU, and therefore the VMID allocated may not be
  1101			 * preserved on VMID roll-over if the task was preempted,
  1102			 * making a thread's VMID inactive. So we need to call
  1103			 * kvm_arm_vmid_update() in non-premptible context.
  1104			 */
  1105			if (kvm_arm_vmid_update(&vcpu->arch.hw_mmu->vmid) &&
  1106			    has_vhe())
  1107				__load_stage2(vcpu->arch.hw_mmu,
  1108					      vcpu->arch.hw_mmu->arch);
  1109	
  1110			kvm_pmu_flush_hwstate(vcpu);
  1111	
  1112			if (vcpu_is_rec(vcpu)) {
  1113				struct kvm_pmu *pmu = &vcpu->arch.pmu;
  1114	
> 1115				if (pmu->irq_level) {
  1116					pmu_stopped = true;
> 1117					arm_pmu_set_phys_irq(false);
  1118				}
  1119			}
  1120	
  1121			local_irq_disable();
  1122	
  1123			kvm_vgic_flush_hwstate(vcpu);
  1124	
  1125			kvm_pmu_update_vcpu_events(vcpu);
  1126	
  1127			/*
  1128			 * Ensure we set mode to IN_GUEST_MODE after we disable
  1129			 * interrupts and before the final VCPU requests check.
  1130			 * See the comment in kvm_vcpu_exiting_guest_mode() and
  1131			 * Documentation/virt/kvm/vcpu-requests.rst
  1132			 */
  1133			smp_store_mb(vcpu->mode, IN_GUEST_MODE);
  1134	
  1135			if (ret <= 0 || kvm_vcpu_exit_request(vcpu, &ret)) {
  1136				vcpu->mode = OUTSIDE_GUEST_MODE;
  1137				isb(); /* Ensure work in x_flush_hwstate is committed */
  1138				kvm_pmu_sync_hwstate(vcpu);
  1139				if (static_branch_unlikely(&userspace_irqchip_in_use))
  1140					kvm_timer_sync_user(vcpu);
  1141				kvm_vgic_sync_hwstate(vcpu);
  1142				local_irq_enable();
  1143				preempt_enable();
  1144				continue;
  1145			}
  1146	
  1147			kvm_arm_setup_debug(vcpu);
  1148			kvm_arch_vcpu_ctxflush_fp(vcpu);
  1149	
  1150			/**************************************************************
  1151			 * Enter the guest
  1152			 */
  1153			trace_kvm_entry(*vcpu_pc(vcpu));
  1154			guest_timing_enter_irqoff();
  1155	
  1156			if (vcpu_is_rec(vcpu))
  1157				ret = kvm_rec_enter(vcpu);
  1158			else
  1159				ret = kvm_arm_vcpu_enter_exit(vcpu);
  1160	
  1161			vcpu->mode = OUTSIDE_GUEST_MODE;
  1162			vcpu->stat.exits++;
  1163			/*
  1164			 * Back from guest
  1165			 *************************************************************/
  1166	
  1167			kvm_arm_clear_debug(vcpu);
  1168	
  1169			/*
  1170			 * We must sync the PMU state before the vgic state so
  1171			 * that the vgic can properly sample the updated state of the
  1172			 * interrupt line.
  1173			 */
  1174			kvm_pmu_sync_hwstate(vcpu);
  1175	
  1176			/*
  1177			 * Sync the vgic state before syncing the timer state because
  1178			 * the timer code needs to know if the virtual timer
  1179			 * interrupts are active.
  1180			 */
  1181			kvm_vgic_sync_hwstate(vcpu);
  1182	
  1183			/*
  1184			 * Sync the timer hardware state before enabling interrupts as
  1185			 * we don't want vtimer interrupts to race with syncing the
  1186			 * timer virtual interrupt state.
  1187			 */
  1188			if (static_branch_unlikely(&userspace_irqchip_in_use))
  1189				kvm_timer_sync_user(vcpu);
  1190	
  1191			kvm_arch_vcpu_ctxsync_fp(vcpu);
  1192	
  1193			/*
  1194			 * We must ensure that any pending interrupts are taken before
  1195			 * we exit guest timing so that timer ticks are accounted as
  1196			 * guest time. Transiently unmask interrupts so that any
  1197			 * pending interrupts are taken.
  1198			 *
  1199			 * Per ARM DDI 0487G.b section D1.13.4, an ISB (or other
  1200			 * context synchronization event) is necessary to ensure that
  1201			 * pending interrupts are taken.
  1202			 */
  1203			if (ARM_EXCEPTION_CODE(ret) == ARM_EXCEPTION_IRQ) {
  1204				local_irq_enable();
  1205				isb();
  1206				local_irq_disable();
  1207			}
  1208	
  1209			guest_timing_exit_irqoff();
  1210	
  1211			local_irq_enable();
  1212	
  1213			/* Exit types that need handling before we can be preempted */
  1214			if (!vcpu_is_rec(vcpu)) {
  1215				trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu),
  1216					       *vcpu_pc(vcpu));
  1217	
  1218				handle_exit_early(vcpu, ret);
  1219			}
  1220	
  1221			preempt_enable();
  1222	
  1223			if (pmu_stopped)
  1224				arm_pmu_set_phys_irq(true);
  1225	
  1226			/*
  1227			 * The ARMv8 architecture doesn't give the hypervisor
  1228			 * a mechanism to prevent a guest from dropping to AArch32 EL0
  1229			 * if implemented by the CPU. If we spot the guest in such
  1230			 * state and that we decided it wasn't supposed to do so (like
  1231			 * with the asymmetric AArch32 case), return to userspace with
  1232			 * a fatal error.
  1233			 */
  1234			if (vcpu_mode_is_bad_32bit(vcpu)) {
  1235				/*
  1236				 * As we have caught the guest red-handed, decide that
  1237				 * it isn't fit for purpose anymore by making the vcpu
  1238				 * invalid. The VMM can try and fix it by issuing  a
  1239				 * KVM_ARM_VCPU_INIT if it really wants to.
  1240				 */
  1241				vcpu_clear_flag(vcpu, VCPU_INITIALIZED);
  1242				ret = ARM_EXCEPTION_IL;
  1243			}
  1244	
  1245			if (vcpu_is_rec(vcpu))
  1246				ret = handle_rme_exit(vcpu, ret);
  1247			else
  1248				ret = handle_exit(vcpu, ret);
  1249		}
  1250	
  1251		/* Tell userspace about in-kernel device output levels */
  1252		if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
  1253			kvm_timer_update_run(vcpu);
  1254			kvm_pmu_update_run(vcpu);
  1255		}
  1256	
  1257		kvm_sigset_deactivate(vcpu);
  1258	
  1259	out:
  1260		/*
  1261		 * In the unlikely event that we are returning to userspace
  1262		 * with pending exceptions or PC adjustment, commit these
  1263		 * adjustments in order to give userspace a consistent view of
  1264		 * the vcpu state. Note that this relies on __kvm_adjust_pc()
  1265		 * being preempt-safe on VHE.
  1266		 */
  1267		if (unlikely(vcpu_get_flag(vcpu, PENDING_EXCEPTION) ||
  1268			     vcpu_get_flag(vcpu, INCREMENT_PC)))
  1269			kvm_call_hyp(__kvm_adjust_pc, vcpu);
  1270	
  1271		vcpu_put(vcpu);
  1272		return ret;
  1273	}
  1274	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v2 09/14] arm64: Enable memory encrypt for Realms
  2024-04-12  8:42   ` [PATCH v2 09/14] arm64: Enable memory encrypt for Realms Steven Price
@ 2024-04-15  3:13     ` kernel test robot
  2024-04-25 13:42       ` Suzuki K Poulose
  0 siblings, 1 reply; 104+ messages in thread
From: kernel test robot @ 2024-04-15  3:13 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: llvm, oe-kbuild-all, Suzuki K Poulose, Catalin Marinas,
	Marc Zyngier, Will Deacon, James Morse, Oliver Upton, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Steven Price

Hi Steven,

kernel test robot noticed the following build errors:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on kvmarm/next efi/next tip/irq/core linus/master v6.9-rc3 next-20240412]
[cannot apply to arnd-asm-generic/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Steven-Price/arm64-rsi-Add-RSI-definitions/20240412-164852
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
patch link:    https://lore.kernel.org/r/20240412084213.1733764-10-steven.price%40arm.com
patch subject: [PATCH v2 09/14] arm64: Enable memory encrypt for Realms
config: arm64-allyesconfig (https://download.01.org/0day-ci/archive/20240415/202404151003.vkNApJiS-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 8b3b4a92adee40483c27f26c478a384cd69c6f05)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240415/202404151003.vkNApJiS-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404151003.vkNApJiS-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/hv/hv.c:13:
   In file included from include/linux/mm.h:2208:
   include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     508 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     509 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     515 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     516 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     527 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     528 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     536 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     537 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> drivers/hv/hv.c:132:10: error: call to undeclared function 'set_memory_decrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     132 |                         ret = set_memory_decrypted((unsigned long)hv_cpu->post_msg_page, 1);
         |                               ^
   drivers/hv/hv.c:168:10: error: call to undeclared function 'set_memory_decrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     168 |                         ret = set_memory_decrypted((unsigned long)
         |                               ^
>> drivers/hv/hv.c:218:11: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     218 |                                 ret = set_memory_encrypted((unsigned long)
         |                                       ^
   drivers/hv/hv.c:230:11: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     230 |                                 ret = set_memory_encrypted((unsigned long)
         |                                       ^
   drivers/hv/hv.c:239:11: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     239 |                                 ret = set_memory_encrypted((unsigned long)
         |                                       ^
   5 warnings and 5 errors generated.
--
   In file included from drivers/hv/connection.c:16:
   In file included from include/linux/mm.h:2208:
   include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     508 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     509 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     515 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     516 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     527 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     528 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     536 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     537 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> drivers/hv/connection.c:236:8: error: call to undeclared function 'set_memory_decrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     236 |         ret = set_memory_decrypted((unsigned long)
         |               ^
>> drivers/hv/connection.c:340:2: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     340 |         set_memory_encrypted((unsigned long)vmbus_connection.monitor_pages[0], 1);
         |         ^
   5 warnings and 2 errors generated.
--
   In file included from drivers/hv/channel.c:14:
   In file included from include/linux/mm.h:2208:
   include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     508 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     509 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     515 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     516 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     527 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     528 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     536 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     537 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> drivers/hv/channel.c:442:8: error: call to undeclared function 'set_memory_decrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     442 |         ret = set_memory_decrypted((unsigned long)kbuffer,
         |               ^
>> drivers/hv/channel.c:531:3: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     531 |                 set_memory_encrypted((unsigned long)kbuffer,
         |                 ^
   drivers/hv/channel.c:848:8: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     848 |         ret = set_memory_encrypted((unsigned long)gpadl->buffer,
         |               ^
   5 warnings and 3 errors generated.
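
For reference, set_memory_encrypted()/set_memory_decrypted() are declared
in <linux/set_memory.h>; on architectures without
CONFIG_ARCH_HAS_MEM_ENCRYPT that header provides no-op stubs along these
lines:

   static inline int set_memory_encrypted(unsigned long addr, int numpages)
   {
   	return 0;
   }

   static inline int set_memory_decrypted(unsigned long addr, int numpages)
   {
   	return 0;
   }

So the undeclared-function errors above most likely point at a missing
direct include of <linux/set_memory.h> in the affected drivers/hv files
for this configuration, rather than at missing functionality.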


vim +/set_memory_decrypted +132 drivers/hv/hv.c

3e7ee4902fe699 drivers/staging/hv/Hv.c Hank Janssen      2009-07-13   96  
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19   97  int hv_synic_alloc(void)
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19   98  {
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18   99  	int cpu, ret = -ENOMEM;
f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  100  	struct hv_per_cpu_context *hv_cpu;
f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  101  
f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  102  	/*
f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  103  	 * First, zero all per-cpu memory areas so hv_synic_free() can
f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  104  	 * detect what memory has been allocated and cleanup properly
f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  105  	 * after any failures.
f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  106  	 */
f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  107  	for_each_present_cpu(cpu) {
f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  108  		hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  109  		memset(hv_cpu, 0, sizeof(*hv_cpu));
f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  110  	}
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  111  
6396bb221514d2 drivers/hv/hv.c         Kees Cook         2018-06-12  112  	hv_context.hv_numa_map = kcalloc(nr_node_ids, sizeof(struct cpumask),
597ff72f3de850 drivers/hv/hv.c         Jia-Ju Bai        2018-03-04  113  					 GFP_KERNEL);
9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  114  	if (hv_context.hv_numa_map == NULL) {
9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  115  		pr_err("Unable to allocate NUMA map\n");
9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  116  		goto err;
9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  117  	}
9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  118  
421b8f20d3c381 drivers/hv/hv.c         Vitaly Kuznetsov  2016-12-07  119  	for_each_present_cpu(cpu) {
f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  120  		hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  121  
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  122  		tasklet_init(&hv_cpu->msg_dpc,
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  123  			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  124  
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  125  		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  126  			hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  127  			if (hv_cpu->post_msg_page == NULL) {
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  128  				pr_err("Unable to allocate post msg page\n");
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  129  				goto err;
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  130  			}
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  131  
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24 @132  			ret = set_memory_decrypted((unsigned long)hv_cpu->post_msg_page, 1);
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  133  			if (ret) {
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  134  				pr_err("Failed to decrypt post msg page: %d\n", ret);
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  135  				/* Just leak the page, as it's unsafe to free the page. */
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  136  				hv_cpu->post_msg_page = NULL;
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  137  				goto err;
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  138  			}
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  139  
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  140  			memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  141  		}
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  142  
faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  143  		/*
faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  144  		 * Synic message and event pages are allocated by paravisor.
faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  145  		 * Skip these pages allocation here.
faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  146  		 */
d3a9d7e49d1531 drivers/hv/hv.c         Dexuan Cui        2023-08-24  147  		if (!ms_hyperv.paravisor_present && !hv_root_partition) {
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  148  			hv_cpu->synic_message_page =
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  149  				(void *)get_zeroed_page(GFP_ATOMIC);
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  150  			if (hv_cpu->synic_message_page == NULL) {
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  151  				pr_err("Unable to allocate SYNIC message page\n");
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  152  				goto err;
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  153  			}
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  154  
faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  155  			hv_cpu->synic_event_page =
faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  156  				(void *)get_zeroed_page(GFP_ATOMIC);
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  157  			if (hv_cpu->synic_event_page == NULL) {
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  158  				pr_err("Unable to allocate SYNIC event page\n");
68f2f2bc163d44 drivers/hv/hv.c         Dexuan Cui        2023-08-24  159  
68f2f2bc163d44 drivers/hv/hv.c         Dexuan Cui        2023-08-24  160  				free_page((unsigned long)hv_cpu->synic_message_page);
68f2f2bc163d44 drivers/hv/hv.c         Dexuan Cui        2023-08-24  161  				hv_cpu->synic_message_page = NULL;
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  162  				goto err;
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  163  			}
faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  164  		}
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  165  
68f2f2bc163d44 drivers/hv/hv.c         Dexuan Cui        2023-08-24  166  		if (!ms_hyperv.paravisor_present &&
e3131f1c81448a drivers/hv/hv.c         Dexuan Cui        2023-08-24  167  		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  168  			ret = set_memory_decrypted((unsigned long)
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  169  				hv_cpu->synic_message_page, 1);
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  170  			if (ret) {
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  171  				pr_err("Failed to decrypt SYNIC msg page: %d\n", ret);
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  172  				hv_cpu->synic_message_page = NULL;
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  173  
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  174  				/*
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  175  				 * Free the event page here so that hv_synic_free()
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  176  				 * won't later try to re-encrypt it.
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  177  				 */
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  178  				free_page((unsigned long)hv_cpu->synic_event_page);
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  179  				hv_cpu->synic_event_page = NULL;
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  180  				goto err;
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  181  			}
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  182  
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  183  			ret = set_memory_decrypted((unsigned long)
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  184  				hv_cpu->synic_event_page, 1);
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  185  			if (ret) {
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  186  				pr_err("Failed to decrypt SYNIC event page: %d\n", ret);
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  187  				hv_cpu->synic_event_page = NULL;
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  188  				goto err;
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  189  			}
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  190  
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  191  			memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  192  			memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  193  		}
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  194  	}
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  195  
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  196  	return 0;
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  197  
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  198  err:
572086325ce9a9 drivers/hv/hv.c         Michael Kelley    2018-08-02  199  	/*
572086325ce9a9 drivers/hv/hv.c         Michael Kelley    2018-08-02  200  	 * Any memory allocations that succeeded will be freed when
572086325ce9a9 drivers/hv/hv.c         Michael Kelley    2018-08-02  201  	 * the caller cleans up by calling hv_synic_free()
572086325ce9a9 drivers/hv/hv.c         Michael Kelley    2018-08-02  202  	 */
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  203  	return ret;
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  204  }
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  205  
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  206  
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  207  void hv_synic_free(void)
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  208  {
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  209  	int cpu, ret;
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  210  
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  211  	for_each_present_cpu(cpu) {
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  212  		struct hv_per_cpu_context *hv_cpu
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  213  			= per_cpu_ptr(hv_context.cpu_context, cpu);
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  214  
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  215  		/* It's better to leak the page if the encryption fails. */
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  216  		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  217  			if (hv_cpu->post_msg_page) {
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24 @218  				ret = set_memory_encrypted((unsigned long)
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  219  					hv_cpu->post_msg_page, 1);
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  220  				if (ret) {
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  221  					pr_err("Failed to encrypt post msg page: %d\n", ret);
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  222  					hv_cpu->post_msg_page = NULL;
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  223  				}
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  224  			}
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  225  		}
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  226  
68f2f2bc163d44 drivers/hv/hv.c         Dexuan Cui        2023-08-24  227  		if (!ms_hyperv.paravisor_present &&
e3131f1c81448a drivers/hv/hv.c         Dexuan Cui        2023-08-24  228  		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  229  			if (hv_cpu->synic_message_page) {
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  230  				ret = set_memory_encrypted((unsigned long)
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  231  					hv_cpu->synic_message_page, 1);
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  232  				if (ret) {
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  233  					pr_err("Failed to encrypt SYNIC msg page: %d\n", ret);
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  234  					hv_cpu->synic_message_page = NULL;
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  235  				}
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  236  			}
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  237  
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  238  			if (hv_cpu->synic_event_page) {
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  239  				ret = set_memory_encrypted((unsigned long)
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  240  					hv_cpu->synic_event_page, 1);
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  241  				if (ret) {
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  242  					pr_err("Failed to encrypt SYNIC event page: %d\n", ret);
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  243  					hv_cpu->synic_event_page = NULL;
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  244  				}
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  245  			}
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  246  		}
193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  247  
23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  248  		free_page((unsigned long)hv_cpu->post_msg_page);
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  249  		free_page((unsigned long)hv_cpu->synic_event_page);
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  250  		free_page((unsigned long)hv_cpu->synic_message_page);
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  251  	}
37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  252  
9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  253  	kfree(hv_context.hv_numa_map);
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  254  }
2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  255  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [v2] Support for Arm CCA VMs on Linux
  2024-04-11 18:54 ` Itaru Kitayama
@ 2024-04-15  8:14   ` Steven Price
  0 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-15  8:14 UTC (permalink / raw)
  To: Itaru Kitayama
  Cc: kvm, kvmarm, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

On 11/04/2024 19:54, Itaru Kitayama wrote:
> Hi Steven,
> 
> On Fri, Apr 12, 2024 at 09:40:56AM +0100, Steven Price wrote:
>> We are happy to announce the second version of the Arm Confidential
>> Compute Architecture (CCA) support for the Linux stack. The intention is
>> to seek early feedback in the following areas:
>>  * KVM integration of the Arm CCA;
>>  * KVM UABI for managing the Realms, seeking to generalise the
>>    operations where possible with other Confidential Compute solutions;
>>  * Linux Guest support for Realms.
>>
>> See the previous RFC[1] for a more detailed overview of Arm's CCA
>> solution, or visible the Arm CCA Landing page[2].
>>
>> This series is based on the final RMM v1.0 (EAC5) specification[3].
> 
> It's great to see the updated "V2" series. Since you said you like
> "early" feedback on V2, does that mean it's likely to be followed by
> V3 and V4, anticipating large code-base changes from the current form
> (V2)? Do you have a rough timeframe to make this Arm CCA support landed
> in mainline? Do you Arm folk expect this is going to be a multiple-year 
> long project? 

I probably should have expanded on that wording a bit, sorry! ;)

I decided to drop the 'RFC' tag as I believe this is now in a state
where it has no known bugs. The previous RFC didn't use
guest_memfd and had a known issue where a malicious VMM could bring down
the host kernel - so was obviously not ready for merging. But, of
course, "no known bugs" and ready to merge are somewhat different
milestones.

The support for running in a guest is (I believe) in a good state and I
don't expect to have to iterate much on that before merging - but, as
always, that depends on the feedback received.

The host support I expect to take longer. The key thing here is that
there are other CoCo solutions and we don't want to deviate
unnecessarily from what gets merged for them. Most obviously there is
some overlap between pKVM and Arm's CCA as they both touch the Arm arch
code in similar ways. At the moment we've got a hacked-up version of the
kvmtool based on pKVM's branch for testing this, but if you've been
following the threads on pKVM you will be aware that there is a question
over whether the guest_memfd support meets pKVM's needs. So there are
definite questions as to what long term approach works best here. There
is even the possibility that if pKVM can solve the issues using
anonymous memory then it may make sense to also switch Arm's CCA back to
using anonymous memory rather than guest_memfd. Although I expect we'll
want to keep guest_memfd as an option at the very least to match where
x86 is heading.

I'd also expect some minor iteration on the exact form the uAPI takes.
Of particular note, Intel is planning to introduce KVM_MAP_MEMORY[1]
which looks very similar to KVM_CAP_ARM_RME_POPULATE_REALM. It will
probably make sense for us to switch (although KVM_MAP_MEMORY has
restrictions which are unnecessary for Arm CCA - e.g. it's run on a vcpu
for x86 but not for Arm CCA).

In terms of timescales - honestly I don't really know. I certainly hope
this won't be as long as "multi-year"! Although the wider CoCo effort is
certainly going to take multiple years. This series is for "CCA v1.0",
there will be more versions of the RMM specification which will add more
features in the future. Equally there is likely to be a lot of work
needed in guest hardening which is largely generic across all CoCo
solutions.

Steve

[1]
https://lore.kernel.org/r/9a060293c9ad9a78f1d8994cfe1311e818e99257.1712785629.git.isaku.yamahata%40intel.com

> Thanks,
> Itaru.
> 
>>
>> Quick-start guide
>> =================
>>
>> The easiest way of getting started with the stack is by using
>> Shrinkwrap[4]. Currently Shrinkwrap has a configuration for the initial
>> v1.0-EAC5 release[5], so the following overlay needs to be applied to
>> the standard 'cca-3world.yaml' file. Note that the 'rmm' component needs
>> updating to 'main' because there are fixes that are needed and are not
>> yet in a tagged release. The following will create an overlay file and
>> build a working environment:
>>
>> cat<<EOT >cca-v2.yaml
>> build:
>>   linux:
>>     repo:
>>       revision: cca-full/v2
>>   kvmtool:
>>     repo:
>>       kvmtool:
>>         revision: cca/v2
>>   rmm:
>>     repo:
>>       revision: main
>>   kvm-unit-tests:
>>     repo:
>>       revision: cca/v2
>> EOT
>>
>> shrinkwrap build cca-3world.yaml --overlay buildroot.yaml --btvar GUEST_ROOTFS='${artifact:BUILDROOT}' --overlay cca-v2.yaml
>>
>> You will then want to modify the 'guest-disk.img' to include the files
>> necessary for the realm guest (see the documentation in cca-3world.yaml
>> for details of other options):
>>
>>   cd ~/.shrinkwrap/package/cca-3world
>>   /sbin/e2fsck -fp rootfs.ext2 
>>   /sbin/resize2fs rootfs.ext2 256M
>>   mkdir mnt
>>   sudo mount rootfs.ext2 mnt/
>>   sudo mkdir mnt/cca
>>   sudo cp guest-disk.img KVMTOOL_EFI.fd lkvm Image mnt/cca/
>>   sudo umount mnt 
>>   rmdir mnt/
>>
>> Finally you can run the FVP with the host:
>>
>>   shrinkwrap run cca-3world.yaml --rtvar ROOTFS=$HOME/.shrinkwrap/package/cca-3world/rootfs.ext2
>>
>> And once the host kernel has booted, login (user name 'root') and start
>> a realm guest:
>>
>>   cd /cca
>>   ./lkvm run --realm --restricted_mem -c 2 -m 256 -k Image -p earlycon
>>
>> Be patient and you should end up in a realm guest with the host's
>> filesystem mounted via p9.
>>
>> It's also possible to use EFI within the realm guest, again see
>> cca-3world.yaml within Shrinkwrap for more details.
>>
>> A branch of kvm-unit-tests including realm-specific tests is provided
>> here:
>>   https://gitlab.arm.com/linux-arm/kvm-unit-tests-cca/-/tree/cca/v2
>>
>> [1] Previous RFC
>>     https://lore.kernel.org/r/20230127112248.136810-1-suzuki.poulose%40arm.com
>> [2] Arm CCA Landing page (See Key Resources section for various documentation)
>>     https://www.arm.com/architecture/security-features/arm-confidential-compute-architecture
>> [3] RMM v1.0-EAC5 specification
>>     https://developer.arm.com/documentation/den0137/1-0eac5/
>> [4] Shrinkwrap
>>     https://git.gitlab.arm.com/tooling/shrinkwrap
>> [5] Linux support for Arm CCA RMM v1.0-EAC5
>>     https://lore.kernel.org/r/fb259449-026e-4083-a02b-f8a4ebea1f87%40arm.com


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 04/43] arm64: RME: Handle Granule Protection Faults (GPFs)
  2024-04-12  8:42   ` [PATCH v2 04/43] arm64: RME: Handle Granule Protection Faults (GPFs) Steven Price
@ 2024-04-16 11:17     ` Suzuki K Poulose
  2024-04-18 13:17       ` Steven Price
  0 siblings, 1 reply; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-16 11:17 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 12/04/2024 09:42, Steven Price wrote:
> If the host attempts to access granules that have been delegated for use
> in a realm these accesses will be caught and will trigger a Granule
> Protection Fault (GPF).
> 
> A fault during a page walk signals a bug in the kernel and is handled by
> oopsing the kernel. A non-page walk fault could be caused by user space
> having access to a page which has been delegated to the kernel and will
> trigger a SIGBUS to allow debugging why user space is trying to access a
> delegated page.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/mm/fault.c | 29 ++++++++++++++++++++++++-----
>   1 file changed, 24 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 8251e2fea9c7..91da0f446dd9 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -765,6 +765,25 @@ static int do_tag_check_fault(unsigned long far, unsigned long esr,
>   	return 0;
>   }
>   
> +static int do_gpf_ptw(unsigned long far, unsigned long esr, struct pt_regs *regs)
> +{
> +	const struct fault_info *inf = esr_to_fault_info(esr);
> +
> +	die_kernel_fault(inf->name, far, esr, regs);
> +	return 0;
> +}
> +
> +static int do_gpf(unsigned long far, unsigned long esr, struct pt_regs *regs)
> +{
> +	const struct fault_info *inf = esr_to_fault_info(esr);
> +
> +	if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
> +		return 0;
> +
> +	arm64_notify_die(inf->name, regs, inf->sig, inf->code, far, esr);
> +	return 0;
> +}
> +
>   static const struct fault_info fault_info[] = {
>   	{ do_bad,		SIGKILL, SI_KERNEL,	"ttbr address size fault"	},
>   	{ do_bad,		SIGKILL, SI_KERNEL,	"level 1 address size fault"	},
> @@ -802,11 +821,11 @@ static const struct fault_info fault_info[] = {
>   	{ do_alignment_fault,	SIGBUS,  BUS_ADRALN,	"alignment fault"		},
>   	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 34"			},
>   	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 35"			},

Should this also be converted to do_gpf_ptw, "GPF at level -1", given we 
support LPA2?
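
i.e., a sketch of the change (DFSC 35, the "unknown 35" entry above, is
the level -1 GPF-on-walk encoding):

-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 35"			},
+	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level -1" },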


> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 36"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 37"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 38"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 39"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 40"			},
> +	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level 0" },
> +	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level 1" },
> +	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level 2" },
> +	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level 3" },
> +	{ do_gpf,		SIGBUS,  SI_KERNEL,	"Granule Protection Fault not on table walk" },
>   	{ do_bad,		SIGKILL, SI_KERNEL,	"level -1 address size fault"	},
>   	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 42"			},
>   	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level -1 translation fault"	},


Rest looks fine to me.

Suzuki

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 05/43] arm64: RME: Add SMC definitions for calling the RMM
  2024-04-12  8:42   ` [PATCH v2 05/43] arm64: RME: Add SMC definitions for calling the RMM Steven Price
@ 2024-04-16 12:38     ` Suzuki K Poulose
  2024-04-18 13:17       ` Steven Price
  0 siblings, 1 reply; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-16 12:38 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

Hi Steven

On 12/04/2024 09:42, Steven Price wrote:
> The RMM (Realm Management Monitor) provides functionality that can be
> accessed by SMC calls from the host.
> 
> The SMC definitions are based on DEN0137[1] version 1.0-eac5
> 
> [1] https://developer.arm.com/documentation/den0137/1-0eac5/
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/include/asm/rmi_smc.h | 250 +++++++++++++++++++++++++++++++
>   1 file changed, 250 insertions(+)
>   create mode 100644 arch/arm64/include/asm/rmi_smc.h
> 
> diff --git a/arch/arm64/include/asm/rmi_smc.h b/arch/arm64/include/asm/rmi_smc.h
> new file mode 100644
> index 000000000000..c205efdb18d8
> --- /dev/null
> +++ b/arch/arm64/include/asm/rmi_smc.h
> @@ -0,0 +1,250 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2023 ARM Ltd.
> + *
> + * The values and structures in this file are from the Realm Management Monitor
> + * specification (DEN0137) version A-bet0:
> + * https://developer.arm.com/documentation/den0137/1-0bet0/

This should now point to eac5 instead.

> + */
> +
> +#ifndef __ASM_RME_SMC_H
> +#define __ASM_RME_SMC_H
> +
> +#include <linux/arm-smccc.h>
> +
> +#define SMC_RxI_CALL(func)				\
> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,		\
> +			   ARM_SMCCC_SMC_64,		\
> +			   ARM_SMCCC_OWNER_STANDARD,	\
> +			   (func))
> +
> +#define SMC_RMI_DATA_CREATE		SMC_RxI_CALL(0x0153)
> +#define SMC_RMI_DATA_CREATE_UNKNOWN	SMC_RxI_CALL(0x0154)
> +#define SMC_RMI_DATA_DESTROY		SMC_RxI_CALL(0x0155)
> +#define SMC_RMI_FEATURES		SMC_RxI_CALL(0x0165)
> +#define SMC_RMI_GRANULE_DELEGATE	SMC_RxI_CALL(0x0151)
> +#define SMC_RMI_GRANULE_UNDELEGATE	SMC_RxI_CALL(0x0152)
> +#define SMC_RMI_PSCI_COMPLETE		SMC_RxI_CALL(0x0164)
> +#define SMC_RMI_REALM_ACTIVATE		SMC_RxI_CALL(0x0157)
> +#define SMC_RMI_REALM_CREATE		SMC_RxI_CALL(0x0158)
> +#define SMC_RMI_REALM_DESTROY		SMC_RxI_CALL(0x0159)
> +#define SMC_RMI_REC_AUX_COUNT		SMC_RxI_CALL(0x0167)
> +#define SMC_RMI_REC_CREATE		SMC_RxI_CALL(0x015a)
> +#define SMC_RMI_REC_DESTROY		SMC_RxI_CALL(0x015b)
> +#define SMC_RMI_REC_ENTER		SMC_RxI_CALL(0x015c)
> +#define SMC_RMI_RTT_CREATE		SMC_RxI_CALL(0x015d)
> +#define SMC_RMI_RTT_DESTROY		SMC_RxI_CALL(0x015e)
> +#define SMC_RMI_RTT_FOLD		SMC_RxI_CALL(0x0166)
> +#define SMC_RMI_RTT_INIT_RIPAS		SMC_RxI_CALL(0x0168)
> +#define SMC_RMI_RTT_MAP_UNPROTECTED	SMC_RxI_CALL(0x015f)
> +#define SMC_RMI_RTT_READ_ENTRY		SMC_RxI_CALL(0x0161)
> +#define SMC_RMI_RTT_SET_RIPAS		SMC_RxI_CALL(0x0169)
> +#define SMC_RMI_RTT_UNMAP_UNPROTECTED	SMC_RxI_CALL(0x0162)
> +#define SMC_RMI_VERSION			SMC_RxI_CALL(0x0150)
> +
> +#define RMI_ABI_MAJOR_VERSION	1
> +#define RMI_ABI_MINOR_VERSION	0
> +
> +#define RMI_UNASSIGNED			0
> +#define RMI_ASSIGNED			1
> +#define RMI_TABLE			2
> +
> +#define RMI_ABI_VERSION_GET_MAJOR(version) ((version) >> 16)
> +#define RMI_ABI_VERSION_GET_MINOR(version) ((version) & 0xFFFF)
> +#define RMI_ABI_VERSION(major, minor)      (((major) << 16) | (minor))
> +
> +#define RMI_RETURN_STATUS(ret)		((ret) & 0xFF)
> +#define RMI_RETURN_INDEX(ret)		(((ret) >> 8) & 0xFF)
> +
> +#define RMI_SUCCESS		0
> +#define RMI_ERROR_INPUT		1
> +#define RMI_ERROR_REALM		2
> +#define RMI_ERROR_REC		3
> +#define RMI_ERROR_RTT		4
> +
> +#define RMI_EMPTY		0
> +#define RMI_RAM			1
> +#define RMI_DESTROYED		2
> +
> +#define RMI_NO_MEASURE_CONTENT	0
> +#define RMI_MEASURE_CONTENT	1
> +
> +#define RMI_FEATURE_REGISTER_0_S2SZ		GENMASK(7, 0)
> +#define RMI_FEATURE_REGISTER_0_LPA2		BIT(8)
> +#define RMI_FEATURE_REGISTER_0_SVE_EN		BIT(9)
> +#define RMI_FEATURE_REGISTER_0_SVE_VL		GENMASK(13, 10)
> +#define RMI_FEATURE_REGISTER_0_NUM_BPS		GENMASK(17, 14)
> +#define RMI_FEATURE_REGISTER_0_NUM_WPS		GENMASK(21, 18)
> +#define RMI_FEATURE_REGISTER_0_PMU_EN		BIT(22)
> +#define RMI_FEATURE_REGISTER_0_PMU_NUM_CTRS	GENMASK(27, 23)
> +#define RMI_FEATURE_REGISTER_0_HASH_SHA_256	BIT(28)
> +#define RMI_FEATURE_REGISTER_0_HASH_SHA_512	BIT(29)
> +
> +#define RMI_REALM_PARAM_FLAG_LPA2		BIT(0)
> +#define RMI_REALM_PARAM_FLAG_SVE		BIT(1)
> +#define RMI_REALM_PARAM_FLAG_PMU		BIT(2)
> +
> +/*
> + * Note many of these fields are smaller than u64 but all fields have u64
> + * alignment, so use u64 to ensure correct alignment.
> + */
> +struct realm_params {
> +	union { /* 0x0 */
> +		struct {
> +			u64 flags;
> +			u64 s2sz;
> +			u64 sve_vl;
> +			u64 num_bps;
> +			u64 num_wps;
> +			u64 pmu_num_ctrs;
> +			u64 hash_algo;
> +		};
> +		u8 padding_1[0x400];
> +	};
> +	union { /* 0x400 */
> +		u8 rpv[64];
> +		u8 padding_2[0x400];
> +	};
> +	union { /* 0x800 */
> +		struct {
> +			u64 vmid;
> +			u64 rtt_base;
> +			s64 rtt_level_start;
> +			u64 rtt_num_start;
> +		};
> +		u8 padding_3[0x800];
> +	};
> +};
> +
> +/*
> + * The number of GPRs (starting from X0) that are
> + * configured by the host when a REC is created.
> + */
> +#define REC_CREATE_NR_GPRS		8
> +
> +#define REC_PARAMS_FLAG_RUNNABLE	BIT_ULL(0)
> +
> +#define REC_PARAMS_AUX_GRANULES		16
> +
> +struct rec_params {
> +	union { /* 0x0 */
> +		u64 flags;
> +		u8 padding1[0x100];
> +	};
> +	union { /* 0x100 */
> +		u64 mpidr;
> +		u8 padding2[0x100];
> +	};
> +	union { /* 0x200 */
> +		u64 pc;
> +		u8 padding3[0x100];
> +	};
> +	union { /* 0x300 */
> +		u64 gprs[REC_CREATE_NR_GPRS];
> +		u8 padding4[0x500];
> +	};
> +	union { /* 0x800 */
> +		struct {
> +			u64 num_rec_aux;
> +			u64 aux[REC_PARAMS_AUX_GRANULES];
> +		};
> +		u8 padding5[0x800];
> +	};
> +};
> +
> +#define RMI_EMULATED_MMIO		BIT(0)
> +#define RMI_INJECT_SEA			BIT(1)
> +#define RMI_TRAP_WFI			BIT(2)
> +#define RMI_TRAP_WFE			BIT(3)

For completeness, we could add:

#define RMI_RIPAS_RESPONSE		BIT(4)

Not sure if we use it later in the series.

> +
> +#define REC_RUN_GPRS			31
> +#define REC_GIC_NUM_LRS			16
> +
> +struct rec_entry {
> +	union { /* 0x000 */
> +		u64 flags;
> +		u8 padding0[0x200];
> +	};
> +	union { /* 0x200 */
> +		u64 gprs[REC_RUN_GPRS];
> +		u8 padding2[0x100];
> +	};
> +	union { /* 0x300 */
> +		struct {
> +			u64 gicv3_hcr;
> +			u64 gicv3_lrs[REC_GIC_NUM_LRS];
> +		};
> +		u8 padding3[0x100];
> +	};
> +	u8 padding4[0x400];
> +};
> +
> +struct rec_exit {
> +	union { /* 0x000 */
> +		u8 exit_reason;
> +		u8 padding0[0x100];
> +	};
> +	union { /* 0x100 */
> +		struct {
> +			u64 esr;
> +			u64 far;
> +			u64 hpfar;
> +		};
> +		u8 padding1[0x100];
> +	};
> +	union { /* 0x200 */
> +		u64 gprs[REC_RUN_GPRS];
> +		u8 padding2[0x100];
> +	};
> +	union { /* 0x300 */
> +		struct {
> +			u64 gicv3_hcr;
> +			u64 gicv3_lrs[REC_GIC_NUM_LRS];
> +			u64 gicv3_misr;
> +			u64 gicv3_vmcr;
> +		};
> +		u8 padding3[0x100];
> +	};
> +	union { /* 0x400 */
> +		struct {
> +			u64 cntp_ctl;
> +			u64 cntp_cval;
> +			u64 cntv_ctl;
> +			u64 cntv_cval;
> +		};
> +		u8 padding4[0x100];
> +	};
> +	union { /* 0x500 */
> +		struct {
> +			u64 ripas_base;
> +			u64 ripas_top;
> +			u64 ripas_value;
> +		};
> +		u8 padding5[0x100];
> +	};
> +	union { /* 0x600 */
> +		u16 imm;
> +		u8 padding6[0x100];
> +	};
> +	union { /* 0x700 */
> +		struct {
> +			u64 pmu_ovf_status;

This is u8 as per section B4.4.10 RmiPmuOverflowStatus type.
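
i.e., something along these lines (padding7 still pads the union out
to 0x100):

	union { /* 0x700 */
		struct {
			u8 pmu_ovf_status;
		};
		u8 padding7[0x100];
	};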

> +		};
> +		u8 padding7[0x100];
> +	};
> +};
> +
> +struct rec_run {
> +	struct rec_entry entry;
> +	struct rec_exit exit;
> +};
> +
> +#define RMI_EXIT_SYNC			0x00
> +#define RMI_EXIT_IRQ			0x01
> +#define RMI_EXIT_FIQ			0x02
> +#define RMI_EXIT_PSCI			0x03
> +#define RMI_EXIT_RIPAS_CHANGE		0x04
> +#define RMI_EXIT_HOST_CALL		0x05
> +#define RMI_EXIT_SERROR			0x06

Minor nit: Like the other definitions, it may be good to keep the 
definitions of the "exit_reason" values above the field declaration.


Rest looks fine to me.

Suzuki
> +
> +#endif


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 06/43] arm64: RME: Add wrappers for RMI calls
  2024-04-12  8:42   ` [PATCH v2 06/43] arm64: RME: Add wrappers for RMI calls Steven Price
@ 2024-04-16 13:14     ` Suzuki K Poulose
  2024-04-19 11:18       ` Steven Price
  0 siblings, 1 reply; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-16 13:14 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

Hi Steven

On 12/04/2024 09:42, Steven Price wrote:
> The wrappers make the call sites easier to read and deal with the
> boilerplate of handling the error codes from the RMM.
> 

I have compared the parameters and output values to those of the RMM spec
and they match. There are some minor nits below.

> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/include/asm/rmi_cmds.h | 509 ++++++++++++++++++++++++++++++
>   1 file changed, 509 insertions(+)
>   create mode 100644 arch/arm64/include/asm/rmi_cmds.h
> 
> diff --git a/arch/arm64/include/asm/rmi_cmds.h b/arch/arm64/include/asm/rmi_cmds.h
> new file mode 100644
> index 000000000000..c21414127e8e
> --- /dev/null
> +++ b/arch/arm64/include/asm/rmi_cmds.h
> @@ -0,0 +1,509 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2023 ARM Ltd.
> + */
> +
> +#ifndef __ASM_RMI_CMDS_H
> +#define __ASM_RMI_CMDS_H
> +
> +#include <linux/arm-smccc.h>
> +
> +#include <asm/rmi_smc.h>
> +
> +struct rtt_entry {
> +	unsigned long walk_level;
> +	unsigned long desc;
> +	int state;
> +	int ripas;
> +};
> +

...

> +/**
> + * rmi_data_destroy() - Destroy a Data Granule
> + * @rd: PA of the RD
> + * @ipa: IPA at which the granule is mapped in the guest
> + * @data_out: PA of the granule which was destroyed
> + * @top_out: Top IPA of non-live RTT entries
> + *
> + * Transitions the granule to DESTROYED state, the address cannot be used by
> + * the guest for the lifetime of the Realm.
> + *
> + * Return: RMI return code
> + */
> +static inline int rmi_data_destroy(unsigned long rd, unsigned long ipa,
> +				   unsigned long *data_out,
> +				   unsigned long *top_out)
> +{
> +	struct arm_smccc_res res;
> +
> +	arm_smccc_1_1_invoke(SMC_RMI_DATA_DESTROY, rd, ipa, &res);
> +
> +	*data_out = res.a1;
> +	*top_out = res.a2;

minor nit: Do we need to be safer by checking the parameters before 
filling them in? i.e.,

	if (ptr)
		*ptr = result_out;

This applies to the other calls below.
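
As a sketch of the pattern, rmi_data_destroy() above would become:

static inline int rmi_data_destroy(unsigned long rd, unsigned long ipa,
				   unsigned long *data_out,
				   unsigned long *top_out)
{
	struct arm_smccc_res res;

	arm_smccc_1_1_invoke(SMC_RMI_DATA_DESTROY, rd, ipa, &res);

	/* Only fill in the out-parameters the caller asked for */
	if (data_out)
		*data_out = res.a1;
	if (top_out)
		*top_out = res.a2;

	return res.a0;
}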

> +
> +	return res.a0;
> +}

> +
> +/**
> + * rmi_realm_destroy() - Destroy a Realm
> + * @rd: PA of the RD
> + *
> + * Destroys a Realm, all objects belonging to the Realm must be destroyed first.
> + *
> + * Return: RMI return code
> + */
> +static inline int rmi_realm_destroy(unsigned long rd)
> +{
> +	struct arm_smccc_res res;
> +
> +	arm_smccc_1_1_invoke(SMC_RMI_REALM_DESTROY, rd, &res);
> +
> +	return res.a0;
> +}
> +
> +/**
> + * rmi_rec_aux_count() - Get number of auxiliary Granules required
> + * @rd: PA of the RD
> + * @aux_count: Number of pages written to this pointer
> + *
> + * A REC may require extra auxiliary pages to be delegateed for the RMM to

minor nit: "s/delegateed/delegated/"

...

> +/**
> + * rmi_rtt_read_entry() - Read an RTTE
> + * @rd: PA of the RD
> + * @ipa: IPA for which to read the RTTE
> + * @level: RTT level at which to read the RTTE
> + * @rtt: Output structure describing the RTTE
> + *
> + * Reads a RTTE (Realm Translation Table Entry).
> + *
> + * Return: RMI return code
> + */
> +static inline int rmi_rtt_read_entry(unsigned long rd, unsigned long ipa,
> +				     long level, struct rtt_entry *rtt)
> +{
> +	struct arm_smccc_1_2_regs regs = {
> +		SMC_RMI_RTT_READ_ENTRY,
> +		rd, ipa, level
> +	};
> +
> +	arm_smccc_1_2_smc(&regs, &regs);
> +
> +	rtt->walk_level = regs.a1;
> +	rtt->state = regs.a2 & 0xFF;

minor nit: We mask the state, but not the "ripas". Both of them are u8. 
For consistency, we should mask both or neither.
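
e.g., masking both:

	rtt->state = regs.a2 & 0xFF;
	rtt->desc = regs.a3;
	rtt->ripas = regs.a4 & 0xFF;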

> +	rtt->desc = regs.a3;
> +	rtt->ripas = regs.a4;
> +
> +	return regs.a0;
> +}
> +

...

> +/**
> + * rmi_rtt_get_phys() - Get the PA from a RTTE
> + * @rtt: The RTTE
> + *
> + * Return: the physical address from a RTT entry.
> + */
> +static inline phys_addr_t rmi_rtt_get_phys(struct rtt_entry *rtt)
> +{
> +	return rtt->desc & GENMASK(47, 12);
> +}

I guess this may need to change with the LPA2 support in RMM and must be
used in conjunction with the "realm" object to make the correct
conversion.
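
For example (just a sketch - with FEAT_LPA2 the 4KB/16KB descriptor
format carries PA[51:50] in bits [9:8], and "realm->lpa2" here stands in
for however the realm object ends up recording that the RTTs use the
LPA2 format):

static inline phys_addr_t rmi_rtt_get_phys(struct realm *realm,
					   struct rtt_entry *rtt)
{
	if (realm->lpa2)
		return (rtt->desc & GENMASK(49, 12)) |
		       ((rtt->desc & GENMASK(9, 8)) << 42);

	return rtt->desc & GENMASK(47, 12);
}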


Suzuki


> +
> +#endif


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 07/43] arm64: RME: Check for RME support at KVM init
  2024-04-12  8:42   ` [PATCH v2 07/43] arm64: RME: Check for RME support at KVM init Steven Price
@ 2024-04-16 13:30     ` Suzuki K Poulose
  2024-04-22 15:39       ` Steven Price
  0 siblings, 1 reply; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-16 13:30 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

Hi Steven

On 12/04/2024 09:42, Steven Price wrote:
> Query the RMI version number and check if it is a compatible version. A
> static key is also provided to signal that a supported RMM is available.
> 
> Functions are provided to query if a VM or VCPU is a realm (or rec)
> which currently will always return false.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/include/asm/kvm_emulate.h | 18 +++++++++
>   arch/arm64/include/asm/kvm_host.h    |  4 ++
>   arch/arm64/include/asm/kvm_rme.h     | 56 ++++++++++++++++++++++++++++
>   arch/arm64/include/asm/virt.h        |  1 +
>   arch/arm64/kvm/Makefile              |  3 +-
>   arch/arm64/kvm/arm.c                 |  9 +++++
>   arch/arm64/kvm/rme.c                 | 52 ++++++++++++++++++++++++++
>   7 files changed, 142 insertions(+), 1 deletion(-)
>   create mode 100644 arch/arm64/include/asm/kvm_rme.h
>   create mode 100644 arch/arm64/kvm/rme.c
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 975af30af31f..6f08398537e2 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -611,4 +611,22 @@ static __always_inline void kvm_reset_cptr_el2(struct kvm_vcpu *vcpu)
>   
>   	kvm_write_cptr_el2(val);
>   }
> +
> +static inline bool kvm_is_realm(struct kvm *kvm)
> +{
> +	if (static_branch_unlikely(&kvm_rme_is_available))
> +		return kvm->arch.is_realm;
> +	return false;
> +}
> +
> +static inline enum realm_state kvm_realm_state(struct kvm *kvm)
> +{
> +	return READ_ONCE(kvm->arch.realm.state);
> +}
> +
> +static inline bool vcpu_is_rec(struct kvm_vcpu *vcpu)
> +{
> +	return false;
> +}
> +
>   #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 9e8a496fb284..63b68b85db3f 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -27,6 +27,7 @@
>   #include <asm/fpsimd.h>
>   #include <asm/kvm.h>
>   #include <asm/kvm_asm.h>
> +#include <asm/kvm_rme.h>
>   #include <asm/vncr_mapping.h>
>   
>   #define __KVM_HAVE_ARCH_INTC_INITIALIZED
> @@ -348,6 +349,9 @@ struct kvm_arch {
>   	 * the associated pKVM instance in the hypervisor.
>   	 */
>   	struct kvm_protected_vm pkvm;
> +
> +	bool is_realm;
> +	struct realm realm;
>   };
>   
>   struct kvm_vcpu_fault_info {
> diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
> new file mode 100644
> index 000000000000..922da3f47227
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_rme.h
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2023 ARM Ltd.
> + */
> +
> +#ifndef __ASM_KVM_RME_H
> +#define __ASM_KVM_RME_H
> +
> +/**
> + * enum realm_state - State of a Realm
> + */
> +enum realm_state {
> +	/**
> +	 * @REALM_STATE_NONE:
> +	 *      Realm has not yet been created. rmi_realm_create() may be
> +	 *      called to create the realm.
> +	 */
> +	REALM_STATE_NONE,
> +	/**
> +	 * @REALM_STATE_NEW:
> +	 *      Realm is under construction, not eligible for execution. Pages
> +	 *      may be populated with rmi_data_create().
> +	 */
> +	REALM_STATE_NEW,
> +	/**
> +	 * @REALM_STATE_ACTIVE:
> +	 *      Realm has been created and is eligible for execution with
> +	 *      rmi_rec_enter(). Pages may no longer be populated with
> +	 *      rmi_data_create().
> +	 */
> +	REALM_STATE_ACTIVE,
> +	/**
> +	 * @REALM_STATE_DYING:
> +	 *      Realm is in the process of being destroyed or has already been
> +	 *      destroyed.
> +	 */
> +	REALM_STATE_DYING,
> +	/**
> +	 * @REALM_STATE_DEAD:
> +	 *      Realm has been destroyed.
> +	 */
> +	REALM_STATE_DEAD
> +};
> +
> +/**
> + * struct realm - Additional per VM data for a Realm
> + *
> + * @state: The lifetime state machine for the realm
> + */
> +struct realm {
> +	enum realm_state state;
> +};
> +
> +int kvm_init_rme(void);
> +
> +#endif
> diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
> index 261d6e9df2e1..12cf36c38189 100644
> --- a/arch/arm64/include/asm/virt.h
> +++ b/arch/arm64/include/asm/virt.h
> @@ -81,6 +81,7 @@ void __hyp_reset_vectors(void);
>   bool is_kvm_arm_initialised(void);
>   
>   DECLARE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
> +DECLARE_STATIC_KEY_FALSE(kvm_rme_is_available);
>   
>   /* Reports the availability of HYP mode */
>   static inline bool is_hyp_mode_available(void)
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index c0c050e53157..1c1d8cdf381f 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -20,7 +20,8 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
>   	 vgic/vgic-v3.o vgic/vgic-v4.o \
>   	 vgic/vgic-mmio.o vgic/vgic-mmio-v2.o \
>   	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
> -	 vgic/vgic-its.o vgic/vgic-debug.o
> +	 vgic/vgic-its.o vgic/vgic-debug.o \
> +	 rme.o
>   
>   kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o pmu.o
>   
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 3dee5490eea9..2056c660c5ee 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -38,6 +38,7 @@
>   #include <asm/kvm_mmu.h>
>   #include <asm/kvm_nested.h>
>   #include <asm/kvm_pkvm.h>
> +#include <asm/kvm_rme.h>
>   #include <asm/kvm_emulate.h>
>   #include <asm/sections.h>
>   
> @@ -47,6 +48,8 @@
>   
>   static enum kvm_mode kvm_mode = KVM_MODE_DEFAULT;
>   
> +DEFINE_STATIC_KEY_FALSE(kvm_rme_is_available);
> +
>   DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
>   
>   DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> @@ -2562,6 +2565,12 @@ static __init int kvm_arm_init(void)
>   
>   	in_hyp_mode = is_kernel_in_hyp_mode();
>   
> +	if (in_hyp_mode) {
> +		err = kvm_init_rme();
> +		if (err)
> +			return err;
> +	}
> +
>   	if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) ||
>   	    cpus_have_final_cap(ARM64_WORKAROUND_1508412))
>   		kvm_info("Guests without required CPU erratum workarounds can deadlock system!\n" \
> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
> new file mode 100644
> index 000000000000..3dbbf9d046bf
> --- /dev/null
> +++ b/arch/arm64/kvm/rme.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2023 ARM Ltd.
> + */
> +
> +#include <linux/kvm_host.h>
> +
> +#include <asm/rmi_cmds.h>
> +#include <asm/virt.h>
> +
> +static int rmi_check_version(void)
> +{
> +	struct arm_smccc_res res;
> +	int version_major, version_minor;
> +	unsigned long host_version = RMI_ABI_VERSION(RMI_ABI_MAJOR_VERSION,
> +						     RMI_ABI_MINOR_VERSION);
> +
> +	arm_smccc_1_1_invoke(SMC_RMI_VERSION, host_version, &res);
> +
> +	if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
> +		return -ENXIO;
> +
> +	version_major = RMI_ABI_VERSION_GET_MAJOR(res.a1);
> +	version_minor = RMI_ABI_VERSION_GET_MINOR(res.a1);
> +

We don't seem to be using res.a0 to determine if the RMM supports our
requested version. As per RMM spec, section B4.3.23 :

"
The status code and lower revision output values indicate which of the 
following is true, in order of precedence:
  a) The RMM supports an interface revision which is compatible with the
     requested revision.
      • The status code is RMI_SUCCESS.
      • The lower revision is equal to the requested revision.
  b) The RMM does not support an interface revision which is compatible
     with the requested revision. The RMM supports an interface revision
     which is incompatible with and less than the requested revision.
      • The status code is RMI_ERROR_INPUT.
      • The lower revision is the highest interface revision which is
        both less than the requested revision and supported by the RMM.

  c) The RMM does not support an interface revision which is compatible
     with the requested revision. The RMM supports an interface revision
     which is incompatible with and greater than the requested revision.
      • The status code is RMI_ERROR_INPUT.
      • The lower revision is equal to the higher revision.

So, we could simply check the res.a0 for RMI_SUCCESS and proceed with
marking RMM available.
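
e.g., a sketch reusing what is already computed above:

	if (res.a0 != RMI_SUCCESS) {
		kvm_err("Unsupported RMI ABI (v%d.%d) host supports v%d.%d\n",
			version_major, version_minor,
			RMI_ABI_MAJOR_VERSION, RMI_ABI_MINOR_VERSION);
		return -ENXIO;
	}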


> +	if (version_major != RMI_ABI_MAJOR_VERSION) {
> +		kvm_err("Unsupported RMI ABI (v%d.%d) host supports v%d.%d\n",
> +			version_major, version_minor,
> +			RMI_ABI_MAJOR_VERSION,
> +			RMI_ABI_MINOR_VERSION);
> +		return -ENXIO;
> +	}
> +
> +	kvm_info("RMI ABI version %d.%d\n", version_major, version_minor);
> +
> +	return 0;
> +}
> +
> +int kvm_init_rme(void)
> +{
> +	if (PAGE_SIZE != SZ_4K)
> +		/* Only 4k page size on the host is supported */
> +		return 0;
> +
> +	if (rmi_check_version())
> +		/* Continue without realm support */
> +		return 0;
> +
> +	/* Future patch will enable static branch kvm_rme_is_available */
> +
> +	return 0;

Do we ever expect this to fail the kvm initialisation? Otherwise, we
could leave it as void?
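
i.e., if a failure never needs to abort kvm_arm_init(), a sketch:

void kvm_init_rme(void)
{
	/* Only 4k page size on the host is supported */
	if (PAGE_SIZE != SZ_4K)
		return;

	/* Continue without realm support */
	if (rmi_check_version())
		return;

	/* Future patch will enable static branch kvm_rme_is_available */
}

with the call site in kvm_arm_init() reduced to a plain "kvm_init_rme();".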

Suzuki
> +}


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 09/43] arm64: RME: ioctls to create and configure realms
  2024-04-12  8:42   ` [PATCH v2 09/43] arm64: RME: ioctls to create and configure realms Steven Price
@ 2024-04-17  9:51     ` Suzuki K Poulose
  2024-04-22 16:33       ` Steven Price
  2024-04-18 16:04     ` Suzuki K Poulose
  1 sibling, 1 reply; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-17  9:51 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni, Jean-Philippe Brucker

Hi Steven

On 12/04/2024 09:42, Steven Price wrote:
> Add the KVM_CAP_ARM_RME_CREATE_FD ioctl to create a realm. This involves

minor nit: s/FD/RD

> delegating pages to the RMM to hold the Realm Descriptor (RD) and for
> the base level of the Realm Translation Tables (RTT). A VMID also need
> to be picked, since the RMM has a separate VMID address space a
> dedicated allocator is added for this purpose.
> 
> KVM_CAP_ARM_RME_CONFIG_REALM is provided to allow configuring the realm
> before it is created.

It might be helpful to provide a bit more background on the Realm 
parameters. Something like:

Realm parameters for Realm Descriptor creation could be classified as:

  1. Parameters specific to the Realm stage2 (e.g. IPA Size, vmid, stage2
     entry level, entry level RTTs, number of RTTs in start level, LPA2).
     Most of these are not measured by the RMM and come from KVM's own
     bookkeeping.

  2. Parameters controlling "Arm Architecture features for the VM" (e.g.
     SVE VL, PMU counters, number of HW BRPs/WPs), configured by the VMM
     using the "user ID register write" mechanism. These will be
     supported in later patches.

  3. Parameters that are not part of the core Arm architecture but are
     defined by the RMM spec (e.g. Hash algorithm for measurement,
     Personalisation value). These are programmed via
     KVM_CAP_ARM_RME_CONFIG_REALM.



Also it may be a good idea to call out one of the issues that we have
with the UABI w.r.t. the IPA Size. The IPA Size supported by the RMM *could*
be different from the normal IPA Size as supported by KVM. We do not
expect this to be common, but is not impossible.

If the RMM_IPA_Size < Normal_IPA_Size, we have a problem with
advertising the "IPA Size" to the VMM. Right now we advertise
the "normal limit" by KVM_CAP_ARM_VM_IPA_SIZE and the IPA Size
is configured via vm_type[7:0] in KVM_CREATE_VM. Given we have
to configure the IPA size for a "Realm VM" at CREATE_VM time too,
the VMM is unable to choose a valid IPA Size for the Realm. We
have the following options:

1. Given IPA Size for a Realm is measured, the user must get
    what they choose. i.e., if the platform cannot support the
    requested size, don't run your Realm VM. In this case, we
    don't need to do anything.

2. Add KVM_CAP_ARM_VM_RMM_IPA_SIZE to expose the RMM limit
    for the VMM to choose (see the sketch after this list).

3. VMM to create a Realm VM using the default IPA Size and then
    check the KVM_CAP_ARM_VM_IPA_SIZE on the "kvm" instance (which
    is Realm) and get the RMM IPA limit.

I prefer 2 or 1, in that order of preference. Happy to hear suggestions.
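
For illustration, option 2 could look something like this from the VMM
side (KVM_CAP_ARM_VM_RMM_IPA_SIZE is a made-up name and REALM_VM_FLAGS
stands in for however the realm VM type ends up being requested; the
rest is the existing KVM uAPI):

	int rmm_ipa_bits = ioctl(kvm_fd, KVM_CHECK_EXTENSION,
				 KVM_CAP_ARM_VM_RMM_IPA_SIZE);
	int kvm_ipa_bits = ioctl(kvm_fd, KVM_CHECK_EXTENSION,
				 KVM_CAP_ARM_VM_IPA_SIZE);
	int ipa_bits = (rmm_ipa_bits && rmm_ipa_bits < kvm_ipa_bits) ?
		       rmm_ipa_bits : kvm_ipa_bits;

	int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM,
			  KVM_VM_TYPE_ARM_IPA_SIZE(ipa_bits) | REALM_VM_FLAGS);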

> 
> Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>   arch/arm64/include/asm/kvm_emulate.h |   5 +
>   arch/arm64/include/asm/kvm_rme.h     |  19 ++
>   arch/arm64/kvm/arm.c                 |  18 ++
>   arch/arm64/kvm/mmu.c                 |  15 +-
>   arch/arm64/kvm/rme.c                 | 282 +++++++++++++++++++++++++++
>   5 files changed, 337 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 6f08398537e2..c606316f4729 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -624,6 +624,11 @@ static inline enum realm_state kvm_realm_state(struct kvm *kvm)
>   	return READ_ONCE(kvm->arch.realm.state);
>   }
>   
> +static inline bool kvm_realm_is_created(struct kvm *kvm)
> +{
> +	return kvm_is_realm(kvm) && kvm_realm_state(kvm) != REALM_STATE_NONE;
> +}
> +
>   static inline bool vcpu_is_rec(struct kvm_vcpu *vcpu)
>   {
>   	return false;
> diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
> index 922da3f47227..cf8cc4d30364 100644
> --- a/arch/arm64/include/asm/kvm_rme.h
> +++ b/arch/arm64/include/asm/kvm_rme.h
> @@ -6,6 +6,8 @@
>   #ifndef __ASM_KVM_RME_H
>   #define __ASM_KVM_RME_H
>   
> +#include <uapi/linux/kvm.h>
> +
>   /**
>    * enum realm_state - State of a Realm
>    */
> @@ -46,11 +48,28 @@ enum realm_state {
>    * struct realm - Additional per VM data for a Realm
>    *
>    * @state: The lifetime state machine for the realm
> + * @rd: Kernel mapping of the Realm Descriptor (RD)
> + * @params: Parameters for the RMI_REALM_CREATE command
> + * @num_aux: The number of auxiliary pages required by the RMM
> + * @vmid: VMID to be used by the RMM for the realm
> + * @ia_bits: Number of valid Input Address bits in the IPA
>    */
>   struct realm {
>   	enum realm_state state;
> +
> +	void *rd;
> +	struct realm_params *params;
> +
> +	unsigned long num_aux;
> +	unsigned int vmid;
> +	unsigned int ia_bits;
>   };
>   
>   int kvm_init_rme(void);
> +u32 kvm_realm_ipa_limit(void);
> +
> +int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
> +int kvm_init_realm_vm(struct kvm *kvm);
> +void kvm_destroy_realm(struct kvm *kvm);
>   
>   #endif
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 2056c660c5ee..5729ea430d6d 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -119,6 +119,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   		}
>   		mutex_unlock(&kvm->slots_lock);
>   		break;
> +	case KVM_CAP_ARM_RME:
> +		if (!kvm_is_realm(kvm))
> +			return -EINVAL;
> +		mutex_lock(&kvm->lock);
> +		r = kvm_realm_enable_cap(kvm, cap);
> +		mutex_unlock(&kvm->lock);
> +		break;
>   	default:
>   		r = -EINVAL;
>   		break;
> @@ -179,6 +186,13 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>   
>   	bitmap_zero(kvm->arch.vcpu_features, KVM_VCPU_MAX_FEATURES);
>   
> +	/* Initialise the realm bits after the generic bits are enabled */
> +	if (kvm_is_realm(kvm)) {
> +		ret = kvm_init_realm_vm(kvm);
> +		if (ret)
> +			goto err_free_cpumask;
> +	}
> +
>   	return 0;
>   
>   err_free_cpumask:
> @@ -219,6 +233,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>   	kvm_unshare_hyp(kvm, kvm + 1);
>   
>   	kvm_arm_teardown_hypercalls(kvm);
> +	kvm_destroy_realm(kvm);
>   }
>   
>   int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> @@ -328,6 +343,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   	case KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES:
>   		r = BIT(0);
>   		break;
> +	case KVM_CAP_ARM_RME:
> +		r = static_key_enabled(&kvm_rme_is_available);
> +		break;
>   	default:
>   		r = 0;
>   	}
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 18680771cdb0..aae365647b62 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -872,6 +872,10 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
>   	struct kvm_pgtable *pgt;
>   	u64 mmfr0, mmfr1;
>   	u32 phys_shift;
> +	u32 ipa_limit = kvm_ipa_limit;
> +
> +	if (kvm_is_realm(kvm))
> +		ipa_limit = kvm_realm_ipa_limit();
>   
>   	if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
>   		return -EINVAL;
> @@ -880,12 +884,12 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
>   	if (is_protected_kvm_enabled()) {
>   		phys_shift = kvm_ipa_limit;
>   	} else if (phys_shift) {
> -		if (phys_shift > kvm_ipa_limit ||
> +		if (phys_shift > ipa_limit ||
>   		    phys_shift < ARM64_MIN_PARANGE_BITS)
>   			return -EINVAL;
>   	} else {
>   		phys_shift = KVM_PHYS_SHIFT;
> -		if (phys_shift > kvm_ipa_limit) {
> +		if (phys_shift > ipa_limit) {
>   			pr_warn_once("%s using unsupported default IPA limit, upgrade your VMM\n",
>   				     current->comm);
>   			return -EINVAL;
> @@ -1014,6 +1018,13 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>   	struct kvm_pgtable *pgt = NULL;
>   
>   	write_lock(&kvm->mmu_lock);
> +	if (kvm_is_realm(kvm) &&
> +	    (kvm_realm_state(kvm) != REALM_STATE_DEAD &&
> +	     kvm_realm_state(kvm) != REALM_STATE_NONE)) {
> +		/* TODO: teardown rtts */
> +		write_unlock(&kvm->mmu_lock);
> +		return;
> +	}

This needs a comment to explain the rationale for deferring the
Stage2 pgt freeing. Something like:

	/*
	 * For realms, we can free the entry level RTTs
	 * only after :
	 *  1. All of the stage2 mappings are torn down.
	 *  2. The Realm has been destroyed.
	 *
	 * So, come back later once the RD has been destroyed.
	 */

>   	pgt = mmu->pgt;
>   	if (pgt) {
>   		mmu->pgd_phys = 0;
> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
> index 3dbbf9d046bf..658d14e8d87d 100644
> --- a/arch/arm64/kvm/rme.c
> +++ b/arch/arm64/kvm/rme.c
> @@ -5,9 +5,20 @@
>   
>   #include <linux/kvm_host.h>
>   
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>   #include <asm/rmi_cmds.h>
>   #include <asm/virt.h>
>   
> +#include <asm/kvm_pgtable.h>
> +
> +static unsigned long rmm_feat_reg0;
> +
> +static bool rme_supports(unsigned long feature)
> +{
> +	return !!u64_get_bits(rmm_feat_reg0, feature);
> +}
> +
>   static int rmi_check_version(void)
>   {
>   	struct arm_smccc_res res;
> @@ -36,8 +47,272 @@ static int rmi_check_version(void)
>   	return 0;
>   }
>   
> +u32 kvm_realm_ipa_limit(void)
> +{
> +	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
> +}
> +
> +static int get_start_level(struct realm *realm)
> +{
> +	return 4 - stage2_pgtable_levels(realm->ia_bits);
> +}
> +
> +static int realm_create_rd(struct kvm *kvm)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +	struct realm_params *params = realm->params;
> +	void *rd = NULL;
> +	phys_addr_t rd_phys, params_phys;
> +	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> +	int i, r;
> +
> +	if (WARN_ON(realm->rd) || WARN_ON(!realm->params))
> +		return -EEXIST;
> +
> +	rd = (void *)__get_free_page(GFP_KERNEL);
> +	if (!rd)
> +		return -ENOMEM;
> +
> +	rd_phys = virt_to_phys(rd);
> +	if (rmi_granule_delegate(rd_phys)) {
> +		r = -ENXIO;
> +		goto out;

super minor nit: s/out/free_rd/ is a bit more readable. Here "out" is
only used for error exits and could be confusing.
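
i.e., something like (sketch only):

	if (rmi_granule_delegate(rd_phys)) {
		r = -ENXIO;
		goto free_rd;
	}

	...

free_rd:
	free_page((unsigned long)rd);
	return r;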

> +	}
> +
> +	for (i = 0; i < pgt->pgd_pages; i++) {
> +		phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;
> +
> +		if (rmi_granule_delegate(pgd_phys)) {
> +			r = -ENXIO;
> +			goto out_undelegate_tables;
> +		}
> +	}
> +
> +	realm->ia_bits = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
> +
> +	params->rtt_level_start = get_start_level(realm);
> +	params->rtt_num_start = pgt->pgd_pages;
> +	params->rtt_base = kvm->arch.mmu.pgd_phys;
> +	params->vmid = realm->vmid;
> +
> +	params_phys = virt_to_phys(params);
> +
> +	if (rmi_realm_create(rd_phys, params_phys)) {
> +		r = -ENXIO;
> +		goto out_undelegate_tables;
> +	}
> +
> +	realm->rd = rd;
> +
> +	if (WARN_ON(rmi_rec_aux_count(rd_phys, &realm->num_aux))) {
> +		WARN_ON(rmi_realm_destroy(rd_phys));
> +		goto out_undelegate_tables;
> +	}
> +
> +	return 0;
> +
> +out_undelegate_tables:
> +	while (--i >= 0) {
> +		phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;
> +
> +		WARN_ON(rmi_granule_undelegate(pgd_phys));
> +	}
> +	WARN_ON(rmi_granule_undelegate(rd_phys));
> +out:
> +	free_page((unsigned long)rd);
> +	return r;
> +}
> +
> +/* Protects access to rme_vmid_bitmap */
> +static DEFINE_SPINLOCK(rme_vmid_lock);
> +static unsigned long *rme_vmid_bitmap;
> +
> +static int rme_vmid_init(void)
> +{
> +	unsigned int vmid_count = 1 << kvm_get_vmid_bits();

minor nit: RMM has a fixed VMID width of 16 bits. Do we need to
explicitly use that, instead of relying on what KVM thinks ? (Though
in practice, this would only be a problem if the architecture
evolves to support something more).
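
e.g., a sketch with a new define (the name is only a suggestion):

#define RMM_VMID_BITS		16

static int rme_vmid_init(void)
{
	unsigned int vmid_count = 1 << RMM_VMID_BITS;
	...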

> +
> +	rme_vmid_bitmap = bitmap_zalloc(vmid_count, GFP_KERNEL);
> +	if (!rme_vmid_bitmap) {
> +		kvm_err("%s: Couldn't allocate rme vmid bitmap\n", __func__);
> +		return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +static int rme_vmid_reserve(void)
> +{
> +	int ret;
> +	unsigned int vmid_count = 1 << kvm_get_vmid_bits();
> +
> +	spin_lock(&rme_vmid_lock);
> +	ret = bitmap_find_free_region(rme_vmid_bitmap, vmid_count, 0);
> +	spin_unlock(&rme_vmid_lock);
> +
> +	return ret;
> +}
> +
> +static void rme_vmid_release(unsigned int vmid)
> +{
> +	spin_lock(&rme_vmid_lock);
> +	bitmap_release_region(rme_vmid_bitmap, vmid, 0);
> +	spin_unlock(&rme_vmid_lock);
> +}
> +
> +static int kvm_create_realm(struct kvm *kvm)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +	int ret;
> +
> +	if (!kvm_is_realm(kvm) || kvm_realm_is_created(kvm))
> +		return -EEXIST;

Minor nit:

	if (!kvm_is_realm(kvm))
		return -EIO or even -EINVAL ?

	if (kvm_realm_is_created(kvm))
		return -EEXIST;

> +
> +	ret = rme_vmid_reserve();
> +	if (ret < 0)
> +		return ret;
> +	realm->vmid = ret;
> +
> +	ret = realm_create_rd(kvm);
> +	if (ret) {
> +		rme_vmid_release(realm->vmid);
> +		return ret;
> +	}
> +
> +	WRITE_ONCE(realm->state, REALM_STATE_NEW);
> +
> +	/* The realm is up, free the parameters.  */
> +	free_page((unsigned long)realm->params);
> +	realm->params = NULL;
> +
> +	return 0;
> +}
> +
> +static int config_realm_hash_algo(struct realm *realm,
> +				  struct kvm_cap_arm_rme_config_item *cfg)
> +{
> +	switch (cfg->hash_algo) {
> +	case KVM_CAP_ARM_RME_MEASUREMENT_ALGO_SHA256:
> +		if (!rme_supports(RMI_FEATURE_REGISTER_0_HASH_SHA_256))
> +			return -EINVAL;
> +		break;
> +	case KVM_CAP_ARM_RME_MEASUREMENT_ALGO_SHA512:
> +		if (!rme_supports(RMI_FEATURE_REGISTER_0_HASH_SHA_512))
> +			return -EINVAL;

Do we need to add a comment here on why we don't expose the supported 
"hash" algo as part of the UABI ? Something like :

	/*
	 * The hash algorithm for the measurements is chosen by
	 * the Realm owner (since it affects the attestation), so we
	 * would like the owner to get what they want.
	 */

> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +	realm->params->hash_algo = cfg->hash_algo;
> +	return 0;
> +}
> +
> +static int kvm_rme_config_realm(struct kvm *kvm, struct kvm_enable_cap *cap)
> +{
> +	struct kvm_cap_arm_rme_config_item cfg;
> +	struct realm *realm = &kvm->arch.realm;
> +	int r = 0;
> +
> +	if (kvm_realm_is_created(kvm))
> +		return -EINVAL;

minor nit: Maybe return -EEXIST or -EIO rather than "Invalid
(parameter)" ?


> +
> +	if (copy_from_user(&cfg, (void __user *)cap->args[1], sizeof(cfg)))
> +		return -EFAULT;
> +
> +	switch (cfg.cfg) {
> +	case KVM_CAP_ARM_RME_CFG_RPV:
> +		memcpy(&realm->params->rpv, &cfg.rpv, sizeof(cfg.rpv));
> +		break;
> +	case KVM_CAP_ARM_RME_CFG_HASH_ALGO:
> +		r = config_realm_hash_algo(realm, &cfg);
> +		break;
> +	default:
> +		r = -EINVAL;
> +	}
> +
> +	return r;
> +}
> +
> +int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> +{
> +	int r = 0;
> +
> +	if (!kvm_is_realm(kvm))
> +		return -EINVAL;
> +
> +	switch (cap->args[0]) {
> +	case KVM_CAP_ARM_RME_CONFIG_REALM:
> +		r = kvm_rme_config_realm(kvm, cap);
> +		break;
> +	case KVM_CAP_ARM_RME_CREATE_RD:
> +		r = kvm_create_realm(kvm);
> +		break;
> +	default:
> +		r = -EINVAL;
> +		break;
> +	}
> +
> +	return r;
> +}
> +
> +void kvm_destroy_realm(struct kvm *kvm)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> +	int i;
> +
> +	if (realm->params) {
> +		free_page((unsigned long)realm->params);
> +		realm->params = NULL;
> +	}
> +
> +	if (!kvm_realm_is_created(kvm))
> +		return;
> +
> +	WRITE_ONCE(realm->state, REALM_STATE_DYING);
> +
> +	if (realm->rd) {
> +		phys_addr_t rd_phys = virt_to_phys(realm->rd);
> +
> +		if (WARN_ON(rmi_realm_destroy(rd_phys)))
> +			return;
> +		if (WARN_ON(rmi_granule_undelegate(rd_phys)))
> +			return;
> +		free_page((unsigned long)realm->rd);
> +		realm->rd = NULL;
> +	}
> +
> +	rme_vmid_release(realm->vmid);
> +
> +	for (i = 0; i < pgt->pgd_pages; i++) {
> +		phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;
> +
> +		if (WARN_ON(rmi_granule_undelegate(pgd_phys)))
> +			return;
> +	}
> +
> +	WRITE_ONCE(realm->state, REALM_STATE_DEAD);
> +

Maybe add a comment here:

      /* Now that the Realm is destroyed, free the entry level RTTs */
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);



> +}
> +
> +int kvm_init_realm_vm(struct kvm *kvm)
> +{
> +	struct realm_params *params;
> +
> +	params = (struct realm_params *)get_zeroed_page(GFP_KERNEL);
> +	if (!params)
> +		return -ENOMEM;
> +
> +	/* Default parameters, not exposed to user space */

This is a bit misleading. The value comes from the userspace and...

> +	params->s2sz = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);

(minor nit) we initialise most of the params, including those that come
from KVM, later. So it may be a good idea to move this together into
create_realm, unless we need it earlier.
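
i.e., roughly (sketch, moving the assignment next to the other params
setup in realm_create_rd()):

	realm->ia_bits = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);

	params->s2sz = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
	params->rtt_level_start = get_start_level(realm);
	params->rtt_num_start = pgt->pgd_pages;
	...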

> +	kvm->arch.realm.params = params;
> +	return 0;
> +}
> +
>   int kvm_init_rme(void)
>   {
> +	int ret;
> +
>   	if (PAGE_SIZE != SZ_4K)
>   		/* Only 4k page size on the host is supported */
>   		return 0;
> @@ -46,6 +321,13 @@ int kvm_init_rme(void)
>   		/* Continue without realm support */
>   		return 0;
>   
> +	if (WARN_ON(rmi_features(0, &rmm_feat_reg0)))
> +		return 0;
> +
> +	ret = rme_vmid_init();
> +	if (ret)
> +		return ret;
> +
>   	/* Future patch will enable static branch kvm_rme_is_available */
>   
>   	return 0;


Suzuki


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 12/43] arm64: RME: Keep a spare page delegated to the RMM
  2024-04-12  8:42   ` [PATCH v2 12/43] arm64: RME: Keep a spare page delegated to the RMM Steven Price
@ 2024-04-17 10:19     ` Suzuki K Poulose
  0 siblings, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-17 10:19 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

Hi Steven

On 12/04/2024 09:42, Steven Price wrote:
> Pages can only be populated/destroyed on the RMM at the 4KB granule,
> this requires creating the full depth of RTTs. However if the pages are
> going to be combined into a 4MB huge page the last RTT is only

minor nit: 2MB huge page.

> temporarily needed. Similarly when freeing memory the huge page must be
> temporarily split requiring temporary usage of the full depth oF RTTs.
> 
> To avoid needing to perform a temporary allocation and delegation of a
> page for this purpose we keep a spare delegated page around. In
> particular this avoids the need for memory allocation while destroying
> the realm guest.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/include/asm/kvm_rme.h | 5 +++++
>   arch/arm64/kvm/rme.c             | 8 ++++++++
>   2 files changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
> index cf8cc4d30364..fba85e9ce3ae 100644
> --- a/arch/arm64/include/asm/kvm_rme.h
> +++ b/arch/arm64/include/asm/kvm_rme.h
> @@ -50,6 +50,9 @@ enum realm_state {
>    * @state: The lifetime state machine for the realm
>    * @rd: Kernel mapping of the Realm Descriptor (RD)
>    * @params: Parameters for the RMI_REALM_CREATE command
> + * @spare_page: A physical page that has been delegated to the Realm world but
> + *              is otherwise free. Used to avoid temporary allocation during
> + *              RTT operations.
>    * @num_aux: The number of auxiliary pages required by the RMM
>    * @vmid: VMID to be used by the RMM for the realm
>    * @ia_bits: Number of valid Input Address bits in the IPA
> @@ -60,6 +63,8 @@ struct realm {
>   	void *rd;
>   	struct realm_params *params;
>   
> +	phys_addr_t spare_page;
> +
>   	unsigned long num_aux;
>   	unsigned int vmid;
>   	unsigned int ia_bits;
> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
> index 658d14e8d87d..9652ec6ab2fd 100644
> --- a/arch/arm64/kvm/rme.c
> +++ b/arch/arm64/kvm/rme.c
> @@ -103,6 +103,7 @@ static int realm_create_rd(struct kvm *kvm)
>   	}
>   
>   	realm->rd = rd;
> +	realm->spare_page = PHYS_ADDR_MAX;
>   
>   	if (WARN_ON(rmi_rec_aux_count(rd_phys, &realm->num_aux))) {
>   		WARN_ON(rmi_realm_destroy(rd_phys));
> @@ -283,6 +284,13 @@ void kvm_destroy_realm(struct kvm *kvm)
>   
>   	rme_vmid_release(realm->vmid);
>   
> +	if (realm->spare_page != PHYS_ADDR_MAX) {
> +		/* Leak the page if the undelegate fails */
> +		if (!WARN_ON(rmi_granule_undelegate(realm->spare_page)))
> +			free_page((unsigned long)phys_to_virt(realm->spare_page));
> +		realm->spare_page = PHYS_ADDR_MAX;
> +	}
> +
>   	for (i = 0; i < pgt->pgd_pages; i++) {
>   		phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;
>   

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 11/43] arm64: kvm: Allow passing machine type in KVM creation
  2024-04-12  8:42   ` [PATCH v2 11/43] arm64: kvm: Allow passing machine type in KVM creation Steven Price
@ 2024-04-17 10:20     ` Suzuki K Poulose
  0 siblings, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-17 10:20 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 12/04/2024 09:42, Steven Price wrote:
> Previously machine type was used purely for specifying the physical
> address size of the guest. Reserve the higher bits to specify an ARM
> specific machine type and declare a new type 'KVM_VM_TYPE_ARM_REALM'
> used to create a realm guest.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/kvm/arm.c     | 17 +++++++++++++++++
>   arch/arm64/kvm/mmu.c     |  3 ---
>   include/uapi/linux/kvm.h | 19 +++++++++++++++----
>   3 files changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 22da6493912a..c5a6139d5454 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -173,6 +173,23 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>   	mutex_unlock(&kvm->lock);
>   #endif
>   
> +	if (type & ~(KVM_VM_TYPE_ARM_MASK | KVM_VM_TYPE_ARM_IPA_SIZE_MASK))
> +		return -EINVAL;
> +
> +	switch (type & KVM_VM_TYPE_ARM_MASK) {
> +	case KVM_VM_TYPE_ARM_NORMAL:
> +		break;
> +	case KVM_VM_TYPE_ARM_REALM:
> +		kvm->arch.is_realm = true;
> +		if (!kvm_is_realm(kvm)) {
> +			/* Realm support unavailable */
> +			return -EINVAL;
> +		}
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
>   	ret = kvm_share_hyp(kvm, kvm + 1);
>   	if (ret)
>   		return ret;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index aae365647b62..af4564f3add5 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -877,9 +877,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
>   	if (kvm_is_realm(kvm))
>   		ipa_limit = kvm_realm_ipa_limit();
>   
> -	if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
> -		return -EINVAL;
> -
>   	phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
>   	if (is_protected_kvm_enabled()) {
>   		phys_shift = kvm_ipa_limit;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index a1147036d1bd..5153c837c8c7 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -635,14 +635,25 @@ struct kvm_enable_cap {
>   #define KVM_S390_SIE_PAGE_OFFSET 1
>   
>   /*
> - * On arm64, machine type can be used to request the physical
> - * address size for the VM. Bits[7-0] are reserved for the guest
> - * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
> - * value 0 implies the default IPA size, 40bits.
> + * On arm64, machine type can be used to request both the machine type and
> + * the physical address size for the VM.
> + *
> + * Bits[11-8] are reserved for the ARM specific machine type.
> + *
> + * Bits[7-0] are reserved for the guest PA size shift (i.e, log2(PA_Size)).
> + * For backward compatibility, value 0 implies the default IPA size, 40bits.
>    */
> +#define KVM_VM_TYPE_ARM_SHIFT		8
> +#define KVM_VM_TYPE_ARM_MASK		(0xfULL << KVM_VM_TYPE_ARM_SHIFT)
> +#define KVM_VM_TYPE_ARM(_type)		\
> +	(((_type) << KVM_VM_TYPE_ARM_SHIFT) & KVM_VM_TYPE_ARM_MASK)
> +#define KVM_VM_TYPE_ARM_NORMAL		KVM_VM_TYPE_ARM(0)
> +#define KVM_VM_TYPE_ARM_REALM		KVM_VM_TYPE_ARM(1)
> +
>   #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK	0xffULL
>   #define KVM_VM_TYPE_ARM_IPA_SIZE(x)		\
>   	((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
> +
>   /*
>    * ioctls for /dev/kvm fds:
>    */


Looks good to me.

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 13/43] arm64: RME: RTT handling
  2024-04-12  8:42   ` [PATCH v2 13/43] arm64: RME: RTT handling Steven Price
@ 2024-04-17 13:37     ` Suzuki K Poulose
  2024-04-24 10:59       ` Steven Price
  0 siblings, 1 reply; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-17 13:37 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

Hi Steven

minor nit, Subject: arm64: RME: RTT tear down

This patch is all about tearing down the RTTs, so maybe the subject
could be adjusted accordingly.

On 12/04/2024 09:42, Steven Price wrote:
> The RMM owns the stage 2 page tables for a realm, and KVM must request
> that the RMM creates/destroys entries as necessary. The physical pages
> to store the page tables are delegated to the realm as required, and can
> be undelegated when no longer used.
> 
> Creating new RTTs is the easy part, tearing down is a little more
> tricky. The result of realm_rtt_destroy() can be used to effectively
> walk the tree and destroy the entries (undelegating pages that were
> given to the realm).

The patch looks functionally correct to me. Some minor style related
comments below.

> Signed-off-by: Steven Price <steven.price@arm.com>

> ---
>   arch/arm64/include/asm/kvm_rme.h |  19 ++++
>   arch/arm64/kvm/mmu.c             |   6 +-
>   arch/arm64/kvm/rme.c             | 171 +++++++++++++++++++++++++++++++
>   3 files changed, 193 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
> index fba85e9ce3ae..4ab5cb5e91b3 100644
> --- a/arch/arm64/include/asm/kvm_rme.h
> +++ b/arch/arm64/include/asm/kvm_rme.h
> @@ -76,5 +76,24 @@ u32 kvm_realm_ipa_limit(void);
>   int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
>   int kvm_init_realm_vm(struct kvm *kvm);
>   void kvm_destroy_realm(struct kvm *kvm);
> +void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits);
> +
> +#define RME_RTT_BLOCK_LEVEL	2
> +#define RME_RTT_MAX_LEVEL	3
> +
> +#define RME_PAGE_SHIFT		12
> +#define RME_PAGE_SIZE		BIT(RME_PAGE_SHIFT)
> +/* See ARM64_HW_PGTABLE_LEVEL_SHIFT() */
> +#define RME_RTT_LEVEL_SHIFT(l)	\
> +	((RME_PAGE_SHIFT - 3) * (4 - (l)) + 3)
> +#define RME_L2_BLOCK_SIZE	BIT(RME_RTT_LEVEL_SHIFT(2))
> +
> +static inline unsigned long rme_rtt_level_mapsize(int level)
> +{
> +	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
> +		return RME_PAGE_SIZE;
> +
> +	return (1UL << RME_RTT_LEVEL_SHIFT(level));
> +}
> 

super minor nit: We only support 4K for now, so maybe we could reuse
the ARM64 generic macro helpers. I am fine either way.


>   #endif
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index af4564f3add5..46f0c4e80ace 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1012,17 +1012,17 @@ void stage2_unmap_vm(struct kvm *kvm)
>   void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>   {
>   	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
> -	struct kvm_pgtable *pgt = NULL;
> +	struct kvm_pgtable *pgt;
>   
>   	write_lock(&kvm->mmu_lock);
> +	pgt = mmu->pgt;
>   	if (kvm_is_realm(kvm) &&
>   	    (kvm_realm_state(kvm) != REALM_STATE_DEAD &&
>   	     kvm_realm_state(kvm) != REALM_STATE_NONE)) {
> -		/* TODO: teardown rtts */
>   		write_unlock(&kvm->mmu_lock);
> +		kvm_realm_destroy_rtts(kvm, pgt->ia_bits);
>   		return;
>   	}
> -	pgt = mmu->pgt;
>   	if (pgt) {
>   		mmu->pgd_phys = 0;
>   		mmu->pgt = NULL;
> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
> index 9652ec6ab2fd..09b59bcad8b6 100644
> --- a/arch/arm64/kvm/rme.c
> +++ b/arch/arm64/kvm/rme.c
> @@ -47,6 +47,53 @@ static int rmi_check_version(void)
>   	return 0;
>   }
>   
> +static phys_addr_t __alloc_delegated_page(struct realm *realm,
> +					  struct kvm_mmu_memory_cache *mc,
> +					  gfp_t flags)

minor nit: Do we need "__" here ? The counterpart is plain
free_delegated_page without the "__". We could drop the prefix.
Or we could split the function as:

alloc_delegated_page()
{
   if (spare_page_available)
	return spare_page;
   return __alloc_delegated_page(); /* Alloc and delegate a page */
}
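
Fleshed out, that could look roughly like (untested):

static phys_addr_t alloc_delegated_page(struct realm *realm,
					struct kvm_mmu_memory_cache *mc,
					gfp_t flags)
{
	phys_addr_t phys = PHYS_ADDR_MAX;

	/* Prefer the spare delegated page when one is available */
	if (realm->spare_page != PHYS_ADDR_MAX)
		swap(realm->spare_page, phys);
	else
		phys = __alloc_delegated_page(realm, mc, flags);

	return phys;
}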


> +{
> +	phys_addr_t phys = PHYS_ADDR_MAX;
> +	void *virt;
> +
> +	if (realm->spare_page != PHYS_ADDR_MAX) {
> +		swap(realm->spare_page, phys);
> +		goto out;
> +	}
> +
> +	if (mc)
> +		virt = kvm_mmu_memory_cache_alloc(mc);
> +	else
> +		virt = (void *)__get_free_page(flags);
> +
> +	if (!virt)
> +		goto out;
> +
> +	phys = virt_to_phys(virt);
> +
> +	if (rmi_granule_delegate(phys)) {
> +		free_page((unsigned long)virt);
> +
> +		phys = PHYS_ADDR_MAX;
> +	}
> +
> +out:
> +	return phys;
> +}
> +
> +static void free_delegated_page(struct realm *realm, phys_addr_t phys)
> +{
> +	if (realm->spare_page == PHYS_ADDR_MAX) {
> +		realm->spare_page = phys;
> +		return;
> +	}
> +
> +	if (WARN_ON(rmi_granule_undelegate(phys))) {
> +		/* Undelegate failed: leak the page */
> +		return;
> +	}
> +
> +	free_page((unsigned long)phys_to_virt(phys));
> +}
> +
>   u32 kvm_realm_ipa_limit(void)
>   {
>   	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
> @@ -124,6 +171,130 @@ static int realm_create_rd(struct kvm *kvm)
>   	return r;
>   }
>   
> +static int realm_rtt_destroy(struct realm *realm, unsigned long addr,
> +			     int level, phys_addr_t *rtt_granule,
> +			     unsigned long *next_addr)
> +{
> +	unsigned long out_rtt;
> +	unsigned long out_top;
> +	int ret;
> +
> +	ret = rmi_rtt_destroy(virt_to_phys(realm->rd), addr, level,
> +			      &out_rtt, &out_top);
> +
> +	if (rtt_granule)
> +		*rtt_granule = out_rtt;
> +	if (next_addr)
> +		*next_addr = out_top;

minor nit: As mentioned in the previous patch, we could move this check 
to the rmi_rtt_destroy().

> +
> +	return ret;
> +}
> +
> +static int realm_tear_down_rtt_level(struct realm *realm, int level,
> +				     unsigned long start, unsigned long end)
> +{
> +	ssize_t map_size;
> +	unsigned long addr, next_addr;
> +
> +	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
> +		return -EINVAL;
> +
> +	map_size = rme_rtt_level_mapsize(level - 1);
> +
> +	for (addr = start; addr < end; addr = next_addr) {
> +		phys_addr_t rtt_granule;
> +		int ret;
> +		unsigned long align_addr = ALIGN(addr, map_size);
> +
> +		next_addr = ALIGN(addr + 1, map_size);
> +
> +		if (next_addr <= end && align_addr == addr) {
> +			ret = realm_rtt_destroy(realm, addr, level,
> +						&rtt_granule, &next_addr);
> +		} else {
> +			/* Recurse a level deeper */
> +			ret = realm_tear_down_rtt_level(realm,
> +							level + 1,
> +							addr,
> +							min(next_addr, end));
> +			if (ret)
> +				return ret;
> +			continue;
> +		}

I think it would be more readable if we did something like:

		/*
		 * The target range is smaller than what this level
		 * covers. Go deeper.
		 */
		if (next_addr > end || align_addr != addr) {
			ret = realm_tear_down_rtt_level(realm,
							level + 1, addr,
							min(next_addr, end));
			if (ret)
				return ret;
			continue;
		}

		ret = realm_rtt_destroy(realm, addr, level,
					&rtt_granule, &next_addr);
		
> +
> +		switch (RMI_RETURN_STATUS(ret)) {
> +		case RMI_SUCCESS:
> +			if (!WARN_ON(rmi_granule_undelegate(rtt_granule)))
> +				free_page((unsigned long)phys_to_virt(rtt_granule));
> +			break;
> +		case RMI_ERROR_RTT:
> +			if (next_addr > addr) {
> +				/* unassigned or destroyed */

minor nit:
				/* RTT doesn't exist, skip */

> +				break;
> +			}

> +			if (WARN_ON(RMI_RETURN_INDEX(ret) != level))
> +				return -EBUSY;

In practice, we only call this for the full IPA range and we wouldn't
go deeper if the top level entry was missing. So there is no reason
why the RMM couldn't walk to the requested level. Maybe we could add
a comment here :
			/*
			 * We tear down the RTT range for the full IPA
			 * space, after everything is unmapped. Also we
			 * descend down only if we cannot tear down a
			 * top level RTT. Thus RMM must be able to walk
			 * to the requested level. e.g., a block mapping
			 * exists at L1 or L2.
			 */

> +			if (WARN_ON(level == RME_RTT_MAX_LEVEL)) {
> +				// Live entry
> +				return -EBUSY;


The first part of the comment above applies to this too, so maybe it
is good to have it here as well.


> +			}

> +			/* Recurse a level deeper */

minor nit:
			/*
			 * The table has active entries in it, recurse
			 * deeper and tear down the RTTs.
			 */

> +			next_addr = ALIGN(addr + 1, map_size);
> +			ret = realm_tear_down_rtt_level(realm,
> +							level + 1,
> +							addr,
> +							next_addr);
> +			if (ret)
> +				return ret;
> +			/* Try again at this level */

			/*
			 * Now that the children RTTs are destroyed,
			 * retry at this level.
			 */
> +			next_addr = addr;
> +			break;
> +		default:
> +			WARN_ON(1);
> +			return -ENXIO;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int realm_tear_down_rtt_range(struct realm *realm,
> +				     unsigned long start, unsigned long end)
> +{
> +	return realm_tear_down_rtt_level(realm, get_start_level(realm) + 1,
> +					 start, end);
> +}
> +
> +static void ensure_spare_page(struct realm *realm)
> +{
> +	phys_addr_t tmp_rtt;
> +
> +	/*
> +	 * Make sure we have a spare delegated page for tearing down the
> +	 * block mappings. We do this by allocating then freeing a page.
> +	 * We must use Atomic allocations as we are called with kvm->mmu_lock
> +	 * held.
> +	 */
> +	tmp_rtt = __alloc_delegated_page(realm, NULL, GFP_ATOMIC);
> +
> +	/*
> +	 * If the allocation failed, continue as we may not have a block level
> +	 * mapping so it may not be fatal, otherwise free it to assign it
> +	 * to the spare page.
> +	 */
> +	if (tmp_rtt != PHYS_ADDR_MAX)
> +		free_delegated_page(realm, tmp_rtt);
> +}
> +
> +void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +
> +	ensure_spare_page(realm);
> +
> +	WARN_ON(realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits)));
> +}

minor nit: We don't seem to be using the "spare_page" yet in this patch.
Maybe it would be a good idea to move all the related changes
(alloc_delegated_page() / free_delegated_page(), ensure_spare_page() etc.)
to the patch where they are better suited ?

Suzuki

> +
>   /* Protects access to rme_vmid_bitmap */
>   static DEFINE_SPINLOCK(rme_vmid_lock);
>   static unsigned long *rme_vmid_bitmap;


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 14/43] arm64: RME: Allocate/free RECs to match vCPUs
  2024-04-12  8:42   ` [PATCH v2 14/43] arm64: RME: Allocate/free RECs to match vCPUs Steven Price
@ 2024-04-18  9:23     ` Suzuki K Poulose
  0 siblings, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-18  9:23 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 12/04/2024 09:42, Steven Price wrote:
> The RMM maintains a data structure known as the Realm Execution Context
> (or REC). It is similar to struct kvm_vcpu and tracks the state of the
> virtual CPUs. KVM must delegate memory and request the structures are
> created when vCPUs are created, and suitably tear down on destruction.
> 

Maybe a good idea to add a note about the AUX granules, to help the
reader with the context. e.g.:

RECs must be supplied with additional pages (AUX Granules) for storing
the larger register state (e.g., SVE). The number of AUX granules for
a REC depends on the "parameters" with which the Realm was created.

Also, the register state for the REC cannot be modified by KVM after
the REC is created.




> See Realm Management Monitor specification (DEN0137) for more information:
> https://developer.arm.com/documentation/den0137/
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/include/asm/kvm_emulate.h |   2 +
>   arch/arm64/include/asm/kvm_host.h    |   3 +
>   arch/arm64/include/asm/kvm_rme.h     |  18 ++++
>   arch/arm64/kvm/arm.c                 |   2 +
>   arch/arm64/kvm/reset.c               |  11 ++
>   arch/arm64/kvm/rme.c                 | 150 +++++++++++++++++++++++++++
>   6 files changed, 186 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index c606316f4729..2209a7c6267f 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -631,6 +631,8 @@ static inline bool kvm_realm_is_created(struct kvm *kvm)
>   
>   static inline bool vcpu_is_rec(struct kvm_vcpu *vcpu)
>   {
> +	if (static_branch_unlikely(&kvm_rme_is_available))
> +		return vcpu->arch.rec.mpidr != INVALID_HWID;
>   	return false;
>   }
>   
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 63b68b85db3f..f7ac40ce0caf 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -694,6 +694,9 @@ struct kvm_vcpu_arch {
>   
>   	/* Per-vcpu CCSIDR override or NULL */
>   	u32 *ccsidr;
> +
> +	/* Realm meta data */
> +	struct realm_rec rec;
>   };
>   
>   /*
> diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
> index 4ab5cb5e91b3..915e76068b00 100644
> --- a/arch/arm64/include/asm/kvm_rme.h
> +++ b/arch/arm64/include/asm/kvm_rme.h
> @@ -6,6 +6,7 @@
>   #ifndef __ASM_KVM_RME_H
>   #define __ASM_KVM_RME_H
>   
> +#include <asm/rmi_smc.h>
>   #include <uapi/linux/kvm.h>
>   
>   /**
> @@ -70,6 +71,21 @@ struct realm {
>   	unsigned int ia_bits;
>   };
>   
> +/**
> + * struct realm_rec - Additional per VCPU data for a Realm
> + *
> + * @mpidr: MPIDR (Multiprocessor Affinity Register) value to identify this VCPU
> + * @rec_page: Kernel VA of the RMM's private page for this REC
> + * @aux_pages: Additional pages private to the RMM for this REC
> + * @run: Kernel VA of the RmiRecRun structure shared with the RMM
> + */
> +struct realm_rec {
> +	unsigned long mpidr;
> +	void *rec_page;
> +	struct page *aux_pages[REC_PARAMS_AUX_GRANULES];
> +	struct rec_run *run;
> +};
> +
>   int kvm_init_rme(void);
>   u32 kvm_realm_ipa_limit(void);
>   
> @@ -77,6 +93,8 @@ int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
>   int kvm_init_realm_vm(struct kvm *kvm);
>   void kvm_destroy_realm(struct kvm *kvm);
>   void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits);
> +int kvm_create_rec(struct kvm_vcpu *vcpu);
> +void kvm_destroy_rec(struct kvm_vcpu *vcpu);
>   
>   #define RME_RTT_BLOCK_LEVEL	2
>   #define RME_RTT_MAX_LEVEL	3
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index c5a6139d5454..d70c511e16a0 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -432,6 +432,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>   	/* Force users to call KVM_ARM_VCPU_INIT */
>   	vcpu_clear_flag(vcpu, VCPU_INITIALIZED);
>   
> +	vcpu->arch.rec.mpidr = INVALID_HWID;
> +
>   	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
>   
>   	/*
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 68d1d05672bd..6e6eb4a15095 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -134,6 +134,11 @@ int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature)
>   			return -EPERM;
>   
>   		return kvm_vcpu_finalize_sve(vcpu);
> +	case KVM_ARM_VCPU_REC:
> +		if (!kvm_is_realm(vcpu->kvm))
> +			return -EINVAL;
> +
> +		return kvm_create_rec(vcpu);
>   	}
>   
>   	return -EINVAL;
> @@ -144,6 +149,11 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
>   	if (vcpu_has_sve(vcpu) && !kvm_arm_vcpu_sve_finalized(vcpu))
>   		return false;
>   
> +	if (kvm_is_realm(vcpu->kvm) &&
> +	    !(vcpu_is_rec(vcpu) &&
> +	      READ_ONCE(vcpu->kvm->arch.realm.state) == REALM_STATE_ACTIVE))
> +		return false;
> +
>   	return true;
>   }
>   
> @@ -157,6 +167,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
>   		kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
>   	kfree(sve_state);
>   	kfree(vcpu->arch.ccsidr);
> +	kvm_destroy_rec(vcpu);
>   }
>   
>   static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
> index 09b59bcad8b6..629a095bea61 100644
> --- a/arch/arm64/kvm/rme.c
> +++ b/arch/arm64/kvm/rme.c
> @@ -474,6 +474,156 @@ void kvm_destroy_realm(struct kvm *kvm)
>   	kvm_free_stage2_pgd(&kvm->arch.mmu);
>   }
>   
> +static void free_rec_aux(struct page **aux_pages,
> +			 unsigned int num_aux)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < num_aux; i++) {
> +		phys_addr_t aux_page_phys = page_to_phys(aux_pages[i]);
> +
> +		/* If the undelegate fails then leak the page */
> +		if (WARN_ON(rmi_granule_undelegate(aux_page_phys)))
> +			continue;
> +
> +		__free_page(aux_pages[i]);
> +	}
> +}
> +
> +static int alloc_rec_aux(struct page **aux_pages,
> +			 u64 *aux_phys_pages,
> +			 unsigned int num_aux)
> +{
> +	int ret;
> +	unsigned int i;
> +
> +	for (i = 0; i < num_aux; i++) {
> +		struct page *aux_page;
> +		phys_addr_t aux_page_phys;
> +
> +		aux_page = alloc_page(GFP_KERNEL);
> +		if (!aux_page) {
> +			ret = -ENOMEM;
> +			goto out_err;
> +		}
> +		aux_page_phys = page_to_phys(aux_page);
> +		if (rmi_granule_delegate(aux_page_phys)) {
> +			__free_page(aux_page);
> +			ret = -ENXIO;
> +			goto out_err;
> +		}
> +		aux_pages[i] = aux_page;
> +		aux_phys_pages[i] = aux_page_phys;
> +	}
> +
> +	return 0;
> +out_err:
> +	free_rec_aux(aux_pages, i);
> +	return ret;
> +}
> +
> +int kvm_create_rec(struct kvm_vcpu *vcpu)
> +{
> +	struct user_pt_regs *vcpu_regs = vcpu_gp_regs(vcpu);
> +	unsigned long mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
> +	struct realm *realm = &vcpu->kvm->arch.realm;
> +	struct realm_rec *rec = &vcpu->arch.rec;
> +	unsigned long rec_page_phys;
> +	struct rec_params *params;
> +	int r, i;
> +
> +	if (kvm_realm_state(vcpu->kvm) != REALM_STATE_NEW)
> +		return -ENOENT;
> +
> +	/*
> +	 * The RMM will report PSCI v1.0 to Realms and the KVM_ARM_VCPU_PSCI_0_2
> +	 * flag covers v0.2 and onwards.
> +	 */
> +	if (!vcpu_has_feature(vcpu, KVM_ARM_VCPU_PSCI_0_2))
> +		return -EINVAL;
> +
> +	BUILD_BUG_ON(sizeof(*params) > PAGE_SIZE);
> +	BUILD_BUG_ON(sizeof(*rec->run) > PAGE_SIZE);
> +
> +	params = (struct rec_params *)get_zeroed_page(GFP_KERNEL);
> +	rec->rec_page = (void *)__get_free_page(GFP_KERNEL);
> +	rec->run = (void *)get_zeroed_page(GFP_KERNEL);
> +	if (!params || !rec->rec_page || !rec->run) {
> +		r = -ENOMEM;
> +		goto out_free_pages;
> +	}
> +
> +	for (i = 0; i < ARRAY_SIZE(params->gprs); i++)
> +		params->gprs[i] = vcpu_regs->regs[i];
> +
> +	params->pc = vcpu_regs->pc;
> +
> +	if (vcpu->vcpu_id == 0)
> +		params->flags |= REC_PARAMS_FLAG_RUNNABLE;
> +
> +	rec_page_phys = virt_to_phys(rec->rec_page);
> +
> +	if (rmi_granule_delegate(rec_page_phys)) {
> +		r = -ENXIO;
> +		goto out_free_pages;
> +	}
> +
> +	r = alloc_rec_aux(rec->aux_pages, params->aux, realm->num_aux);
> +	if (r)
> +		goto out_undelegate_rmm_rec;
> +
> +	params->num_rec_aux = realm->num_aux;
> +	params->mpidr = mpidr;
> +
> +	if (rmi_rec_create(virt_to_phys(realm->rd),
> +			   rec_page_phys,
> +			   virt_to_phys(params))) {
> +		r = -ENXIO;
> +		goto out_free_rec_aux;
> +	}
> +
> +	rec->mpidr = mpidr;
> +
> +	free_page((unsigned long)params);
> +	return 0;
> +
> +out_free_rec_aux:
> +	free_rec_aux(rec->aux_pages, realm->num_aux);
> +out_undelegate_rmm_rec:
> +	if (WARN_ON(rmi_granule_undelegate(rec_page_phys)))
> +		rec->rec_page = NULL;
> +out_free_pages:
> +	free_page((unsigned long)rec->run);
> +	free_page((unsigned long)rec->rec_page);
> +	free_page((unsigned long)params);
> +	return r;
> +}
> +
> +void kvm_destroy_rec(struct kvm_vcpu *vcpu)
> +{
> +	struct realm *realm = &vcpu->kvm->arch.realm;
> +	struct realm_rec *rec = &vcpu->arch.rec;
> +	unsigned long rec_page_phys;
> +
> +	if (!vcpu_is_rec(vcpu))
> +		return;
> +
> +	rec_page_phys = virt_to_phys(rec->rec_page);
> +
> +	/* If the REC destroy fails, leak all pages relating to the REC */

minor nit: Maybe we could clarify the situation with the AUX granules.

	/*
	 * We cannot reclaim the REC page and any AUX pages
	 * until the REC is destroyed. So, if we fail to destroy
	 * the REC, leak the REC and AUX pages.
	 */
> +	if (WARN_ON(rmi_rec_destroy(rec_page_phys)))
> +		return;
> +
> +	free_rec_aux(rec->aux_pages, realm->num_aux);
> +
> +	/* If the undelegate fails then leak the REC page */
> +	if (WARN_ON(rmi_granule_undelegate(rec_page_phys)))
> +		return;
> +
> +	free_page((unsigned long)rec->rec_page);


> +	free_page((unsigned long)rec->run);

I think this can be freed irrespective of the delegated pages.
So maybe we could move it to the top.
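
i.e., something like:

void kvm_destroy_rec(struct kvm_vcpu *vcpu)
{
	...
	if (!vcpu_is_rec(vcpu))
		return;

	/* The rec_run page is never delegated, so it can always be freed */
	free_page((unsigned long)rec->run);
	rec->run = NULL;

	rec_page_phys = virt_to_phys(rec->rec_page);
	...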


> +}
> +

Suzuki


>   int kvm_init_realm_vm(struct kvm *kvm)
>   {
>   	struct realm_params *params;


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 16/43] KVM: arm64: Support timers in realm RECs
  2024-04-12  8:42   ` [PATCH v2 16/43] KVM: arm64: Support timers in realm RECs Steven Price
@ 2024-04-18  9:30     ` Suzuki K Poulose
  0 siblings, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-18  9:30 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 12/04/2024 09:42, Steven Price wrote:
> The RMM keeps track of the timer while the realm REC is running, but on
> exit to the normal world KVM is responsible for handling the timers.
> 

minor nit: It may be worth mentioning that this will be hooked in
when we add the Realm exit handling.
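
e.g., presumably the (later) REC exit handling would end up doing
something like:

	/* On return from the realm, let KVM pick up the timer state */
	if (vcpu_is_rec(vcpu))
		kvm_realm_timers_update(vcpu);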

Otherwise looks good to me.


Suzuki


> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/kvm/arch_timer.c  | 45 ++++++++++++++++++++++++++++++++----
>   include/kvm/arm_arch_timer.h |  2 ++
>   2 files changed, 43 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 879982b1cc73..0b2be34a9ba3 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -162,6 +162,13 @@ static void timer_set_cval(struct arch_timer_context *ctxt, u64 cval)
>   
>   static void timer_set_offset(struct arch_timer_context *ctxt, u64 offset)
>   {
> +	struct kvm_vcpu *vcpu = ctxt->vcpu;
> +
> +	if (kvm_is_realm(vcpu->kvm)) {
> +		WARN_ON(offset);
> +		return;
> +	}
> +
>   	if (!ctxt->offset.vm_offset) {
>   		WARN(offset, "timer %ld\n", arch_timer_ctx_index(ctxt));
>   		return;
> @@ -460,6 +467,21 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
>   	}
>   }
>   
> +void kvm_realm_timers_update(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *arch_timer = &vcpu->arch.timer_cpu;
> +	int i;
> +
> +	for (i = 0; i < NR_KVM_EL0_TIMERS; i++) {
> +		struct arch_timer_context *timer = &arch_timer->timers[i];
> +		bool status = timer_get_ctl(timer) & ARCH_TIMER_CTRL_IT_STAT;
> +		bool level = kvm_timer_irq_can_fire(timer) && status;
> +
> +		if (level != timer->irq.level)
> +			kvm_timer_update_irq(vcpu, level, timer);
> +	}
> +}
> +
>   /* Only called for a fully emulated timer */
>   static void timer_emulate(struct arch_timer_context *ctx)
>   {
> @@ -831,6 +853,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>   	if (unlikely(!timer->enabled))
>   		return;
>   
> +	kvm_timer_unblocking(vcpu);
> +
>   	get_timer_map(vcpu, &map);
>   
>   	if (static_branch_likely(&has_gic_active_state)) {
> @@ -844,8 +868,6 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>   		kvm_timer_vcpu_load_nogic(vcpu);
>   	}
>   
> -	kvm_timer_unblocking(vcpu);
> -
>   	timer_restore_state(map.direct_vtimer);
>   	if (map.direct_ptimer)
>   		timer_restore_state(map.direct_ptimer);
> @@ -988,7 +1010,9 @@ static void timer_context_init(struct kvm_vcpu *vcpu, int timerid)
>   
>   	ctxt->vcpu = vcpu;
>   
> -	if (timerid == TIMER_VTIMER)
> +	if (kvm_is_realm(vcpu->kvm))
> +		ctxt->offset.vm_offset = NULL;
> +	else if (timerid == TIMER_VTIMER)
>   		ctxt->offset.vm_offset = &kvm->arch.timer_data.voffset;
>   	else
>   		ctxt->offset.vm_offset = &kvm->arch.timer_data.poffset;
> @@ -1011,13 +1035,19 @@ static void timer_context_init(struct kvm_vcpu *vcpu, int timerid)
>   void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
>   {
>   	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
> +	u64 cntvoff;
>   
>   	for (int i = 0; i < NR_KVM_TIMERS; i++)
>   		timer_context_init(vcpu, i);
>   
> +	if (kvm_is_realm(vcpu->kvm))
> +		cntvoff = 0;
> +	else
> +		cntvoff = kvm_phys_timer_read();
> +
>   	/* Synchronize offsets across timers of a VM if not already provided */
>   	if (!test_bit(KVM_ARCH_FLAG_VM_COUNTER_OFFSET, &vcpu->kvm->arch.flags)) {
> -		timer_set_offset(vcpu_vtimer(vcpu), kvm_phys_timer_read());
> +		timer_set_offset(vcpu_vtimer(vcpu), cntvoff);
>   		timer_set_offset(vcpu_ptimer(vcpu), 0);
>   	}
>   
> @@ -1525,6 +1555,13 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
>   		return -EINVAL;
>   	}
>   
> +	/*
> +	 * We don't use mapped IRQs for Realms because the RMI doesn't allow
> +	 * us setting the LR.HW bit in the VGIC.
> +	 */
> +	if (vcpu_is_rec(vcpu))
> +		return 0;
> +
>   	get_timer_map(vcpu, &map);
>   
>   	ret = kvm_vgic_map_phys_irq(vcpu,
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index c819c5d16613..d8ab297560d0 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -112,6 +112,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
>   int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
>   int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
>   
> +void kvm_realm_timers_update(struct kvm_vcpu *vcpu);
> +
>   u64 kvm_phys_timer_read(void);
>   
>   void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 04/43] arm64: RME: Handle Granule Protection Faults (GPFs)
  2024-04-16 11:17     ` Suzuki K Poulose
@ 2024-04-18 13:17       ` Steven Price
  0 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-18 13:17 UTC (permalink / raw)
  To: Suzuki K Poulose, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 16/04/2024 12:17, Suzuki K Poulose wrote:
> On 12/04/2024 09:42, Steven Price wrote:
>> If the host attempts to access granules that have been delegated for use
>> in a realm these accesses will be caught and will trigger a Granule
>> Protection Fault (GPF).
>>
>> A fault during a page walk signals a bug in the kernel and is handled by
>> oopsing the kernel. A non-page walk fault could be caused by user space
>> having access to a page which has been delegated to the kernel and will
>> trigger a SIGBUS to allow debugging why user space is trying to access a
>> delegated page.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>   arch/arm64/mm/fault.c | 29 ++++++++++++++++++++++++-----
>>   1 file changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 8251e2fea9c7..91da0f446dd9 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -765,6 +765,25 @@ static int do_tag_check_fault(unsigned long far, unsigned long esr,
>>       return 0;
>>   }
>>
>> +static int do_gpf_ptw(unsigned long far, unsigned long esr, struct pt_regs *regs)
>> +{
>> +    const struct fault_info *inf = esr_to_fault_info(esr);
>> +
>> +    die_kernel_fault(inf->name, far, esr, regs);
>> +    return 0;
>> +}
>> +
>> +static int do_gpf(unsigned long far, unsigned long esr, struct pt_regs *regs)
>> +{
>> +    const struct fault_info *inf = esr_to_fault_info(esr);
>> +
>> +    if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
>> +        return 0;
>> +
>> +    arm64_notify_die(inf->name, regs, inf->sig, inf->code, far, esr);
>> +    return 0;
>> +}
>> +
>>   static const struct fault_info fault_info[] = {
>>       { do_bad,        SIGKILL, SI_KERNEL,    "ttbr address size fault"    },
>>       { do_bad,        SIGKILL, SI_KERNEL,    "level 1 address size fault"    },
>> @@ -802,11 +821,11 @@ static const struct fault_info fault_info[] = {
>>       { do_alignment_fault,    SIGBUS,  BUS_ADRALN,    "alignment fault"        },
>>       { do_bad,        SIGKILL, SI_KERNEL,    "unknown 34"            },
>>       { do_bad,        SIGKILL, SI_KERNEL,    "unknown 35"            },
> 
> Should this also be converted to do_gpf_ptw, "GPF at level -1", given we
> support LPA2 ?

Ah, yes I somehow missed that. Although something has gone majorly wrong
if this triggers! ;)

Steve

>> -    { do_bad,        SIGKILL, SI_KERNEL,    "unknown 36"            },
>> -    { do_bad,        SIGKILL, SI_KERNEL,    "unknown 37"            },
>> -    { do_bad,        SIGKILL, SI_KERNEL,    "unknown 38"            },
>> -    { do_bad,        SIGKILL, SI_KERNEL,    "unknown 39"            },
>> -    { do_bad,        SIGKILL, SI_KERNEL,    "unknown 40"            },
>> +    { do_gpf_ptw,        SIGKILL, SI_KERNEL,    "Granule Protection Fault at level 0" },
>> +    { do_gpf_ptw,        SIGKILL, SI_KERNEL,    "Granule Protection Fault at level 1" },
>> +    { do_gpf_ptw,        SIGKILL, SI_KERNEL,    "Granule Protection Fault at level 2" },
>> +    { do_gpf_ptw,        SIGKILL, SI_KERNEL,    "Granule Protection Fault at level 3" },
>> +    { do_gpf,        SIGBUS,  SI_KERNEL,    "Granule Protection Fault not on table walk" },
>>       { do_bad,        SIGKILL, SI_KERNEL,    "level -1 address size fault"    },
>>       { do_bad,        SIGKILL, SI_KERNEL,    "unknown 42"            },
>>       { do_translation_fault,    SIGSEGV, SEGV_MAPERR,    "level -1 translation fault"    },
> 
> 
> Rest looks fine to me.
> 
> Suzuki


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 05/43] arm64: RME: Add SMC definitions for calling the RMM
  2024-04-16 12:38     ` Suzuki K Poulose
@ 2024-04-18 13:17       ` Steven Price
  0 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-18 13:17 UTC (permalink / raw)
  To: Suzuki K Poulose, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 16/04/2024 13:38, Suzuki K Poulose wrote:
> Hi Steven
> 
> On 12/04/2024 09:42, Steven Price wrote:
>> The RMM (Realm Management Monitor) provides functionality that can be
>> accessed by SMC calls from the host.
>>
>> The SMC definitions are based on DEN0137[1] version 1.0-eac5
>>
>> [1] https://developer.arm.com/documentation/den0137/1-0eac5/
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>   arch/arm64/include/asm/rmi_smc.h | 250 +++++++++++++++++++++++++++++++
>>   1 file changed, 250 insertions(+)
>>   create mode 100644 arch/arm64/include/asm/rmi_smc.h
>>
>> diff --git a/arch/arm64/include/asm/rmi_smc.h b/arch/arm64/include/asm/rmi_smc.h
>> new file mode 100644
>> index 000000000000..c205efdb18d8
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/rmi_smc.h
>> @@ -0,0 +1,250 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Copyright (C) 2023 ARM Ltd.
>> + *
>> + * The values and structures in this file are from the Realm Management Monitor
>> + * specification (DEN0137) version A-bet0:
>> + * https://developer.arm.com/documentation/den0137/1-0bet0/
> 
> This should now point to eac5 instead.

Typical - I searched through the commit logs, but forgot I'd put a
reference in the code too! Thanks for spotting.

>> + */
>> +
>> +#ifndef __ASM_RME_SMC_H
>> +#define __ASM_RME_SMC_H
>> +
>> +#include <linux/arm-smccc.h>
>> +
>> +#define SMC_RxI_CALL(func)                \
>> +    ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,        \
>> +               ARM_SMCCC_SMC_64,        \
>> +               ARM_SMCCC_OWNER_STANDARD,    \
>> +               (func))
>> +
>> +#define SMC_RMI_DATA_CREATE        SMC_RxI_CALL(0x0153)
>> +#define SMC_RMI_DATA_CREATE_UNKNOWN    SMC_RxI_CALL(0x0154)
>> +#define SMC_RMI_DATA_DESTROY        SMC_RxI_CALL(0x0155)
>> +#define SMC_RMI_FEATURES        SMC_RxI_CALL(0x0165)
>> +#define SMC_RMI_GRANULE_DELEGATE    SMC_RxI_CALL(0x0151)
>> +#define SMC_RMI_GRANULE_UNDELEGATE    SMC_RxI_CALL(0x0152)
>> +#define SMC_RMI_PSCI_COMPLETE        SMC_RxI_CALL(0x0164)
>> +#define SMC_RMI_REALM_ACTIVATE        SMC_RxI_CALL(0x0157)
>> +#define SMC_RMI_REALM_CREATE        SMC_RxI_CALL(0x0158)
>> +#define SMC_RMI_REALM_DESTROY        SMC_RxI_CALL(0x0159)
>> +#define SMC_RMI_REC_AUX_COUNT        SMC_RxI_CALL(0x0167)
>> +#define SMC_RMI_REC_CREATE        SMC_RxI_CALL(0x015a)
>> +#define SMC_RMI_REC_DESTROY        SMC_RxI_CALL(0x015b)
>> +#define SMC_RMI_REC_ENTER        SMC_RxI_CALL(0x015c)
>> +#define SMC_RMI_RTT_CREATE        SMC_RxI_CALL(0x015d)
>> +#define SMC_RMI_RTT_DESTROY        SMC_RxI_CALL(0x015e)
>> +#define SMC_RMI_RTT_FOLD        SMC_RxI_CALL(0x0166)
>> +#define SMC_RMI_RTT_INIT_RIPAS        SMC_RxI_CALL(0x0168)
>> +#define SMC_RMI_RTT_MAP_UNPROTECTED    SMC_RxI_CALL(0x015f)
>> +#define SMC_RMI_RTT_READ_ENTRY        SMC_RxI_CALL(0x0161)
>> +#define SMC_RMI_RTT_SET_RIPAS        SMC_RxI_CALL(0x0169)
>> +#define SMC_RMI_RTT_UNMAP_UNPROTECTED    SMC_RxI_CALL(0x0162)
>> +#define SMC_RMI_VERSION            SMC_RxI_CALL(0x0150)
>> +
>> +#define RMI_ABI_MAJOR_VERSION    1
>> +#define RMI_ABI_MINOR_VERSION    0
>> +
>> +#define RMI_UNASSIGNED            0
>> +#define RMI_ASSIGNED            1
>> +#define RMI_TABLE            2
>> +
>> +#define RMI_ABI_VERSION_GET_MAJOR(version) ((version) >> 16)
>> +#define RMI_ABI_VERSION_GET_MINOR(version) ((version) & 0xFFFF)
>> +#define RMI_ABI_VERSION(major, minor)      (((major) << 16) | (minor))
>> +
>> +#define RMI_RETURN_STATUS(ret)        ((ret) & 0xFF)
>> +#define RMI_RETURN_INDEX(ret)        (((ret) >> 8) & 0xFF)
>> +
>> +#define RMI_SUCCESS        0
>> +#define RMI_ERROR_INPUT        1
>> +#define RMI_ERROR_REALM        2
>> +#define RMI_ERROR_REC        3
>> +#define RMI_ERROR_RTT        4
>> +
>> +#define RMI_EMPTY        0
>> +#define RMI_RAM            1
>> +#define RMI_DESTROYED        2
>> +
>> +#define RMI_NO_MEASURE_CONTENT    0
>> +#define RMI_MEASURE_CONTENT    1
>> +
>> +#define RMI_FEATURE_REGISTER_0_S2SZ        GENMASK(7, 0)
>> +#define RMI_FEATURE_REGISTER_0_LPA2        BIT(8)
>> +#define RMI_FEATURE_REGISTER_0_SVE_EN        BIT(9)
>> +#define RMI_FEATURE_REGISTER_0_SVE_VL        GENMASK(13, 10)
>> +#define RMI_FEATURE_REGISTER_0_NUM_BPS        GENMASK(17, 14)
>> +#define RMI_FEATURE_REGISTER_0_NUM_WPS        GENMASK(21, 18)
>> +#define RMI_FEATURE_REGISTER_0_PMU_EN        BIT(22)
>> +#define RMI_FEATURE_REGISTER_0_PMU_NUM_CTRS    GENMASK(27, 23)
>> +#define RMI_FEATURE_REGISTER_0_HASH_SHA_256    BIT(28)
>> +#define RMI_FEATURE_REGISTER_0_HASH_SHA_512    BIT(29)
>> +
>> +#define RMI_REALM_PARAM_FLAG_LPA2        BIT(0)
>> +#define RMI_REALM_PARAM_FLAG_SVE        BIT(1)
>> +#define RMI_REALM_PARAM_FLAG_PMU        BIT(2)
>> +
>> +/*
>> + * Note many of these fields are smaller than u64 but all fields have u64
>> + * alignment, so use u64 to ensure correct alignment.
>> + */
>> +struct realm_params {
>> +    union { /* 0x0 */
>> +        struct {
>> +            u64 flags;
>> +            u64 s2sz;
>> +            u64 sve_vl;
>> +            u64 num_bps;
>> +            u64 num_wps;
>> +            u64 pmu_num_ctrs;
>> +            u64 hash_algo;
>> +        };
>> +        u8 padding_1[0x400];
>> +    };
>> +    union { /* 0x400 */
>> +        u8 rpv[64];
>> +        u8 padding_2[0x400];
>> +    };
>> +    union { /* 0x800 */
>> +        struct {
>> +            u64 vmid;
>> +            u64 rtt_base;
>> +            s64 rtt_level_start;
>> +            u64 rtt_num_start;
>> +        };
>> +        u8 padding_3[0x800];
>> +    };
>> +};
>> +
>> +/*
>> + * The number of GPRs (starting from X0) that are
>> + * configured by the host when a REC is created.
>> + */
>> +#define REC_CREATE_NR_GPRS        8
>> +
>> +#define REC_PARAMS_FLAG_RUNNABLE    BIT_ULL(0)
>> +
>> +#define REC_PARAMS_AUX_GRANULES        16
>> +
>> +struct rec_params {
>> +    union { /* 0x0 */
>> +        u64 flags;
>> +        u8 padding1[0x100];
>> +    };
>> +    union { /* 0x100 */
>> +        u64 mpidr;
>> +        u8 padding2[0x100];
>> +    };
>> +    union { /* 0x200 */
>> +        u64 pc;
>> +        u8 padding3[0x100];
>> +    };
>> +    union { /* 0x300 */
>> +        u64 gprs[REC_CREATE_NR_GPRS];
>> +        u8 padding4[0x500];
>> +    };
>> +    union { /* 0x800 */
>> +        struct {
>> +            u64 num_rec_aux;
>> +            u64 aux[REC_PARAMS_AUX_GRANULES];
>> +        };
>> +        u8 padding5[0x800];
>> +    };
>> +};
>> +
>> +#define RMI_EMULATED_MMIO        BIT(0)
>> +#define RMI_INJECT_SEA            BIT(1)
>> +#define RMI_TRAP_WFI            BIT(2)
>> +#define RMI_TRAP_WFE            BIT(3)
> 
> For completeness, we could add :
> 
> #define RMI_RIPAS_RESPONSE        BIT(4)
> 
> Not sure if we use it later in the series.

Yes, I'll add for completeness. Currently KVM will never reject a RIPAS
change request from the guest. I'm not sure in what situation it would
make sense to do such a thing. The current uABI doesn't allow the VMM to
have a say in it either as the RIPAS change is completed before the exit
to the VMM. The expectation is therefore that the VMM would simply
terminate a Realm guest that attempted a RIPAS change that it disagreed
with.
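
For illustration only: were KVM ever to reject such a request, the
refusal would presumably be signalled by setting the new flag on the
next REC entry, along these lines (using the RMI_RIPAS_RESPONSE name
suggested above; a sketch, not something this series does):

	/* Hypothetical: tell the RMM the guest's RIPAS change was refused */
	rec->run->entry.flags |= RMI_RIPAS_RESPONSE;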

>> +
>> +#define REC_RUN_GPRS            31
>> +#define REC_GIC_NUM_LRS            16
>> +
>> +struct rec_entry {

While I'm reading this (and the spec) again - I notice that the spec
says "RecEnter" not 'entry' - I'll rename this to be consistent.

>> +    union { /* 0x000 */
>> +        u64 flags;
>> +        u8 padding0[0x200];
>> +    };
>> +    union { /* 0x200 */
>> +        u64 gprs[REC_RUN_GPRS];
>> +        u8 padding2[0x100];
>> +    };
>> +    union { /* 0x300 */
>> +        struct {
>> +            u64 gicv3_hcr;
>> +            u64 gicv3_lrs[REC_GIC_NUM_LRS];
>> +        };
>> +        u8 padding3[0x100];
>> +    };
>> +    u8 padding4[0x400];
>> +};
>> +
>> +struct rec_exit {
>> +    union { /* 0x000 */
>> +        u8 exit_reason;
>> +        u8 padding0[0x100];
>> +    };
>> +    union { /* 0x100 */
>> +        struct {
>> +            u64 esr;
>> +            u64 far;
>> +            u64 hpfar;
>> +        };
>> +        u8 padding1[0x100];
>> +    };
>> +    union { /* 0x200 */
>> +        u64 gprs[REC_RUN_GPRS];
>> +        u8 padding2[0x100];
>> +    };
>> +    union { /* 0x300 */
>> +        struct {
>> +            u64 gicv3_hcr;
>> +            u64 gicv3_lrs[REC_GIC_NUM_LRS];
>> +            u64 gicv3_misr;
>> +            u64 gicv3_vmcr;
>> +        };
>> +        u8 padding3[0x100];
>> +    };
>> +    union { /* 0x400 */
>> +        struct {
>> +            u64 cntp_ctl;
>> +            u64 cntp_cval;
>> +            u64 cntv_ctl;
>> +            u64 cntv_cval;
>> +        };
>> +        u8 padding4[0x100];
>> +    };
>> +    union { /* 0x500 */
>> +        struct {
>> +            u64 ripas_base;
>> +            u64 ripas_top;
>> +            u64 ripas_value;
>> +        };
>> +        u8 padding5[0x100];
>> +    };
>> +    union { /* 0x600 */
>> +        u16 imm;
>> +        u8 padding6[0x100];
>> +    };
>> +    union { /* 0x700 */
>> +        struct {
>> +            u64 pmu_ovf_status;
> 
> This is u8 as per section B4.4.10 RmiPmuOverflowStatus type.

Indeed - I'm not sure where I got u64 from - it was probably to provide
padding in an older version of the spec.
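
So the fixed field would presumably shrink to a u8, keeping the 0x100
padding from the layout above:

	union { /* 0x700 */
		struct {
			u8 pmu_ovf_status;
		};
		u8 padding7[0x100];
	};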

>> +        };
>> +        u8 padding7[0x100];
>> +    };
>> +};
>> +
>> +struct rec_run {
>> +    struct rec_entry entry;
>> +    struct rec_exit exit;
>> +};
>> +
>> +#define RMI_EXIT_SYNC            0x00
>> +#define RMI_EXIT_IRQ            0x01
>> +#define RMI_EXIT_FIQ            0x02
>> +#define RMI_EXIT_PSCI            0x03
>> +#define RMI_EXIT_RIPAS_CHANGE        0x04
>> +#define RMI_EXIT_HOST_CALL        0x05
>> +#define RMI_EXIT_SERROR            0x06
> 
> Minor nit: Like the other definitions, it may be good to keep the
> definitions of the "exit_reason" above the field declaration.

Yes, makes sense - I'll move these.

Thanks for the review!

Steve

> 
> Rest looks fine to me.
> 
> Suzuki
>> +
>> +#endif
> 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 09/43] arm64: RME: ioctls to create and configure realms
  2024-04-12  8:42   ` [PATCH v2 09/43] arm64: RME: ioctls to create and configure realms Steven Price
  2024-04-17  9:51     ` Suzuki K Poulose
@ 2024-04-18 16:04     ` Suzuki K Poulose
  1 sibling, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-18 16:04 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni, Jean-Philippe Brucker

On 12/04/2024 09:42, Steven Price wrote:
> Add the KVM_CAP_ARM_RME_CREATE_FD ioctl to create a realm. This involves
> delegating pages to the RMM to hold the Realm Descriptor (RD) and for
> the base level of the Realm Translation Tables (RTT). A VMID also need
> to be picked, since the RMM has a separate VMID address space a
> dedicated allocator is added for this purpose.
> 
> KVM_CAP_ARM_RME_CONFIG_REALM is provided to allow configuring the realm
> before it is created.
> 
> Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> ---
>   arch/arm64/include/asm/kvm_emulate.h |   5 +
>   arch/arm64/include/asm/kvm_rme.h     |  19 ++
>   arch/arm64/kvm/arm.c                 |  18 ++
>   arch/arm64/kvm/mmu.c                 |  15 +-
>   arch/arm64/kvm/rme.c                 | 282 +++++++++++++++++++++++++++
>   5 files changed, 337 insertions(+), 2 deletions(-)
> 


> @@ -1014,6 +1018,13 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>   	struct kvm_pgtable *pgt = NULL;
>   
>   	write_lock(&kvm->mmu_lock);
> +	if (kvm_is_realm(kvm) &&
> +	    (kvm_realm_state(kvm) != REALM_STATE_DEAD &&
> +	     kvm_realm_state(kvm) != REALM_STATE_NONE)) {
> +		/* TODO: teardown rtts */
> +		write_unlock(&kvm->mmu_lock);
> +		return;
> +	}
>   	pgt = mmu->pgt;
>   	if (pgt) {
>   		mmu->pgd_phys = 0;

See my comment below.

...

> +
> +void kvm_destroy_realm(struct kvm *kvm)
> +{

...

> +	for (i = 0; i < pgt->pgd_pages; i++) {
> +		phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;
> +
> +		if (WARN_ON(rmi_granule_undelegate(pgd_phys)))
> +			return;

I think we need to either:

	a. memset() the root RTT pages to 0 here.

OR

         b. for Realms, avoid walking the page table triggered via

  kvm_pgtable_stage2_destroy()->kvm_pgtable_walk().

Even though the root RTTs are all empty (invalid entries, written using
the RMM's memory encryption), the Host might be seeing "garbage" which
might look like "valid" entries and thus trigger crashes.

I prefer not walking the RTTs for a Realm and thus simply skip the walk.
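
For reference, option (a) would be a one-line addition to the undelegate
loop quoted above - a sketch, not tested:

	for (i = 0; i < pgt->pgd_pages; i++) {
		phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;

		if (WARN_ON(rmi_granule_undelegate(pgd_phys)))
			return;
		/* Wipe whatever the RMM left behind so that the later
		 * kvm_pgtable_walk() only sees invalid entries. */
		memset(phys_to_virt(pgd_phys), 0, PAGE_SIZE);
	}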

Suzuki


> +	}
> +
> +	WRITE_ONCE(realm->state, REALM_STATE_DEAD);
> +
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);
> +}
> +
> +int kvm_init_realm_vm(struct kvm *kvm)
> +{
> +	struct realm_params *params;
> +
> +	params = (struct realm_params *)get_zeroed_page(GFP_KERNEL);
> +	if (!params)
> +		return -ENOMEM;
> +
> +	/* Default parameters, not exposed to user space */
> +	params->s2sz = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
> +	kvm->arch.realm.params = params;
> +	return 0;
> +}
> +
>   int kvm_init_rme(void)
>   {
> +	int ret;
> +
>   	if (PAGE_SIZE != SZ_4K)
>   		/* Only 4k page size on the host is supported */
>   		return 0;
> @@ -46,6 +321,13 @@ int kvm_init_rme(void)
>   		/* Continue without realm support */
>   		return 0;
>   
> +	if (WARN_ON(rmi_features(0, &rmm_feat_reg0)))
> +		return 0;
> +
> +	ret = rme_vmid_init();
> +	if (ret)
> +		return ret;
> +
>   	/* Future patch will enable static branch kvm_rme_is_available */
>   
>   	return 0;


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 33/43] arm64: rme: Enable PMU support with a realm guest
  2024-04-13 23:44     ` kernel test robot
@ 2024-04-18 16:06       ` Suzuki K Poulose
  0 siblings, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-18 16:06 UTC (permalink / raw)
  To: kernel test robot, Steven Price, kvm, kvmarm
  Cc: oe-kbuild-all, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Jean-Philippe Brucker

On 14/04/2024 00:44, kernel test robot wrote:
> Hi Steven,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on kvmarm/next]
> [also build test ERROR on kvm/queue arm64/for-next/core linus/master v6.9-rc3 next-20240412]
> [cannot apply to kvm/linux-next]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Steven-Price/KVM-Prepare-for-handling-only-shared-mappings-in-mmu_notifier-events/20240412-170311
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next
> patch link:    https://lore.kernel.org/r/20240412084309.1733783-34-steven.price%40arm.com
> patch subject: [PATCH v2 33/43] arm64: rme: Enable PMU support with a realm guest
> config: arm64-randconfig-r064-20240414 (https://download.01.org/0day-ci/archive/20240414/202404140723.GKwnJxeZ-lkp@intel.com/config)
> compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 8b3b4a92adee40483c27f26c478a384cd69c6f05)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240414/202404140723.GKwnJxeZ-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202404140723.GKwnJxeZ-lkp@intel.com/

I guess the problem is that with CONFIG_HW_PERF_EVENTS not set, arm_pmu
is an empty struct, triggering all these errors.
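
If so, the usual fix (my assumption, not the posted patch) would be to
guard any code that dereferences the PMU state, e.g.:

	#ifdef CONFIG_HW_PERF_EVENTS
	/* kvm_realm_pmu_ctrs() is a hypothetical name for the new helper */
	static u32 kvm_realm_pmu_ctrs(struct kvm *kvm)
	{
		return kvm->arch.arm_pmu->num_events;
	}
	#else
	static u32 kvm_realm_pmu_ctrs(struct kvm *kvm)
	{
		return 0;
	}
	#endif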

Suzuki



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS
  2024-04-12  8:42   ` [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS Steven Price
@ 2024-04-19  9:34     ` Suzuki K Poulose
  2024-04-19 10:20       ` Suzuki K Poulose
  2024-05-01 15:47       ` Steven Price
  2024-04-25  9:53     ` Fuad Tabba
  2024-05-01 14:27     ` Jean-Philippe Brucker
  2 siblings, 2 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-19  9:34 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 12/04/2024 09:42, Steven Price wrote:
> Each page within the protected region of the realm guest can be marked
> as either RAM or EMPTY. Allow the VMM to control this before the guest
> has started and provide the equivalent functions to change this (with
> the guest's approval) at runtime.
> 
> When transitioning from RIPAS RAM (1) to RIPAS EMPTY (0) the memory is
> unmapped from the guest and undelegated allowing the memory to be reused
> by the host. When transitioning to RIPAS RAM the actual population of
> the leaf RTTs is done later on stage 2 fault, however it may be
> necessary to allocate additional RTTs to represent the range requested.

minor nit: To give a bit more context:

"however it may be necessary to allocate additional RTTs in order for
the RMM to track the RIPAS for the requested range".

> 
> When freeing a block mapping it is necessary to temporarily unfold the
> RTT which requires delegating an extra page to the RMM, this page can
> then be recovered once the contents of the block mapping have been
> freed. A spare, delegated page (spare_page) is used for this purpose.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/include/asm/kvm_rme.h |  16 ++
>   arch/arm64/kvm/mmu.c             |   8 +-
>   arch/arm64/kvm/rme.c             | 390 +++++++++++++++++++++++++++++++
>   3 files changed, 411 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
> index 915e76068b00..cc8f81cfc3c0 100644
> --- a/arch/arm64/include/asm/kvm_rme.h
> +++ b/arch/arm64/include/asm/kvm_rme.h
> @@ -96,6 +96,14 @@ void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits);
>   int kvm_create_rec(struct kvm_vcpu *vcpu);
>   void kvm_destroy_rec(struct kvm_vcpu *vcpu);
>   
> +void kvm_realm_unmap_range(struct kvm *kvm,
> +			   unsigned long ipa,
> +			   u64 size,
> +			   bool unmap_private);
> +int realm_set_ipa_state(struct kvm_vcpu *vcpu,
> +			unsigned long addr, unsigned long end,
> +			unsigned long ripas);
> +
>   #define RME_RTT_BLOCK_LEVEL	2
>   #define RME_RTT_MAX_LEVEL	3
>   
> @@ -114,4 +122,12 @@ static inline unsigned long rme_rtt_level_mapsize(int level)
>   	return (1UL << RME_RTT_LEVEL_SHIFT(level));
>   }
>   
> +static inline bool realm_is_addr_protected(struct realm *realm,
> +					   unsigned long addr)
> +{
> +	unsigned int ia_bits = realm->ia_bits;
> +
> +	return !(addr & ~(BIT(ia_bits - 1) - 1));
> +}
> +
>   #endif
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 46f0c4e80ace..8a7b5449697f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -310,6 +310,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
>    * @start: The intermediate physical base address of the range to unmap
>    * @size:  The size of the area to unmap
>    * @may_block: Whether or not we are permitted to block
> + * @only_shared: If true then protected mappings should not be unmapped
>    *
>    * Clear a range of stage-2 mappings, lowering the various ref-counts.  Must
>    * be called while holding mmu_lock (unless for freeing the stage2 pgd before
> @@ -317,7 +318,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
>    * with things behind our backs.
>    */
>   static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size,
> -				 bool may_block)
> +				 bool may_block, bool only_shared)
>   {
>   	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>   	phys_addr_t end = start + size;
> @@ -330,7 +331,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
>   
>   static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>   {
> -	__unmap_stage2_range(mmu, start, size, true);
> +	__unmap_stage2_range(mmu, start, size, true, false);
>   }
>   
>   static void stage2_flush_memslot(struct kvm *kvm,
> @@ -1771,7 +1772,8 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>   
>   	__unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
>   			     (range->end - range->start) << PAGE_SHIFT,
> -			     range->may_block);
> +			     range->may_block,
> +			     range->only_shared);
>   
>   	return false;
>   }
> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
> index 629a095bea61..9e5983c51393 100644
> --- a/arch/arm64/kvm/rme.c
> +++ b/arch/arm64/kvm/rme.c
> @@ -79,6 +79,12 @@ static phys_addr_t __alloc_delegated_page(struct realm *realm,
>   	return phys;
>   }
>   
> +static phys_addr_t alloc_delegated_page(struct realm *realm,
> +					struct kvm_mmu_memory_cache *mc)
> +{
> +	return __alloc_delegated_page(realm, mc, GFP_KERNEL);
> +}
> +
>   static void free_delegated_page(struct realm *realm, phys_addr_t phys)
>   {
>   	if (realm->spare_page == PHYS_ADDR_MAX) {
> @@ -94,6 +100,151 @@ static void free_delegated_page(struct realm *realm, phys_addr_t phys)
>   	free_page((unsigned long)phys_to_virt(phys));
>   }
>   
> +static int realm_rtt_create(struct realm *realm,
> +			    unsigned long addr,
> +			    int level,
> +			    phys_addr_t phys)
> +{
> +	addr = ALIGN_DOWN(addr, rme_rtt_level_mapsize(level - 1));
> +	return rmi_rtt_create(virt_to_phys(realm->rd), phys, addr, level);
> +}
> +
> +static int realm_rtt_fold(struct realm *realm,
> +			  unsigned long addr,
> +			  int level,
> +			  phys_addr_t *rtt_granule)
> +{
> +	unsigned long out_rtt;
> +	int ret;
> +
> +	ret = rmi_rtt_fold(virt_to_phys(realm->rd), addr, level, &out_rtt);
> +
> +	if (RMI_RETURN_STATUS(ret) == RMI_SUCCESS && rtt_granule)
> +		*rtt_granule = out_rtt;
> +
> +	return ret;
> +}
> +
> +static int realm_destroy_protected(struct realm *realm,
> +				   unsigned long ipa,
> +				   unsigned long *next_addr)
> +{
> +	unsigned long rd = virt_to_phys(realm->rd);
> +	unsigned long addr;
> +	phys_addr_t rtt;
> +	int ret;
> +
> +loop:
> +	ret = rmi_data_destroy(rd, ipa, &addr, next_addr);
> +	if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
> +		if (*next_addr > ipa)
> +			return 0; /* UNASSIGNED */
> +		rtt = alloc_delegated_page(realm, NULL);
> +		if (WARN_ON(rtt == PHYS_ADDR_MAX))
> +			return -1;
> +		/* ASSIGNED - ipa is mapped as a block, so split */
> +		ret = realm_rtt_create(realm, ipa,
> +				       RMI_RETURN_INDEX(ret) + 1, rtt);

Could we not go all the way to L3 (rather than 1 level deeper) and try
again ? That way, we are covered for block mappings at L1 (1G).
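
i.e. something like the following, reusing realm_create_rtt_levels()
from later in this patch (a sketch; it would allocate one delegated page
per missing level rather than the single page above):

		/* ASSIGNED - split all the way down to the leaf level */
		ret = realm_create_rtt_levels(realm, ipa,
					      RMI_RETURN_INDEX(ret),
					      RME_RTT_MAX_LEVEL, NULL);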

> +		if (WARN_ON(ret)) {
> +			free_delegated_page(realm, rtt);
> +			return -1;
> +		}
> +		/* retry */
> +		goto loop;
> +	} else if (WARN_ON(ret)) {
> +		return -1;
> +	}
> +	ret = rmi_granule_undelegate(addr);
> +
> +	/*
> +	 * If the undelegate fails then something has gone seriously
> +	 * wrong: take an extra reference to just leak the page
> +	 */
> +	if (WARN_ON(ret))
> +		get_page(phys_to_page(addr));
> +
> +	return 0;
> +}
> +
> +static void realm_unmap_range_shared(struct kvm *kvm,
> +				     int level,
> +				     unsigned long start,
> +				     unsigned long end)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +	unsigned long rd = virt_to_phys(realm->rd);
> +	ssize_t map_size = rme_rtt_level_mapsize(level);
> +	unsigned long next_addr, addr;
> +	unsigned long shared_bit = BIT(realm->ia_bits - 1);
> +
> +	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
> +		return;
> +
> +	start |= shared_bit;
> +	end |= shared_bit;
> +
> +	for (addr = start; addr < end; addr = next_addr) {
> +		unsigned long align_addr = ALIGN(addr, map_size);
> +		int ret;
> +
> +		next_addr = ALIGN(addr + 1, map_size);
> +
> +		if (align_addr != addr || next_addr > end) {
> +			/* Need to recurse deeper */
> +			if (addr < align_addr)
> +				next_addr = align_addr;
> +			realm_unmap_range_shared(kvm, level + 1, addr,
> +						 min(next_addr, end));
> +			continue;
> +		}
> +
> +		ret = rmi_rtt_unmap_unprotected(rd, addr, level, &next_addr);

minor nit: We could potentially use rmi_rtt_destroy() to tear down
shared mappings without unmapping them individually, if the range
is big enough. All such optimisations could come later though.

> +		switch (RMI_RETURN_STATUS(ret)) {
> +		case RMI_SUCCESS:
> +			break;
> +		case RMI_ERROR_RTT:
> +			if (next_addr == addr) {

At this point we have a block-aligned address, but the mapping is at a
deeper level. Given that we walk from the top down, we implicitly handle
the case of block mappings. Not sure if that needs to be in a comment
here.

> +				next_addr = ALIGN(addr + 1, map_size);

Reset to the "actual next" as it was overwritten by the RMI call.

> +				realm_unmap_range_shared(kvm, level + 1, addr,
> +							 next_addr);
> +			}
> +			break;
> +		default:
> +			WARN_ON(1);
> +		}
> +	}
> +}
> +
> +static void realm_unmap_range_private(struct kvm *kvm,
> +				      unsigned long start,
> +				      unsigned long end)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +	ssize_t map_size = RME_PAGE_SIZE;
> +	unsigned long next_addr, addr;
> +
> +	for (addr = start; addr < end; addr = next_addr) {
> +		int ret;
> +
> +		next_addr = ALIGN(addr + 1, map_size);
> +
> +		ret = realm_destroy_protected(realm, addr, &next_addr);
> +
> +		if (WARN_ON(ret))
> +			break;
> +	}
> +}
> +
> +static void realm_unmap_range(struct kvm *kvm,
> +			      unsigned long start,
> +			      unsigned long end,
> +			      bool unmap_private)
> +{
> +	realm_unmap_range_shared(kvm, RME_RTT_MAX_LEVEL - 1, start, end);

minor nit: We already have a helper to find a suitable start level
(defined below), maybe we could use that ? And even do the rtt_destroy
optimisation for the unprotected range.

> +	if (unmap_private)
> +		realm_unmap_range_private(kvm, start, end);
> +}
> +
>   u32 kvm_realm_ipa_limit(void)
>   {
>   	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
> @@ -190,6 +341,30 @@ static int realm_rtt_destroy(struct realm *realm, unsigned long addr,
>   	return ret;
>   }
>   
> +static int realm_create_rtt_levels(struct realm *realm,
> +				   unsigned long ipa,
> +				   int level,
> +				   int max_level,
> +				   struct kvm_mmu_memory_cache *mc)
> +{
> +	if (WARN_ON(level == max_level))
> +		return 0;
> +
> +	while (level++ < max_level) {
> +		phys_addr_t rtt = alloc_delegated_page(realm, mc);
> +
> +		if (rtt == PHYS_ADDR_MAX)
> +			return -ENOMEM;
> +
> +		if (realm_rtt_create(realm, ipa, level, rtt)) {
> +			free_delegated_page(realm, rtt);
> +			return -ENXIO;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>   static int realm_tear_down_rtt_level(struct realm *realm, int level,
>   				     unsigned long start, unsigned long end)
>   {
> @@ -265,6 +440,68 @@ static int realm_tear_down_rtt_range(struct realm *realm,
>   					 start, end);
>   }
>   
> +/*
> + * Returns 0 on successful fold, a negative value on error, a positive value if
> + * we were not able to fold all tables at this level.
> + */
> +static int realm_fold_rtt_level(struct realm *realm, int level,
> +				unsigned long start, unsigned long end)
> +{
> +	int not_folded = 0;
> +	ssize_t map_size;
> +	unsigned long addr, next_addr;
> +
> +	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
> +		return -EINVAL;
> +
> +	map_size = rme_rtt_level_mapsize(level - 1);
> +
> +	for (addr = start; addr < end; addr = next_addr) {
> +		phys_addr_t rtt_granule;
> +		int ret;
> +		unsigned long align_addr = ALIGN(addr, map_size);
> +
> +		next_addr = ALIGN(addr + 1, map_size);
> +
> +		ret = realm_rtt_fold(realm, align_addr, level, &rtt_granule);
> +
> +		switch (RMI_RETURN_STATUS(ret)) {
> +		case RMI_SUCCESS:
> +			if (!WARN_ON(rmi_granule_undelegate(rtt_granule)))
> +				free_page((unsigned long)phys_to_virt(rtt_granule));

minor nit: Do we need a wrapper function for things like this, and
leaking the page if undelegate fails, something like
rme_reclaim_delegated_page()  ?
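
Something along these lines, perhaps (the name and exact shape are only
my suggestion, not code from the series):

	static void rme_reclaim_delegated_page(phys_addr_t phys)
	{
		/* If undelegate fails the page is lost to the host: leak it */
		if (WARN_ON(rmi_granule_undelegate(phys)))
			return;
		free_page((unsigned long)phys_to_virt(phys));
	}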


> +			break;
> +		case RMI_ERROR_RTT:
> +			if (level == RME_RTT_MAX_LEVEL ||
> +			    RMI_RETURN_INDEX(ret) < level) {
> +				not_folded++;
> +				break;
> +			}
> +			/* Recurse a level deeper */
> +			ret = realm_fold_rtt_level(realm,
> +						   level + 1,
> +						   addr,
> +						   next_addr);
> +			if (ret < 0)
> +				return ret;
> +			else if (ret == 0)
> +				/* Try again at this level */
> +				next_addr = addr;
> +			break;
> +		default:
> +			return -ENXIO;
> +		}
> +	}
> +
> +	return not_folded;
> +}
> +
> +static int realm_fold_rtt_range(struct realm *realm,
> +				unsigned long start, unsigned long end)
> +{
> +	return realm_fold_rtt_level(realm, get_start_level(realm) + 1,
> +				    start, end);
> +}
> +
>   static void ensure_spare_page(struct realm *realm)
>   {
>   	phys_addr_t tmp_rtt;
> @@ -295,6 +532,147 @@ void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits)
>   	WARN_ON(realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits)));
>   }
>   
> +void kvm_realm_unmap_range(struct kvm *kvm, unsigned long ipa, u64 size,
> +			   bool unmap_private)
> +{
> +	unsigned long end = ipa + size;
> +	struct realm *realm = &kvm->arch.realm;
> +
> +	end = min(BIT(realm->ia_bits - 1), end);
> +
> +	ensure_spare_page(realm);
> +
> +	realm_unmap_range(kvm, ipa, end, unmap_private);
> +
> +	realm_fold_rtt_range(realm, ipa, end);

Shouldn't this be :

	if (unmap_private)
		realm_fold_rtt_range(realm, ipa, end);

Also it is fine to reclaim RTTs from the protected space, not the
unprotected half, as long as we use RTT_DESTROY in unmap_shared routine.

> +}
> +
> +static int find_map_level(struct realm *realm,
> +			  unsigned long start,
> +			  unsigned long end)
> +{
> +	int level = RME_RTT_MAX_LEVEL;
> +
> +	while (level > get_start_level(realm)) {
> +		unsigned long map_size = rme_rtt_level_mapsize(level - 1);
> +
> +		if (!IS_ALIGNED(start, map_size) ||
> +		    (start + map_size) > end)
> +			break;
> +
> +		level--;
> +	}
> +
> +	return level;
> +}
> +
> +int realm_set_ipa_state(struct kvm_vcpu *vcpu,
> +			unsigned long start,
> +			unsigned long end,
> +			unsigned long ripas)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct realm *realm = &kvm->arch.realm;
> +	struct realm_rec *rec = &vcpu->arch.rec;
> +	phys_addr_t rd_phys = virt_to_phys(realm->rd);
> +	phys_addr_t rec_phys = virt_to_phys(rec->rec_page);
> +	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> +	unsigned long ipa = start;
> +	int ret = 0;
> +
> +	while (ipa < end) {
> +		unsigned long next;
> +
> +		ret = rmi_rtt_set_ripas(rd_phys, rec_phys, ipa, end, &next);
> +
> +		if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
> +			int walk_level = RMI_RETURN_INDEX(ret);
> +			int level = find_map_level(realm, ipa, end);

Might be worth adding a comment here. Check if the RMM needs tables to
create deeper level tables.

> +
> +			if (walk_level < level) {
> +				ret = realm_create_rtt_levels(realm, ipa,
> +							      walk_level,
> +							      level,
> +							      memcache);
				/* Retry with RTTs created */

> +				if (!ret)
> +					continue;
> +			} else {
> +				ret = -EINVAL;
> +			}
> +
> +			break;
> +		} else if (RMI_RETURN_STATUS(ret) != RMI_SUCCESS) {
> +			WARN(1, "Unexpected error in %s: %#x\n", __func__,
> +			     ret);
> +			ret = -EINVAL;
> +			break;
> +		}
> +		ipa = next;
> +	}
> +
> +	if (ripas == RMI_EMPTY && ipa != start)
> +		kvm_realm_unmap_range(kvm, start, ipa - start, true);

This triggers unmapping the "shared" aliases too, which is not necessary.
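
It could presumably call the private-only helper directly, since both
live in rme.c - a sketch, ignoring the spare-page and fold handling
that kvm_realm_unmap_range() also does:

	if (ripas == RMI_EMPTY && ipa != start)
		/* Only the protected alias needs to go away */
		realm_unmap_range_private(kvm, start, ipa);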

> +
> +	return ret;
> +}
> +
> +static int realm_init_ipa_state(struct realm *realm,
> +				unsigned long ipa,
> +				unsigned long end)
> +{
> +	phys_addr_t rd_phys = virt_to_phys(realm->rd);
> +	int ret;
> +
> +	while (ipa < end) {
> +		unsigned long next;
> +
> +		ret = rmi_rtt_init_ripas(rd_phys, ipa, end, &next);
> +
> +		if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
> +			int err_level = RMI_RETURN_INDEX(ret);
> +			int level = find_map_level(realm, ipa, end);
> +
> +			if (WARN_ON(err_level >= level))
> +				return -ENXIO;
> +
> +			ret = realm_create_rtt_levels(realm, ipa,
> +						      err_level,
> +						      level, NULL);
> +			if (ret)
> +				return ret;
> +			/* Retry with the RTT levels in place */
> +			continue;
> +		} else if (WARN_ON(ret)) {
> +			return -ENXIO;
> +		}
> +
> +		ipa = next;
> +	}
> +
> +	return 0;
> +}
> +
> +static int kvm_init_ipa_range_realm(struct kvm *kvm,
> +				    struct kvm_cap_arm_rme_init_ipa_args *args)
> +{
> +	int ret = 0;
> +	gpa_t addr, end;
> +	struct realm *realm = &kvm->arch.realm;
> +
> +	addr = args->init_ipa_base;
> +	end = addr + args->init_ipa_size;
> +
> +	if (end < addr)
> +		return -EINVAL;
> +
> +	if (kvm_realm_state(kvm) != REALM_STATE_NEW)
> +		return -EINVAL;
> +
> +	ret = realm_init_ipa_state(realm, addr, end);
> +
> +	return ret;

super minor nit:

	return realm_init_ipa_state(realm, addr, end);

> +}
> +
>   /* Protects access to rme_vmid_bitmap */
>   static DEFINE_SPINLOCK(rme_vmid_lock);
>   static unsigned long *rme_vmid_bitmap;
> @@ -418,6 +796,18 @@ int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>   	case KVM_CAP_ARM_RME_CREATE_RD:
>   		r = kvm_create_realm(kvm);
>   		break;
> +	case KVM_CAP_ARM_RME_INIT_IPA_REALM: {
> +		struct kvm_cap_arm_rme_init_ipa_args args;
> +		void __user *argp = u64_to_user_ptr(cap->args[1]);
> +
> +		if (copy_from_user(&args, argp, sizeof(args))) {
> +			r = -EFAULT;
> +			break;
> +		}
> +
> +		r = kvm_init_ipa_range_realm(kvm, &args);
> +		break;
> +	}


Suzuki

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS
  2024-04-19  9:34     ` Suzuki K Poulose
@ 2024-04-19 10:20       ` Suzuki K Poulose
  2024-05-01 15:47       ` Steven Price
  1 sibling, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-19 10:20 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 19/04/2024 10:34, Suzuki K Poulose wrote:
> On 12/04/2024 09:42, Steven Price wrote:
>> Each page within the protected region of the realm guest can be marked
>> as either RAM or EMPTY. Allow the VMM to control this before the guest
>> has started and provide the equivalent functions to change this (with
>> the guest's approval) at runtime.
>>
>> When transitioning from RIPAS RAM (1) to RIPAS EMPTY (0) the memory is
>> unmapped from the guest and undelegated allowing the memory to be reused
>> by the host. When transitioning to RIPAS RAM the actual population of
>> the leaf RTTs is done later on stage 2 fault, however it may be
>> necessary to allocate additional RTTs to represent the range requested.
> 
> minor nit: To give a bit more context:
> 
> "however it may be necessary to allocate additional RTTs in order for
> the RMM to track the RIPAS for the requested range".
> 
>>
>> When freeing a block mapping it is necessary to temporarily unfold the
>> RTT which requires delegating an extra page to the RMM, this page can
>> then be recovered once the contents of the block mapping have been
>> freed. A spare, delegated page (spare_page) is used for this purpose.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>   arch/arm64/include/asm/kvm_rme.h |  16 ++
>>   arch/arm64/kvm/mmu.c             |   8 +-
>>   arch/arm64/kvm/rme.c             | 390 +++++++++++++++++++++++++++++++
>>   3 files changed, 411 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_rme.h 
>> b/arch/arm64/include/asm/kvm_rme.h
>> index 915e76068b00..cc8f81cfc3c0 100644
>> --- a/arch/arm64/include/asm/kvm_rme.h
>> +++ b/arch/arm64/include/asm/kvm_rme.h
>> @@ -96,6 +96,14 @@ void kvm_realm_destroy_rtts(struct kvm *kvm, u32 
>> ia_bits);
>>   int kvm_create_rec(struct kvm_vcpu *vcpu);
>>   void kvm_destroy_rec(struct kvm_vcpu *vcpu);
>> +void kvm_realm_unmap_range(struct kvm *kvm,
>> +               unsigned long ipa,
>> +               u64 size,
>> +               bool unmap_private);
>> +int realm_set_ipa_state(struct kvm_vcpu *vcpu,
>> +            unsigned long addr, unsigned long end,
>> +            unsigned long ripas);
>> +
>>   #define RME_RTT_BLOCK_LEVEL    2
>>   #define RME_RTT_MAX_LEVEL    3
>> @@ -114,4 +122,12 @@ static inline unsigned long 
>> rme_rtt_level_mapsize(int level)
>>       return (1UL << RME_RTT_LEVEL_SHIFT(level));
>>   }
>> +static inline bool realm_is_addr_protected(struct realm *realm,
>> +                       unsigned long addr)
>> +{
>> +    unsigned int ia_bits = realm->ia_bits;
>> +
>> +    return !(addr & ~(BIT(ia_bits - 1) - 1));
>> +}
>> +
>>   #endif
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 46f0c4e80ace..8a7b5449697f 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -310,6 +310,7 @@ static void invalidate_icache_guest_page(void *va, 
>> size_t size)
>>    * @start: The intermediate physical base address of the range to unmap
>>    * @size:  The size of the area to unmap
>>    * @may_block: Whether or not we are permitted to block
>> + * @only_shared: If true then protected mappings should not be unmapped
>>    *
>>    * Clear a range of stage-2 mappings, lowering the various 
>> ref-counts.  Must
>>    * be called while holding mmu_lock (unless for freeing the stage2 
>> pgd before
>> @@ -317,7 +318,7 @@ static void invalidate_icache_guest_page(void *va, 
>> size_t size)
>>    * with things behind our backs.
>>    */
>>   static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t 
>> start, u64 size,
>> -                 bool may_block)
>> +                 bool may_block, bool only_shared)
>>   {
>>       struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>>       phys_addr_t end = start + size;
>> @@ -330,7 +331,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu 
>> *mmu, phys_addr_t start, u64
>>   static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t 
>> start, u64 size)
>>   {
>> -    __unmap_stage2_range(mmu, start, size, true);
>> +    __unmap_stage2_range(mmu, start, size, true, false);
>>   }
>>   static void stage2_flush_memslot(struct kvm *kvm,
>> @@ -1771,7 +1772,8 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct 
>> kvm_gfn_range *range)
>>       __unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
>>                    (range->end - range->start) << PAGE_SHIFT,
>> -                 range->may_block);
>> +                 range->may_block,
>> +                 range->only_shared);
>>       return false;
>>   }
>> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
>> index 629a095bea61..9e5983c51393 100644
>> --- a/arch/arm64/kvm/rme.c
>> +++ b/arch/arm64/kvm/rme.c
>> @@ -79,6 +79,12 @@ static phys_addr_t __alloc_delegated_page(struct 
>> realm *realm,
>>       return phys;
>>   }
>> +static phys_addr_t alloc_delegated_page(struct realm *realm,
>> +                    struct kvm_mmu_memory_cache *mc)
>> +{
>> +    return __alloc_delegated_page(realm, mc, GFP_KERNEL);
>> +}
>> +
>>   static void free_delegated_page(struct realm *realm, phys_addr_t phys)
>>   {
>>       if (realm->spare_page == PHYS_ADDR_MAX) {
>> @@ -94,6 +100,151 @@ static void free_delegated_page(struct realm 
>> *realm, phys_addr_t phys)
>>       free_page((unsigned long)phys_to_virt(phys));
>>   }
>> +static int realm_rtt_create(struct realm *realm,
>> +                unsigned long addr,
>> +                int level,
>> +                phys_addr_t phys)
>> +{
>> +    addr = ALIGN_DOWN(addr, rme_rtt_level_mapsize(level - 1));
>> +    return rmi_rtt_create(virt_to_phys(realm->rd), phys, addr, level);
>> +}
>> +
>> +static int realm_rtt_fold(struct realm *realm,
>> +              unsigned long addr,
>> +              int level,
>> +              phys_addr_t *rtt_granule)
>> +{
>> +    unsigned long out_rtt;
>> +    int ret;
>> +
>> +    ret = rmi_rtt_fold(virt_to_phys(realm->rd), addr, level, &out_rtt);
>> +
>> +    if (RMI_RETURN_STATUS(ret) == RMI_SUCCESS && rtt_granule)
>> +        *rtt_granule = out_rtt;
>> +
>> +    return ret;
>> +}
>> +
>> +static int realm_destroy_protected(struct realm *realm,
>> +                   unsigned long ipa,
>> +                   unsigned long *next_addr)
>> +{
>> +    unsigned long rd = virt_to_phys(realm->rd);
>> +    unsigned long addr;
>> +    phys_addr_t rtt;
>> +    int ret;
>> +
>> +loop:
>> +    ret = rmi_data_destroy(rd, ipa, &addr, next_addr);
>> +    if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
>> +        if (*next_addr > ipa)
>> +            return 0; /* UNASSIGNED */
>> +        rtt = alloc_delegated_page(realm, NULL);
>> +        if (WARN_ON(rtt == PHYS_ADDR_MAX))
>> +            return -1;
>> +        /* ASSIGNED - ipa is mapped as a block, so split */
>> +        ret = realm_rtt_create(realm, ipa,
>> +                       RMI_RETURN_INDEX(ret) + 1, rtt);
> 
> Could we not go all the way to L3 (rather than 1 level deeper) and try
> again ? That way, we are covered for block mappings at L1 (1G).
> 
>> +        if (WARN_ON(ret)) {
>> +            free_delegated_page(realm, rtt);
>> +            return -1;
>> +        }
>> +        /* retry */
>> +        goto loop;
>> +    } else if (WARN_ON(ret)) {
>> +        return -1;
>> +    }
>> +    ret = rmi_granule_undelegate(addr);
>> +
>> +    /*
>> +     * If the undelegate fails then something has gone seriously
>> +     * wrong: take an extra reference to just leak the page
>> +     */
>> +    if (WARN_ON(ret))
>> +        get_page(phys_to_page(addr));
>> +
>> +    return 0;
>> +}
>> +
>> +static void realm_unmap_range_shared(struct kvm *kvm,
>> +                     int level,
>> +                     unsigned long start,
>> +                     unsigned long end)
>> +{
>> +    struct realm *realm = &kvm->arch.realm;
>> +    unsigned long rd = virt_to_phys(realm->rd);
>> +    ssize_t map_size = rme_rtt_level_mapsize(level);
>> +    unsigned long next_addr, addr;
>> +    unsigned long shared_bit = BIT(realm->ia_bits - 1);
>> +
>> +    if (WARN_ON(level > RME_RTT_MAX_LEVEL))
>> +        return;
>> +
>> +    start |= shared_bit;
>> +    end |= shared_bit;
>> +
>> +    for (addr = start; addr < end; addr = next_addr) {
>> +        unsigned long align_addr = ALIGN(addr, map_size);
>> +        int ret;
>> +
>> +        next_addr = ALIGN(addr + 1, map_size);
>> +
>> +        if (align_addr != addr || next_addr > end) {
>> +            /* Need to recurse deeper */
>> +            if (addr < align_addr)
>> +                next_addr = align_addr;
>> +            realm_unmap_range_shared(kvm, level + 1, addr,
>> +                         min(next_addr, end));
>> +            continue;
>> +        }
>> +
>> +        ret = rmi_rtt_unmap_unprotected(rd, addr, level, &next_addr);
> 
> minor nit: We could potentially use rmi_rtt_destroy() to tear down
> shared mappings without unmapping them individually, if the range
> is big enough. All such optimisations could come later though.
> 
>> +        switch (RMI_RETURN_STATUS(ret)) {
>> +        case RMI_SUCCESS:
>> +            break;
>> +        case RMI_ERROR_RTT:
>> +            if (next_addr == addr) {
> 
> At this point we have a block-aligned address, but the mapping is at a
> deeper level. Given that we walk from the top down, we implicitly handle
> the case of block mappings. Not sure if that needs to be in a comment
> here.
> 
>> +                next_addr = ALIGN(addr + 1, map_size);
> 
> Reset to the "actual next" as it was overwritten by the RMI call.
> 
>> +                realm_unmap_range_shared(kvm, level + 1, addr,
>> +                             next_addr);
>> +            }
>> +            break;
>> +        default:
>> +            WARN_ON(1);
>> +        }
>> +    }
>> +}
>> +
>> +static void realm_unmap_range_private(struct kvm *kvm,
>> +                      unsigned long start,
>> +                      unsigned long end)
>> +{
>> +    struct realm *realm = &kvm->arch.realm;
>> +    ssize_t map_size = RME_PAGE_SIZE;
>> +    unsigned long next_addr, addr;
>> +
>> +    for (addr = start; addr < end; addr = next_addr) {
>> +        int ret;
>> +
>> +        next_addr = ALIGN(addr + 1, map_size);
>> +
>> +        ret = realm_destroy_protected(realm, addr, &next_addr);
>> +
>> +        if (WARN_ON(ret))
>> +            break;
>> +    }
>> +}
>> +
>> +static void realm_unmap_range(struct kvm *kvm,
>> +                  unsigned long start,
>> +                  unsigned long end,
>> +                  bool unmap_private)
>> +{
>> +    realm_unmap_range_shared(kvm, RME_RTT_MAX_LEVEL - 1, start, end);
> 
> minor nit: We already have a helper to find a suitable start level
> (defined below), maybe we could use that ? And even do the rtt_destroy
> optimisation for the unprotected range.
> 
>> +    if (unmap_private)
>> +        realm_unmap_range_private(kvm, start, end);
>> +}
>> +
>>   u32 kvm_realm_ipa_limit(void)
>>   {
>>       return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
>> @@ -190,6 +341,30 @@ static int realm_rtt_destroy(struct realm *realm, 
>> unsigned long addr,
>>       return ret;
>>   }
>> +static int realm_create_rtt_levels(struct realm *realm,
>> +                   unsigned long ipa,
>> +                   int level,
>> +                   int max_level,
>> +                   struct kvm_mmu_memory_cache *mc)
>> +{
>> +    if (WARN_ON(level == max_level))
>> +        return 0;
>> +
>> +    while (level++ < max_level) {
>> +        phys_addr_t rtt = alloc_delegated_page(realm, mc);
>> +
>> +        if (rtt == PHYS_ADDR_MAX)
>> +            return -ENOMEM;
>> +
>> +        if (realm_rtt_create(realm, ipa, level, rtt)) {
>> +            free_delegated_page(realm, rtt);
>> +            return -ENXIO;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   static int realm_tear_down_rtt_level(struct realm *realm, int level,
>>                        unsigned long start, unsigned long end)
>>   {
>> @@ -265,6 +440,68 @@ static int realm_tear_down_rtt_range(struct realm 
>> *realm,
>>                        start, end);
>>   }
>> +/*
>> + * Returns 0 on successful fold, a negative value on error, a 
>> positive value if
>> + * we were not able to fold all tables at this level.
>> + */
>> +static int realm_fold_rtt_level(struct realm *realm, int level,
>> +                unsigned long start, unsigned long end)
>> +{
>> +    int not_folded = 0;
>> +    ssize_t map_size;
>> +    unsigned long addr, next_addr;
>> +
>> +    if (WARN_ON(level > RME_RTT_MAX_LEVEL))
>> +        return -EINVAL;
>> +
>> +    map_size = rme_rtt_level_mapsize(level - 1);
>> +
>> +    for (addr = start; addr < end; addr = next_addr) {
>> +        phys_addr_t rtt_granule;
>> +        int ret;
>> +        unsigned long align_addr = ALIGN(addr, map_size);
>> +
>> +        next_addr = ALIGN(addr + 1, map_size);
>> +
>> +        ret = realm_rtt_fold(realm, align_addr, level, &rtt_granule);
>> +
>> +        switch (RMI_RETURN_STATUS(ret)) {
>> +        case RMI_SUCCESS:
>> +            if (!WARN_ON(rmi_granule_undelegate(rtt_granule)))
>> +                free_page((unsigned long)phys_to_virt(rtt_granule));
> 
> minor nit: Do we need a wrapper function for things like this, and
> leaking the page if undelegate fails, something like
> rme_reclaim_delegated_page()  ?
> 
> 
>> +            break;
>> +        case RMI_ERROR_RTT:
>> +            if (level == RME_RTT_MAX_LEVEL ||
>> +                RMI_RETURN_INDEX(ret) < level) {
>> +                not_folded++;
>> +                break;
>> +            }
>> +            /* Recurse a level deeper */
>> +            ret = realm_fold_rtt_level(realm,
>> +                           level + 1,
>> +                           addr,
>> +                           next_addr);
>> +            if (ret < 0)
>> +                return ret;
>> +            else if (ret == 0)
>> +                /* Try again at this level */
>> +                next_addr = addr;
>> +            break;
>> +        default:
>> +            return -ENXIO;
>> +        }
>> +    }
>> +
>> +    return not_folded;
>> +}
>> +
>> +static int realm_fold_rtt_range(struct realm *realm,
>> +                unsigned long start, unsigned long end)
>> +{
>> +    return realm_fold_rtt_level(realm, get_start_level(realm) + 1,
>> +                    start, end);
>> +}
>> +
>>   static void ensure_spare_page(struct realm *realm)
>>   {
>>       phys_addr_t tmp_rtt;
>> @@ -295,6 +532,147 @@ void kvm_realm_destroy_rtts(struct kvm *kvm, u32 
>> ia_bits)
>>       WARN_ON(realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits)));
>>   }
>> +void kvm_realm_unmap_range(struct kvm *kvm, unsigned long ipa, u64 size,
>> +               bool unmap_private)
>> +{
>> +    unsigned long end = ipa + size;
>> +    struct realm *realm = &kvm->arch.realm;
>> +
>> +    end = min(BIT(realm->ia_bits - 1), end);
>> +
>> +    ensure_spare_page(realm);
>> +
>> +    realm_unmap_range(kvm, ipa, end, unmap_private);
>> +
>> +    realm_fold_rtt_range(realm, ipa, end);
> 
> Shouldn't this be :
> 
>      if (unmap_private)
>          realm_fold_rtt_range(realm, ipa, end);
> 
> Also it is fine to reclaim RTTs from the protected space, not the
> unprotected half, as long as we use RTT_DESTROY in unmap_shared routine.

Thinking about this a bit more, we could :
1. Rename this to realm_reclaim_rtts_range()
2. Use "FOLD" vs "DESTROY" depending on the state of the Realm. If the
    realm is DYING (or add a state in the kvm_pgtable_stage2_destroy() to
    indicate that stage2 can now be "destroyed")  and use DESTROY
    wherever it is safe to do so.
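
A sketch of that idea, assuming a REALM_STATE_DYING state is added as
described and reusing realm_tear_down_rtt_range() for the DESTROY-based
walk:

	static void realm_reclaim_rtts_range(struct realm *realm,
					     unsigned long start,
					     unsigned long end)
	{
		if (READ_ONCE(realm->state) == REALM_STATE_DYING)
			/* Safe to tear the tables down completely */
			WARN_ON(realm_tear_down_rtt_range(realm, start, end));
		else
			realm_fold_rtt_range(realm, start, end);
	}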

Suzuki


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 06/43] arm64: RME: Add wrappers for RMI calls
  2024-04-16 13:14     ` Suzuki K Poulose
@ 2024-04-19 11:18       ` Steven Price
  0 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-19 11:18 UTC (permalink / raw)
  To: Suzuki K Poulose, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 16/04/2024 14:14, Suzuki K Poulose wrote:
> Hi Steven
> 
> On 12/04/2024 09:42, Steven Price wrote:
>> The wrappers make the call sites easier to read and deal with the
>> boiler plate of handling the error codes from the RMM.
>>
> 
> I have compared the parameters and output values to that of the RMM spec
> and they match. There are some minor nits below.
> 
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>   arch/arm64/include/asm/rmi_cmds.h | 509 ++++++++++++++++++++++++++++++
>>   1 file changed, 509 insertions(+)
>>   create mode 100644 arch/arm64/include/asm/rmi_cmds.h
>>
>> diff --git a/arch/arm64/include/asm/rmi_cmds.h
>> b/arch/arm64/include/asm/rmi_cmds.h
>> new file mode 100644
>> index 000000000000..c21414127e8e
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/rmi_cmds.h
>> @@ -0,0 +1,509 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Copyright (C) 2023 ARM Ltd.
>> + */
>> +
>> +#ifndef __ASM_RMI_CMDS_H
>> +#define __ASM_RMI_CMDS_H
>> +
>> +#include <linux/arm-smccc.h>
>> +
>> +#include <asm/rmi_smc.h>
>> +
>> +struct rtt_entry {
>> +    unsigned long walk_level;
>> +    unsigned long desc;
>> +    int state;
>> +    int ripas;
>> +};
>> +
> 
> ...
> 
>> +/**
>> + * rmi_data_destroy() - Destroy a Data Granule
>> + * @rd: PA of the RD
>> + * @ipa: IPA at which the granule is mapped in the guest
>> + * @data_out: PA of the granule which was destroyed
>> + * @top_out: Top IPA of non-live RTT entries
>> + *
>> + * Transitions the granule to DESTROYED state, the address cannot be
>> used by
>> + * the guest for the lifetime of the Realm.
>> + *
>> + * Return: RMI return code
>> + */
>> +static inline int rmi_data_destroy(unsigned long rd, unsigned long ipa,
>> +                   unsigned long *data_out,
>> +                   unsigned long *top_out)
>> +{
>> +    struct arm_smccc_res res;
>> +
>> +    arm_smccc_1_1_invoke(SMC_RMI_DATA_DESTROY, rd, ipa, &res);
>> +
>> +    *data_out = res.a1;
>> +    *top_out = res.a2;
> 
> minor nit: Do we need to be safer by checking the parameters before
> filling them in ? i.e.,
> 
>     if (ptr)
>         *ptr = result_out;
> 
> This applies to other calls below.

I had taken the approach of making all the out-parameters required (i.e.
non-NULL). But I guess I can switch over to allowing NULL - hopefully
the compiler will optimise these checks away, but there are some
situations where we are currently ignoring the extra out-parameters that
could be tidied up.
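
For example, the destroy wrapper quoted above would then read (a sketch
of the NULL-tolerant variant):

	static inline int rmi_data_destroy(unsigned long rd, unsigned long ipa,
					   unsigned long *data_out,
					   unsigned long *top_out)
	{
		struct arm_smccc_res res;

		arm_smccc_1_1_invoke(SMC_RMI_DATA_DESTROY, rd, ipa, &res);

		if (data_out)
			*data_out = res.a1;
		if (top_out)
			*top_out = res.a2;

		return res.a0;
	}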

> 
>> +
>> +    return res.a0;
>> +}
> 
>> +
>> +/**
>> + * rmi_realm_destroy() - Destroy a Realm
>> + * @rd: PA of the RD
>> + *
>> + * Destroys a Realm, all objects belonging to the Realm must be
>> destroyed first.
>> + *
>> + * Return: RMI return code
>> + */
>> +static inline int rmi_realm_destroy(unsigned long rd)
>> +{
>> +    struct arm_smccc_res res;
>> +
>> +    arm_smccc_1_1_invoke(SMC_RMI_REALM_DESTROY, rd, &res);
>> +
>> +    return res.a0;
>> +}
>> +
>> +/**
>> + * rmi_rec_aux_count() - Get number of auxiliary Granules required
>> + * @rd: PA of the RD
>> + * @aux_count: Number of pages written to this pointer
>> + *
>> + * A REC may require extra auxiliary pages to be delegateed for the
>> RMM to
> 
> minor nit: "s/delegateed/delegated/"
> 
> ...
> 
>> +/**
>> + * rmi_rtt_read_entry() - Read an RTTE
>> + * @rd: PA of the RD
>> + * @ipa: IPA for which to read the RTTE
>> + * @level: RTT level at which to read the RTTE
>> + * @rtt: Output structure describing the RTTE
>> + *
>> + * Reads a RTTE (Realm Translation Table Entry).
>> + *
>> + * Return: RMI return code
>> + */
>> +static inline int rmi_rtt_read_entry(unsigned long rd, unsigned long
>> ipa,
>> +                     long level, struct rtt_entry *rtt)
>> +{
>> +    struct arm_smccc_1_2_regs regs = {
>> +        SMC_RMI_RTT_READ_ENTRY,
>> +        rd, ipa, level
>> +    };
>> +
>> +    arm_smccc_1_2_smc(&regs, &regs);
>> +
>> +    rtt->walk_level = regs.a1;
>> +    rtt->state = regs.a2 & 0xFF;
> 
> minor nit: We mask the state, but not the "ripas". Both of them are u8.
> For consistency, we should mask both or neither.

Good point - I'll mask ripas as well. I suspect this is a bug that crept
in when I was updating for the new RIPAS state.

>> +    rtt->desc = regs.a3;
>> +    rtt->ripas = regs.a4;
>> +
>> +    return regs.a0;
>> +}
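
For reference, the fixed tail of the function would presumably read:

	rtt->walk_level = regs.a1;
	rtt->state = regs.a2 & 0xFF;
	rtt->desc = regs.a3;
	rtt->ripas = regs.a4 & 0xFF;	/* mask, as with state */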
>> +
> 
> ...
> 
>> +/**
>> + * rmi_rtt_get_phys() - Get the PA from a RTTE
>> + * @rtt: The RTTE
>> + *
>> + * Return: the physical address from a RTT entry.
>> + */
>> +static inline phys_addr_t rmi_rtt_get_phys(struct rtt_entry *rtt)
>> +{
>> +    return rtt->desc & GENMASK(47, 12);
>> +}
> 
> I guess this may need to change with the LPA2 support in RMM and must be
> used in conjunction with the "realm" object to make the correct
> conversion.

Actually this is currently unused, and there's a potential bug lurking
in realm_map_protected() where rtt->desc is assumed to be a valid
physical address. I'll move the function there and fix it up by also
taking a realm argument. I've tried to keep the realm structure out of
this file.
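
Something like the following is what I have in mind - a sketch only;
realm_lpa2_enabled() is a placeholder for however the realm ends up
recording its LPA2 configuration, and the bit positions are my reading
of the LPA2 stage 2 descriptor format:

	static phys_addr_t rmi_rtt_get_phys(struct realm *realm,
					    struct rtt_entry *rtt)
	{
		phys_addr_t phys = rtt->desc & GENMASK(47, 12);

		if (realm_lpa2_enabled(realm))
			/* PA[49:48] from desc[49:48], PA[51:50] from desc[9:8] */
			phys |= (rtt->desc & GENMASK(49, 48)) |
				((rtt->desc & GENMASK(9, 8)) << 42);

		return phys;
	}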

Thanks,

Steve


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 18/43] arm64: RME: Handle realm enter/exit
  2024-04-12  8:42   ` [PATCH v2 18/43] arm64: RME: Handle realm enter/exit Steven Price
@ 2024-04-19 13:00     ` Suzuki K Poulose
  0 siblings, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-19 13:00 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

Hi Steven

On 12/04/2024 09:42, Steven Price wrote:
> Entering a realm is done using a SMC call to the RMM. On exit the
> exit-codes need to be handled slightly differently to the normal KVM
> path so define our own functions for realm enter/exit and hook them
> in if the guest is a realm guest.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>

Please find a comment about RIPAS change exit handling below. Rest
looks good to me.


> ---
>   arch/arm64/include/asm/kvm_rme.h |   3 +
>   arch/arm64/kvm/Makefile          |   2 +-
>   arch/arm64/kvm/arm.c             |  19 +++-
>   arch/arm64/kvm/rme-exit.c        | 180 +++++++++++++++++++++++++++++++
>   arch/arm64/kvm/rme.c             |  11 ++
>   5 files changed, 209 insertions(+), 6 deletions(-)
>   create mode 100644 arch/arm64/kvm/rme-exit.c
> 
...

>   
>   	/* Tell userspace about in-kernel device output levels */
> diff --git a/arch/arm64/kvm/rme-exit.c b/arch/arm64/kvm/rme-exit.c
> new file mode 100644
> index 000000000000..5bf58c9b42b7
> --- /dev/null
> +++ b/arch/arm64/kvm/rme-exit.c
> @@ -0,0 +1,180 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2023 ARM Ltd.
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <kvm/arm_hypercalls.h>
> +#include <kvm/arm_psci.h>
> +
> +#include <asm/rmi_smc.h>
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_rme.h>
> +#include <asm/kvm_mmu.h>
> +
> +typedef int (*exit_handler_fn)(struct kvm_vcpu *vcpu);
> +
> +static int rec_exit_reason_notimpl(struct kvm_vcpu *vcpu)
> +{
> +	struct realm_rec *rec = &vcpu->arch.rec;
> +
> +	pr_err("[vcpu %d] Unhandled exit reason from realm (ESR: %#llx)\n",
> +	       vcpu->vcpu_id, rec->run->exit.esr);
> +	return -ENXIO;
> +}
> +
> +static int rec_exit_sync_dabt(struct kvm_vcpu *vcpu)
> +{
> +	return kvm_handle_guest_abort(vcpu);
> +}
> +
> +static int rec_exit_sync_iabt(struct kvm_vcpu *vcpu)
> +{
> +	struct realm_rec *rec = &vcpu->arch.rec;
> +
> +	pr_err("[vcpu %d] Unhandled instruction abort (ESR: %#llx).\n",
> +	       vcpu->vcpu_id, rec->run->exit.esr);
> +	return -ENXIO;
> +}
> +
> +static int rec_exit_sys_reg(struct kvm_vcpu *vcpu)
> +{
> +	struct realm_rec *rec = &vcpu->arch.rec;
> +	unsigned long esr = kvm_vcpu_get_esr(vcpu);
> +	int rt = kvm_vcpu_sys_get_rt(vcpu);
> +	bool is_write = !(esr & 1);
> +	int ret;
> +
> +	if (is_write)
> +		vcpu_set_reg(vcpu, rt, rec->run->exit.gprs[0]);
> +
> +	ret = kvm_handle_sys_reg(vcpu);
> +
> +	if (ret >= 0 && !is_write)
> +		rec->run->entry.gprs[0] = vcpu_get_reg(vcpu, rt);
> +
> +	return ret;
> +}
> +
> +static exit_handler_fn rec_exit_handlers[] = {
> +	[0 ... ESR_ELx_EC_MAX]	= rec_exit_reason_notimpl,
> +	[ESR_ELx_EC_SYS64]	= rec_exit_sys_reg,
> +	[ESR_ELx_EC_DABT_LOW]	= rec_exit_sync_dabt,
> +	[ESR_ELx_EC_IABT_LOW]	= rec_exit_sync_iabt
> +};
> +
> +static int rec_exit_psci(struct kvm_vcpu *vcpu)
> +{
> +	struct realm_rec *rec = &vcpu->arch.rec;
> +	int i;
> +	int ret;
> +
> +	for (i = 0; i < REC_RUN_GPRS; i++)
> +		vcpu_set_reg(vcpu, i, rec->run->exit.gprs[i]);
> +
> +	ret = kvm_smccc_call_handler(vcpu);
> +
> +	for (i = 0; i < REC_RUN_GPRS; i++)
> +		rec->run->entry.gprs[i] = vcpu_get_reg(vcpu, i);
> +
> +	return ret;
> +}
> +
> +static int rec_exit_ripas_change(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct realm *realm = &kvm->arch.realm;
> +	struct realm_rec *rec = &vcpu->arch.rec;
> +	unsigned long base = rec->run->exit.ripas_base;
> +	unsigned long top = rec->run->exit.ripas_top;
> +	unsigned long ripas = rec->run->exit.ripas_value & 1;
> +	int ret = -EINVAL;
> +
> +	if (realm_is_addr_protected(realm, base) &&
> +	    realm_is_addr_protected(realm, top - 1)) {
> +		kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
> +					   kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
> +		write_lock(&kvm->mmu_lock);
> +		ret = realm_set_ipa_state(vcpu, base, top, ripas);
> +		write_unlock(&kvm->mmu_lock);
> +	}

There are a couple of issues here :

1. For ripas == RIPAS_RAM conversions we don't verify that the range is
covered by a memslot, and so don't reject requests that are not backed
by memory.

2. realm_set_ipa_state() might have partially completed the request
before it ran out of pages in the cache. As such, we should only let
the VMM know (in the exit below) about what was completed, so we need
some feedback from realm_set_ipa_state() to indicate the completed range.
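
For issue 2, the shape of the fix might be something like this, with
top_ipa a hypothetical out-parameter reporting how far the call got:

	unsigned long top_ipa;

	write_lock(&kvm->mmu_lock);
	ret = realm_set_ipa_state(vcpu, base, top, ripas, &top_ipa);
	write_unlock(&kvm->mmu_lock);

	/* Only tell the VMM about the completed part of the range */
	kvm_prepare_memory_fault_exit(vcpu, base, top_ipa - base,
				      false, false, ripas == 1);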

Rest looks fine to me.

Suzuki


> +
> +	WARN(ret && ret != -ENOMEM,
> +	     "Unable to satisfy SET_IPAS for %#lx - %#lx, ripas: %#lx\n",
> +	     base, top, ripas);
> +
> +	/* Exit to VMM to complete the change */
> +	kvm_prepare_memory_fault_exit(vcpu, base, top - base, false, false,
> +				      ripas == 1);
> +
> +	return 0;
> +}
> +
> +static void update_arch_timer_irq_lines(struct kvm_vcpu *vcpu)
> +{
> +	struct realm_rec *rec = &vcpu->arch.rec;
> +
> +	__vcpu_sys_reg(vcpu, CNTV_CTL_EL0) = rec->run->exit.cntv_ctl;
> +	__vcpu_sys_reg(vcpu, CNTV_CVAL_EL0) = rec->run->exit.cntv_cval;
> +	__vcpu_sys_reg(vcpu, CNTP_CTL_EL0) = rec->run->exit.cntp_ctl;
> +	__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = rec->run->exit.cntp_cval;
> +
> +	kvm_realm_timers_update(vcpu);
> +}
> +
> +/*
> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
> + * proper exit to userspace.
> + */
> +int handle_rme_exit(struct kvm_vcpu *vcpu, int rec_run_ret)
> +{
> +	struct realm_rec *rec = &vcpu->arch.rec;
> +	u8 esr_ec = ESR_ELx_EC(rec->run->exit.esr);
> +	unsigned long status, index;
> +
> +	status = RMI_RETURN_STATUS(rec_run_ret);
> +	index = RMI_RETURN_INDEX(rec_run_ret);
> +
> +	/*
> +	 * If a PSCI_SYSTEM_OFF request raced with a vcpu executing, we might
> +	 * see the following status code and index indicating an attempt to run
> +	 * a REC when the RD state is SYSTEM_OFF.  In this case, we just need to
> +	 * return to user space which can deal with the system event or will try
> +	 * to run the KVM VCPU again, at which point we will no longer attempt
> +	 * to enter the Realm because we will have a sleep request pending on
> +	 * the VCPU as a result of KVM's PSCI handling.
> +	 */
> +	if (status == RMI_ERROR_REALM && index == 1) {
> +		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
> +		return 0;
> +	}
> +
> +	if (rec_run_ret)
> +		return -ENXIO;
> +
> +	vcpu->arch.fault.esr_el2 = rec->run->exit.esr;
> +	vcpu->arch.fault.far_el2 = rec->run->exit.far;
> +	vcpu->arch.fault.hpfar_el2 = rec->run->exit.hpfar;
> +
> +	update_arch_timer_irq_lines(vcpu);
> +
> +	/* Reset the emulation flags for the next run of the REC */
> +	rec->run->entry.flags = 0;
> +
> +	switch (rec->run->exit.exit_reason) {
> +	case RMI_EXIT_SYNC:
> +		return rec_exit_handlers[esr_ec](vcpu);
> +	case RMI_EXIT_IRQ:
> +	case RMI_EXIT_FIQ:
> +		return 1;
> +	case RMI_EXIT_PSCI:
> +		return rec_exit_psci(vcpu);
> +	case RMI_EXIT_RIPAS_CHANGE:
> +		return rec_exit_ripas_change(vcpu);
> +	}

I guess the RMI_EXIT_HOST_CALL could be handled here, similar to
rec_exit_psci() ?
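
i.e. an extra case along the lines of the following, with
rec_exit_host_call() a hypothetical handler mirroring rec_exit_psci():

	case RMI_EXIT_HOST_CALL:
		return rec_exit_host_call(vcpu);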

Suzuki

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 20/43] arm64: RME: Allow populating initial contents
  2024-04-12  8:42   ` [PATCH v2 20/43] arm64: RME: Allow populating initial contents Steven Price
@ 2024-04-19 13:17     ` Suzuki K Poulose
  0 siblings, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-19 13:17 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 12/04/2024 09:42, Steven Price wrote:
> The VMM needs to populate the realm with some data before starting (e.g.
> a kernel and initrd). This is measured by the RMM and used as part of
> the attestation later on.
> 
> Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/kvm/rme.c | 234 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 234 insertions(+)
> 
> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
> index 0a3f823b2446..4aab507f896e 100644
> --- a/arch/arm64/kvm/rme.c
> +++ b/arch/arm64/kvm/rme.c
> @@ -4,6 +4,7 @@
>    */
>   
>   #include <linux/kvm_host.h>
> +#include <linux/hugetlb.h>
>   
>   #include <asm/kvm_emulate.h>
>   #include <asm/kvm_mmu.h>
> @@ -547,6 +548,227 @@ void kvm_realm_unmap_range(struct kvm *kvm, unsigned long ipa, u64 size,
>   	realm_fold_rtt_range(realm, ipa, end);
>   }
>   
> +static int realm_create_protected_data_page(struct realm *realm,
> +					    unsigned long ipa,
> +					    struct page *dst_page,
> +					    struct page *src_page,
> +					    unsigned long flags)
> +{
> +	phys_addr_t dst_phys, src_phys;
> +	int ret;
> +
> +	dst_phys = page_to_phys(dst_page);
> +	src_phys = page_to_phys(src_page);
> +
> +	if (rmi_granule_delegate(dst_phys))
> +		return -ENXIO;
> +
> +	ret = rmi_data_create(virt_to_phys(realm->rd), dst_phys, ipa, src_phys,
> +			      flags);
> +
> +	if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
> +		/* Create missing RTTs and retry */
> +		int level = RMI_RETURN_INDEX(ret);
> +
> +		ret = realm_create_rtt_levels(realm, ipa, level,
> +					      RME_RTT_MAX_LEVEL, NULL);
> +		if (ret)
> +			goto err;
> +
> +		ret = rmi_data_create(virt_to_phys(realm->rd), dst_phys, ipa,
> +				      src_phys, flags);
> +	}
> +
> +	if (ret)
> +		goto err;

ultra minor nit:

	if (!ret)
		return 0;

> +
> +	return 0;
> +
> +err:
> +	if (WARN_ON(rmi_granule_undelegate(dst_phys))) {
> +		/* Page can't be returned to NS world so is lost */
> +		get_page(dst_page);
> +	}
> +	return -ENXIO;
> +}
> +
> +static int fold_rtt(struct realm *realm, unsigned long addr, int level)
> +{
> +	phys_addr_t rtt_addr;
> +	int ret;
> +
> +	ret = realm_rtt_fold(realm, addr, level + 1, &rtt_addr);
> +	if (ret)
> +		return ret;
> +
> +	free_delegated_page(realm, rtt_addr);
> +
> +	return 0;
> +}
> +
> +static int populate_par_region(struct kvm *kvm,
> +			       phys_addr_t ipa_base,
> +			       phys_addr_t ipa_end,
> +			       u32 flags)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +	struct kvm_memory_slot *memslot;
> +	gfn_t base_gfn, end_gfn;
> +	int idx;
> +	phys_addr_t ipa;
> +	int ret = 0;
> +	struct page *tmp_page;
> +	unsigned long data_flags = 0;
> +
> +	base_gfn = gpa_to_gfn(ipa_base);
> +	end_gfn = gpa_to_gfn(ipa_end);
> +
> +	if (flags & KVM_ARM_RME_POPULATE_FLAGS_MEASURE)
> +		data_flags = RMI_MEASURE_CONTENT;
> +
> +	idx = srcu_read_lock(&kvm->srcu);
> +	memslot = gfn_to_memslot(kvm, base_gfn);
> +	if (!memslot) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	/* We require the region to be contained within a single memslot */
> +	if (memslot->base_gfn + memslot->npages < end_gfn) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	tmp_page = alloc_page(GFP_KERNEL);
> +	if (!tmp_page) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	mmap_read_lock(current->mm);
> +
> +	ipa = ipa_base;
> +	while (ipa < ipa_end) {
> +		struct vm_area_struct *vma;
> +		unsigned long map_size;
> +		unsigned int vma_shift;
> +		unsigned long offset;
> +		unsigned long hva;
> +		struct page *page;
> +		kvm_pfn_t pfn;
> +		int level;
> +
> +		hva = gfn_to_hva_memslot(memslot, gpa_to_gfn(ipa));
> +		vma = vma_lookup(current->mm, hva);
> +		if (!vma) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		if (is_vm_hugetlb_page(vma))
> +			vma_shift = huge_page_shift(hstate_vma(vma));
> +		else
> +			vma_shift = PAGE_SHIFT;
> +
> +		map_size = 1 << vma_shift;
> +
> +		/*
> +		 * FIXME: This causes over mapping, but there's no good
> +		 * solution here with the ABI as it stands
> +		 */
> +		ipa = ALIGN_DOWN(ipa, map_size);
> +
> +		switch (map_size) {
> +		case RME_L2_BLOCK_SIZE:
> +			level = 2;
> +			break;
> +		case PAGE_SIZE:
> +			level = 3;
> +			break;
> +		default:
> +			WARN_ONCE(1, "Unsupported vma_shift %d", vma_shift);

Do we really need this WARNing? Could we not fall back to the next
possible mapping size? e.g. if the VMA is 1G, we could at least try 2M.
I guess this is more or less similar to what we would do on a fault, and
it may be a good idea to see if we could reuse the core bits from
user_mem_abort()?
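
A sketch of such a fallback, replacing the switch above (it ignores that
the earlier ALIGN_DOWN of 'ipa' would also need recomputing, so this is
illustrative only):

	/*
	 * Degrade to the largest mapping size the RTT supports
	 * rather than failing on an unexpected VMA size.
	 */
	if (map_size >= RME_L2_BLOCK_SIZE) {
		map_size = RME_L2_BLOCK_SIZE;
		level = 2;
	} else {
		map_size = PAGE_SIZE;
		level = 3;
	}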

> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		pfn = gfn_to_pfn_memslot(memslot, gpa_to_gfn(ipa));
> +
> +		if (is_error_pfn(pfn)) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		if (level < RME_RTT_MAX_LEVEL) {
> +			/*
> +			 * A temporary RTT is needed during the map, precreate
> +			 * it, however if there is an error (e.g. missing
> +			 * parent tables) this will be handled in the
> +			 * realm_create_protected_data_page() call.
> +			 */
> +			realm_create_rtt_levels(realm, ipa, level,
> +						RME_RTT_MAX_LEVEL, NULL);
> +		}
> +
> +		page = pfn_to_page(pfn);
> +
> +		for (offset = 0; offset < map_size && !ret;
> +		     offset += PAGE_SIZE, page++) {
> +			phys_addr_t page_ipa = ipa + offset;
> +
> +			ret = realm_create_protected_data_page(realm, page_ipa,
> +							       page, tmp_page,
> +							       data_flags);
> +		}
> +		if (ret)
> +			goto err_release_pfn;
> +
> +		if (level == 2) {
> +			ret = fold_rtt(realm, ipa, level);
> +			if (ret)
> +				goto err_release_pfn;

Do we care about the FOLD error here? Ideally we shouldn't hit an
error, but being unable to fold is not really an error case, is it? We
could live with L3 mappings?


Suzuki
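
If fold failure is indeed treated as benign, the call site could shrink
to something like this (a sketch, not the posted code):

	if (level == 2)
		/*
		 * Folding to a block mapping is only an optimisation;
		 * if the RMM declines, the L3 mappings still work.
		 */
		fold_rtt(realm, ipa, level);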


> +		}
> +
> +		ipa += map_size;
> +		kvm_release_pfn_dirty(pfn);
> +err_release_pfn:
> +		if (ret) {
> +			kvm_release_pfn_clean(pfn);
> +			break;
> +		}
> +	}
> +
> +	mmap_read_unlock(current->mm);
> +	__free_page(tmp_page);
> +
> +out:
> +	srcu_read_unlock(&kvm->srcu, idx);
> +	return ret;
> +}
> +
> +static int kvm_populate_realm(struct kvm *kvm,
> +			      struct kvm_cap_arm_rme_populate_realm_args *args)
> +{
> +	phys_addr_t ipa_base, ipa_end;
> +
> +	if (kvm_realm_state(kvm) != REALM_STATE_NEW)
> +		return -EINVAL;
> +
> +	if (!IS_ALIGNED(args->populate_ipa_base, PAGE_SIZE) ||
> +	    !IS_ALIGNED(args->populate_ipa_size, PAGE_SIZE))
> +		return -EINVAL;
> +
> +	if (args->flags & ~RMI_MEASURE_CONTENT)
> +		return -EINVAL;
> +
> +	ipa_base = args->populate_ipa_base;
> +	ipa_end = ipa_base + args->populate_ipa_size;
> +
> +	if (ipa_end < ipa_base)
> +		return -EINVAL;
> +
> +	return populate_par_region(kvm, ipa_base, ipa_end, args->flags);
> +}
> +
>   static int find_map_level(struct realm *realm,
>   			  unsigned long start,
>   			  unsigned long end)
> @@ -808,6 +1030,18 @@ int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>   		r = kvm_init_ipa_range_realm(kvm, &args);
>   		break;
>   	}
> +	case KVM_CAP_ARM_RME_POPULATE_REALM: {
> +		struct kvm_cap_arm_rme_populate_realm_args args;
> +		void __user *argp = u64_to_user_ptr(cap->args[1]);
> +
> +		if (copy_from_user(&args, argp, sizeof(args))) {
> +			r = -EFAULT;
> +			break;
> +		}
> +
> +		r = kvm_populate_realm(kvm, &args);
> +		break;
> +	}
>   	default:
>   		r = -EINVAL;
>   		break;


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 07/43] arm64: RME: Check for RME support at KVM init
  2024-04-16 13:30     ` Suzuki K Poulose
@ 2024-04-22 15:39       ` Steven Price
  0 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-22 15:39 UTC (permalink / raw)
  To: Suzuki K Poulose, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 16/04/2024 14:30, Suzuki K Poulose wrote:
> Hi Steven
> 
> On 12/04/2024 09:42, Steven Price wrote:
>> Query the RMI version number and check if it is a compatible version. A
>> static key is also provided to signal that a supported RMM is available.
>>
>> Functions are provided to query if a VM or VCPU is a realm (or rec)
>> which currently will always return false.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>   arch/arm64/include/asm/kvm_emulate.h | 18 +++++++++
>>   arch/arm64/include/asm/kvm_host.h    |  4 ++
>>   arch/arm64/include/asm/kvm_rme.h     | 56 ++++++++++++++++++++++++++++
>>   arch/arm64/include/asm/virt.h        |  1 +
>>   arch/arm64/kvm/Makefile              |  3 +-
>>   arch/arm64/kvm/arm.c                 |  9 +++++
>>   arch/arm64/kvm/rme.c                 | 52 ++++++++++++++++++++++++++
>>   7 files changed, 142 insertions(+), 1 deletion(-)
>>   create mode 100644 arch/arm64/include/asm/kvm_rme.h
>>   create mode 100644 arch/arm64/kvm/rme.c
>>
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index 975af30af31f..6f08398537e2 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -611,4 +611,22 @@ static __always_inline void
>> kvm_reset_cptr_el2(struct kvm_vcpu *vcpu)
>>         kvm_write_cptr_el2(val);
>>   }
>> +
>> +static inline bool kvm_is_realm(struct kvm *kvm)
>> +{
>> +    if (static_branch_unlikely(&kvm_rme_is_available))
>> +        return kvm->arch.is_realm;
>> +    return false;
>> +}
>> +
>> +static inline enum realm_state kvm_realm_state(struct kvm *kvm)
>> +{
>> +    return READ_ONCE(kvm->arch.realm.state);
>> +}
>> +
>> +static inline bool vcpu_is_rec(struct kvm_vcpu *vcpu)
>> +{
>> +    return false;
>> +}
>> +
>>   #endif /* __ARM64_KVM_EMULATE_H__ */
>> diff --git a/arch/arm64/include/asm/kvm_host.h
>> b/arch/arm64/include/asm/kvm_host.h
>> index 9e8a496fb284..63b68b85db3f 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -27,6 +27,7 @@
>>   #include <asm/fpsimd.h>
>>   #include <asm/kvm.h>
>>   #include <asm/kvm_asm.h>
>> +#include <asm/kvm_rme.h>
>>   #include <asm/vncr_mapping.h>
>>     #define __KVM_HAVE_ARCH_INTC_INITIALIZED
>> @@ -348,6 +349,9 @@ struct kvm_arch {
>>        * the associated pKVM instance in the hypervisor.
>>        */
>>       struct kvm_protected_vm pkvm;
>> +
>> +    bool is_realm;
>> +    struct realm realm;
>>   };
>>     struct kvm_vcpu_fault_info {
>> diff --git a/arch/arm64/include/asm/kvm_rme.h
>> b/arch/arm64/include/asm/kvm_rme.h
>> new file mode 100644
>> index 000000000000..922da3f47227
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/kvm_rme.h
>> @@ -0,0 +1,56 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Copyright (C) 2023 ARM Ltd.
>> + */
>> +
>> +#ifndef __ASM_KVM_RME_H
>> +#define __ASM_KVM_RME_H
>> +
>> +/**
>> + * enum realm_state - State of a Realm
>> + */
>> +enum realm_state {
>> +    /**
>> +     * @REALM_STATE_NONE:
>> +     *      Realm has not yet been created. rmi_realm_create() may be
>> +     *      called to create the realm.
>> +     */
>> +    REALM_STATE_NONE,
>> +    /**
>> +     * @REALM_STATE_NEW:
>> +     *      Realm is under construction, not eligible for execution.
>> Pages
>> +     *      may be populated with rmi_data_create().
>> +     */
>> +    REALM_STATE_NEW,
>> +    /**
>> +     * @REALM_STATE_ACTIVE:
>> +     *      Realm has been created and is eligible for execution with
>> +     *      rmi_rec_enter(). Pages may no longer be populated with
>> +     *      rmi_data_create().
>> +     */
>> +    REALM_STATE_ACTIVE,
>> +    /**
>> +     * @REALM_STATE_DYING:
>> +     *      Realm is in the process of being destroyed or has already
>> been
>> +     *      destroyed.
>> +     */
>> +    REALM_STATE_DYING,
>> +    /**
>> +     * @REALM_STATE_DEAD:
>> +     *      Realm has been destroyed.
>> +     */
>> +    REALM_STATE_DEAD
>> +};
>> +
>> +/**
>> + * struct realm - Additional per VM data for a Realm
>> + *
>> + * @state: The lifetime state machine for the realm
>> + */
>> +struct realm {
>> +    enum realm_state state;
>> +};
>> +
>> +int kvm_init_rme(void);
>> +
>> +#endif
>> diff --git a/arch/arm64/include/asm/virt.h
>> b/arch/arm64/include/asm/virt.h
>> index 261d6e9df2e1..12cf36c38189 100644
>> --- a/arch/arm64/include/asm/virt.h
>> +++ b/arch/arm64/include/asm/virt.h
>> @@ -81,6 +81,7 @@ void __hyp_reset_vectors(void);
>>   bool is_kvm_arm_initialised(void);
>>     DECLARE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
>> +DECLARE_STATIC_KEY_FALSE(kvm_rme_is_available);
>>     /* Reports the availability of HYP mode */
>>   static inline bool is_hyp_mode_available(void)
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index c0c050e53157..1c1d8cdf381f 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -20,7 +20,8 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o
>> pvtime.o \
>>        vgic/vgic-v3.o vgic/vgic-v4.o \
>>        vgic/vgic-mmio.o vgic/vgic-mmio-v2.o \
>>        vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
>> -     vgic/vgic-its.o vgic/vgic-debug.o
>> +     vgic/vgic-its.o vgic/vgic-debug.o \
>> +     rme.o
>>     kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o pmu.o
>>   diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 3dee5490eea9..2056c660c5ee 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -38,6 +38,7 @@
>>   #include <asm/kvm_mmu.h>
>>   #include <asm/kvm_nested.h>
>>   #include <asm/kvm_pkvm.h>
>> +#include <asm/kvm_rme.h>
>>   #include <asm/kvm_emulate.h>
>>   #include <asm/sections.h>
>>   @@ -47,6 +48,8 @@
>>     static enum kvm_mode kvm_mode = KVM_MODE_DEFAULT;
>>   +DEFINE_STATIC_KEY_FALSE(kvm_rme_is_available);
>> +
>>   DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
>>     DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>> @@ -2562,6 +2565,12 @@ static __init int kvm_arm_init(void)
>>         in_hyp_mode = is_kernel_in_hyp_mode();
>>   +    if (in_hyp_mode) {
>> +        err = kvm_init_rme();
>> +        if (err)
>> +            return err;
>> +    }
>> +
>>       if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) ||
>>           cpus_have_final_cap(ARM64_WORKAROUND_1508412))
>>           kvm_info("Guests without required CPU erratum workarounds
>> can deadlock system!\n" \
>> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
>> new file mode 100644
>> index 000000000000..3dbbf9d046bf
>> --- /dev/null
>> +++ b/arch/arm64/kvm/rme.c
>> @@ -0,0 +1,52 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (C) 2023 ARM Ltd.
>> + */
>> +
>> +#include <linux/kvm_host.h>
>> +
>> +#include <asm/rmi_cmds.h>
>> +#include <asm/virt.h>
>> +
>> +static int rmi_check_version(void)
>> +{
>> +    struct arm_smccc_res res;
>> +    int version_major, version_minor;
>> +    unsigned long host_version = RMI_ABI_VERSION(RMI_ABI_MAJOR_VERSION,
>> +                             RMI_ABI_MINOR_VERSION);
>> +
>> +    arm_smccc_1_1_invoke(SMC_RMI_VERSION, host_version, &res);
>> +
>> +    if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
>> +        return -ENXIO;
>> +
>> +    version_major = RMI_ABI_VERSION_GET_MAJOR(res.a1);
>> +    version_minor = RMI_ABI_VERSION_GET_MINOR(res.a1);
>> +
> 
> We don't seem to be using the res.a0 to determin if the RMM supports our
> requested version. As per RMM spec, section B4.3.23 :
> 
> "
> The status code and lower revision output values indicate which of the
> following is true, in order of precedence:
>  a) The RMM supports an interface revision which is compatible with the
>     requested revision.
>      • The status code is RMI_SUCCESS.
>      • The lower revision is equal to the requested revision.
>  b) The RMM does not support an interface revision which is compatible
>     with the requested revision The RMM supports an interface revision
>     which is incompatible with and less than the requested revision.
>      • The status code is RMI_ERROR_INPUT.
>      • The lower revision is the highest interface revision which is
>        both less than the requested revision and supported by the RMM.
> 
>  c) The RMM does not support an interface revision which is compatible
>     with the requested revision The RMM supports an interface revision
>     which is incompatible with and greater than the requested revision.
>      • The status code is RMI_ERROR_INPUT.
>      • The lower revision is equal to the higher revision.
> 
> So, we could simply check the res.a0 for RMI_SUCCESS and proceed with
> marking RMM available.

Good point - this didn't work in a previous version of the spec, but we
should be able to rely on the return value now.
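
A sketch of the simplified check, relying on the status code as the spec
text quoted above describes (RMI_SUCCESS meaning a compatible revision
was negotiated):

	arm_smccc_1_1_invoke(SMC_RMI_VERSION, host_version, &res);

	if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
		return -ENXIO;

	version_major = RMI_ABI_VERSION_GET_MAJOR(res.a1);
	version_minor = RMI_ABI_VERSION_GET_MINOR(res.a1);

	if (res.a0 != RMI_SUCCESS) {
		kvm_err("Unsupported RMI ABI (v%d.%d) host supports v%d.%d\n",
			version_major, version_minor,
			RMI_ABI_MAJOR_VERSION, RMI_ABI_MINOR_VERSION);
		return -ENXIO;
	}

	kvm_info("RMI ABI version %d.%d\n", version_major, version_minor);
	return 0;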

>> +    if (version_major != RMI_ABI_MAJOR_VERSION) {
>> +        kvm_err("Unsupported RMI ABI (v%d.%d) host supports v%d.%d\n",
>> +            version_major, version_minor,
>> +            RMI_ABI_MAJOR_VERSION,
>> +            RMI_ABI_MINOR_VERSION);
>> +        return -ENXIO;
>> +    }
>> +
>> +    kvm_info("RMI ABI version %d.%d\n", version_major, version_minor);
>> +
>> +    return 0;
>> +}
>> +
>> +int kvm_init_rme(void)
>> +{
>> +    if (PAGE_SIZE != SZ_4K)
>> +        /* Only 4k page size on the host is supported */
>> +        return 0;
>> +
>> +    if (rmi_check_version())
>> +        /* Continue without realm support */
>> +        return 0;
>> +
>> +    /* Future patch will enable static branch kvm_rme_is_available */
>> +
>> +    return 0;
> 
> Do we ever expect this to fail the kvm initialisation? Otherwise, we
> could leave it as void?

Technically in a later patch the return from rme_vmid_init() can cause
such a failure. But it's not clear that it makes any sense to completely
kill KVM because of that. So I'll change this to a void return.

Thanks,

Steve


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 09/43] arm64: RME: ioctls to create and configure realms
  2024-04-17  9:51     ` Suzuki K Poulose
@ 2024-04-22 16:33       ` Steven Price
  0 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-22 16:33 UTC (permalink / raw)
  To: Suzuki K Poulose, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni, Jean-Philippe Brucker

On 17/04/2024 10:51, Suzuki K Poulose wrote:
> Hi Steven
> 
> On 12/04/2024 09:42, Steven Price wrote:
>> Add the KVM_CAP_ARM_RME_CREATE_FD ioctl to create a realm. This involves
> 
> minor nit: s/FD/RD
> 
>> delegating pages to the RMM to hold the Realm Descriptor (RD) and for
>> the base level of the Realm Translation Tables (RTT). A VMID also needs
>> to be picked; since the RMM has a separate VMID address space, a
>> dedicated allocator is added for this purpose.
>>
>> KVM_CAP_ARM_RME_CONFIG_REALM is provided to allow configuring the realm
>> before it is created.
> 
> It might be helpful to provide a bit more background on the Realm
> parameters. Something like:
> 
> Realm parameters for Realm Descriptor creation could be classified as:
> 
>  1. Parameters specific to the Realm stage2 (e.g. IPA Size, vmid, stage2
>     entry level, entry level RTTs, number of RTTs in start level, LPA2)
>     Most of these are not measured by RMM and comes from KVM book
>     keeping.
> 
>  2. Parameters controlling "Arm Architecture features for the VM". (e.g.
>     SVE VL, PMU counters, number of HW BRPs/WPs), configured by the VMM
>     using the "user ID register write" mechanism. These will be
>     supported in the later patches.
> 
>  3. Parameters that are not part of the core Arm architecture but are
>     defined by the RMM spec (e.g. Hash algorithm for measurement,
>     Personalisation value). These are programmed via
>     KVM_CAP_ARM_RME_CONFIG_REALM.
> 

Thanks, I'll steal that wording ;)

> 
> Also it may be a good idea to call out one of the issues that we have
> with the UABI w.r.t. the IPA Size. The IPA Size supported by the RMM
> *could* be different from the normal IPA Size supported by KVM. We do
> not expect this to be common, but it is not impossible.
> 
> If the RMM_IPA_Size < Normal_IPA_Size, we have a problem with
> advertising the "IPA Size" to the VMM. Right now we advertise
> the "normal limit" by KVM_CAP_ARM_VM_IPA_SIZE and the IPA Size
> is configured via vm_type[7:0] in KVM_CREATE_VM. Given we have
> to configure the IPA size for a "Realm VM" at CREATE_VM time too,
> the VMM is unable to choose a valid IPA Size for the Realm. We
> have the following options:
> 
> 1. Given IPA Size for a Realm is measured, the user must get
>    what they choose. i.e., if the platform cannot support the
>    requested size, don't run your Realm VM. In this case, we
>    don't need to do anything.
> 
> 2. Add KVM_CAP_ARM_VM_RMM_IPA_SIZE to expose the RMM limit
>    for the VMM to choose.
> 
> 3. VMM to create a Realm VM using the default IPA Size and then
>    check the KVM_CAP_ARM_VM_IPA_SIZE on the "kvm" instance (which
>    is Realm) and get the RMM IPA limit.
> 
> I prefer 2 or 1, in that order of preference. Happy to hear suggestions.

My gut feeling is that we should go for 1, and add 2 if we actually see
a need for it.

I'm struggling to see why the VMM would want to choose the IPA size
dynamically (surely the attestation server doesn't want to deal with
different IPA sizes?). So the IPA size is going to be picked by the
container owner based on actual need and the VMM can attempt to create a
VM and fail if the platform doesn't support that size. In that case,
because attestation would fail with a smaller IPA size, the correct
response is failure.

I'll stick something in the commit message to record that.

>>
>> Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
>> ---
>>   arch/arm64/include/asm/kvm_emulate.h |   5 +
>>   arch/arm64/include/asm/kvm_rme.h     |  19 ++
>>   arch/arm64/kvm/arm.c                 |  18 ++
>>   arch/arm64/kvm/mmu.c                 |  15 +-
>>   arch/arm64/kvm/rme.c                 | 282 +++++++++++++++++++++++++++
>>   5 files changed, 337 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index 6f08398537e2..c606316f4729 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -624,6 +624,11 @@ static inline enum realm_state
>> kvm_realm_state(struct kvm *kvm)
>>       return READ_ONCE(kvm->arch.realm.state);
>>   }
>>   +static inline bool kvm_realm_is_created(struct kvm *kvm)
>> +{
>> +    return kvm_is_realm(kvm) && kvm_realm_state(kvm) !=
>> REALM_STATE_NONE;
>> +}
>> +
>>   static inline bool vcpu_is_rec(struct kvm_vcpu *vcpu)
>>   {
>>       return false;
>> diff --git a/arch/arm64/include/asm/kvm_rme.h
>> b/arch/arm64/include/asm/kvm_rme.h
>> index 922da3f47227..cf8cc4d30364 100644
>> --- a/arch/arm64/include/asm/kvm_rme.h
>> +++ b/arch/arm64/include/asm/kvm_rme.h
>> @@ -6,6 +6,8 @@
>>   #ifndef __ASM_KVM_RME_H
>>   #define __ASM_KVM_RME_H
>>   +#include <uapi/linux/kvm.h>
>> +
>>   /**
>>    * enum realm_state - State of a Realm
>>    */
>> @@ -46,11 +48,28 @@ enum realm_state {
>>    * struct realm - Additional per VM data for a Realm
>>    *
>>    * @state: The lifetime state machine for the realm
>> + * @rd: Kernel mapping of the Realm Descriptor (RD)
>> + * @params: Parameters for the RMI_REALM_CREATE command
>> + * @num_aux: The number of auxiliary pages required by the RMM
>> + * @vmid: VMID to be used by the RMM for the realm
>> + * @ia_bits: Number of valid Input Address bits in the IPA
>>    */
>>   struct realm {
>>       enum realm_state state;
>> +
>> +    void *rd;
>> +    struct realm_params *params;
>> +
>> +    unsigned long num_aux;
>> +    unsigned int vmid;
>> +    unsigned int ia_bits;
>>   };
>>     int kvm_init_rme(void);
>> +u32 kvm_realm_ipa_limit(void);
>> +
>> +int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
>> +int kvm_init_realm_vm(struct kvm *kvm);
>> +void kvm_destroy_realm(struct kvm *kvm);
>>     #endif
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 2056c660c5ee..5729ea430d6d 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -119,6 +119,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>>           }
>>           mutex_unlock(&kvm->slots_lock);
>>           break;
>> +    case KVM_CAP_ARM_RME:
>> +        if (!kvm_is_realm(kvm))
>> +            return -EINVAL;
>> +        mutex_lock(&kvm->lock);
>> +        r = kvm_realm_enable_cap(kvm, cap);
>> +        mutex_unlock(&kvm->lock);
>> +        break;
>>       default:
>>           r = -EINVAL;
>>           break;
>> @@ -179,6 +186,13 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned
>> long type)
>>         bitmap_zero(kvm->arch.vcpu_features, KVM_VCPU_MAX_FEATURES);
>>   +    /* Initialise the realm bits after the generic bits are enabled */
>> +    if (kvm_is_realm(kvm)) {
>> +        ret = kvm_init_realm_vm(kvm);
>> +        if (ret)
>> +            goto err_free_cpumask;
>> +    }
>> +
>>       return 0;
>>     err_free_cpumask:
>> @@ -219,6 +233,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>       kvm_unshare_hyp(kvm, kvm + 1);
>>         kvm_arm_teardown_hypercalls(kvm);
>> +    kvm_destroy_realm(kvm);
>>   }
>>     int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> @@ -328,6 +343,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm,
>> long ext)
>>       case KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES:
>>           r = BIT(0);
>>           break;
>> +    case KVM_CAP_ARM_RME:
>> +        r = static_key_enabled(&kvm_rme_is_available);
>> +        break;
>>       default:
>>           r = 0;
>>       }
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 18680771cdb0..aae365647b62 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -872,6 +872,10 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct
>> kvm_s2_mmu *mmu, unsigned long t
>>       struct kvm_pgtable *pgt;
>>       u64 mmfr0, mmfr1;
>>       u32 phys_shift;
>> +    u32 ipa_limit = kvm_ipa_limit;
>> +
>> +    if (kvm_is_realm(kvm))
>> +        ipa_limit = kvm_realm_ipa_limit();
>>         if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
>>           return -EINVAL;
>> @@ -880,12 +884,12 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct
>> kvm_s2_mmu *mmu, unsigned long t
>>       if (is_protected_kvm_enabled()) {
>>           phys_shift = kvm_ipa_limit;
>>       } else if (phys_shift) {
>> -        if (phys_shift > kvm_ipa_limit ||
>> +        if (phys_shift > ipa_limit ||
>>               phys_shift < ARM64_MIN_PARANGE_BITS)
>>               return -EINVAL;
>>       } else {
>>           phys_shift = KVM_PHYS_SHIFT;
>> -        if (phys_shift > kvm_ipa_limit) {
>> +        if (phys_shift > ipa_limit) {
>>               pr_warn_once("%s using unsupported default IPA limit,
>> upgrade your VMM\n",
>>                        current->comm);
>>               return -EINVAL;
>> @@ -1014,6 +1018,13 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>>       struct kvm_pgtable *pgt = NULL;
>>         write_lock(&kvm->mmu_lock);
>> +    if (kvm_is_realm(kvm) &&
>> +        (kvm_realm_state(kvm) != REALM_STATE_DEAD &&
>> +         kvm_realm_state(kvm) != REALM_STATE_NONE)) {
>> +        /* TODO: teardown rtts */
>> +        write_unlock(&kvm->mmu_lock);
>> +        return;
>> +    }
> 
> This needs a comment to explain the rationale of deferring the
> Stage2 pgt freeing. Something like :
> 
>     /*
>      * For realms, we can free the entry level RTTs
>      * only after :
>      *  1. All of the stage2 mappings are torn down.
>      *  2. The Realm has been destroyed.
>      *
>      * So, come back later once the RD has been destroyed.
>      */

Actually the 'TODO' is really "this is handled in a future patch". I'll
make that more obvious.

>>       pgt = mmu->pgt;
>>       if (pgt) {
>>           mmu->pgd_phys = 0;
>> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
>> index 3dbbf9d046bf..658d14e8d87d 100644
>> --- a/arch/arm64/kvm/rme.c
>> +++ b/arch/arm64/kvm/rme.c
>> @@ -5,9 +5,20 @@
>>     #include <linux/kvm_host.h>
>>   +#include <asm/kvm_emulate.h>
>> +#include <asm/kvm_mmu.h>
>>   #include <asm/rmi_cmds.h>
>>   #include <asm/virt.h>
>>   +#include <asm/kvm_pgtable.h>
>> +
>> +static unsigned long rmm_feat_reg0;
>> +
>> +static bool rme_supports(unsigned long feature)
>> +{
>> +    return !!u64_get_bits(rmm_feat_reg0, feature);
>> +}
>> +
>>   static int rmi_check_version(void)
>>   {
>>       struct arm_smccc_res res;
>> @@ -36,8 +47,272 @@ static int rmi_check_version(void)
>>       return 0;
>>   }
>>   +u32 kvm_realm_ipa_limit(void)
>> +{
>> +    return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
>> +}
>> +
>> +static int get_start_level(struct realm *realm)
>> +{
>> +    return 4 - stage2_pgtable_levels(realm->ia_bits);
>> +}
>> +
>> +static int realm_create_rd(struct kvm *kvm)
>> +{
>> +    struct realm *realm = &kvm->arch.realm;
>> +    struct realm_params *params = realm->params;
>> +    void *rd = NULL;
>> +    phys_addr_t rd_phys, params_phys;
>> +    struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
>> +    int i, r;
>> +
>> +    if (WARN_ON(realm->rd) || WARN_ON(!realm->params))
>> +        return -EEXIST;
>> +
>> +    rd = (void *)__get_free_page(GFP_KERNEL);
>> +    if (!rd)
>> +        return -ENOMEM;
>> +
>> +    rd_phys = virt_to_phys(rd);
>> +    if (rmi_granule_delegate(rd_phys)) {
>> +        r = -ENXIO;
>> +        goto out;
> 
> super minor nit: s/out/free_rd/ is a bit more readable. Here "out" is
> only used for error exits and could be confusing.

It's a good point - I'll fix.

>> +    }
>> +
>> +    for (i = 0; i < pgt->pgd_pages; i++) {
>> +        phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;
>> +
>> +        if (rmi_granule_delegate(pgd_phys)) {
>> +            r = -ENXIO;
>> +            goto out_undelegate_tables;
>> +        }
>> +    }
>> +
>> +    realm->ia_bits = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
>> +
>> +    params->rtt_level_start = get_start_level(realm);
>> +    params->rtt_num_start = pgt->pgd_pages;
>> +    params->rtt_base = kvm->arch.mmu.pgd_phys;
>> +    params->vmid = realm->vmid;
>> +
>> +    params_phys = virt_to_phys(params);
>> +
>> +    if (rmi_realm_create(rd_phys, params_phys)) {
>> +        r = -ENXIO;
>> +        goto out_undelegate_tables;
>> +    }
>> +
>> +    realm->rd = rd;
>> +
>> +    if (WARN_ON(rmi_rec_aux_count(rd_phys, &realm->num_aux))) {
>> +        WARN_ON(rmi_realm_destroy(rd_phys));
>> +        goto out_undelegate_tables;
>> +    }
>> +
>> +    return 0;
>> +
>> +out_undelegate_tables:
>> +    while (--i >= 0) {
>> +        phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;
>> +
>> +        WARN_ON(rmi_granule_undelegate(pgd_phys));
>> +    }
>> +    WARN_ON(rmi_granule_undelegate(rd_phys));
>> +out:
>> +    free_page((unsigned long)rd);
>> +    return r;
>> +}
>> +
>> +/* Protects access to rme_vmid_bitmap */
>> +static DEFINE_SPINLOCK(rme_vmid_lock);
>> +static unsigned long *rme_vmid_bitmap;
>> +
>> +static int rme_vmid_init(void)
>> +{
>> +    unsigned int vmid_count = 1 << kvm_get_vmid_bits();
> 
> minor nit: RMM has a fixed VMID width of 16 bits. Do we need to
> explicitly use that, instead of relying on what KVM thinks? (Though
> in practice, this would only be a problem if the architecture
> evolves to support something more.)

I don't think the RMM has a fixed VMID width according to the spec. It says:

The VMID of a Realm is chosen by the Host. The VMID must be within the
range supported by the hardware platform. The RMM ensures that every
Realm on the system has a unique VMID.

It also declares a pseudo-code VmidIsValid function which says:

If the underlying hardware platform does not implement FEAT_VMID16 then
a VMID value with vmid[15:8] != 0 is invalid.

So it would seem from the spec that a VMID width <16 bits is supported.
On the other hand if we ever get >16 bit VMID support in the
architecture then the 16 bit vmid field in RmiRealmParams is clearly
going to trip us up.

At the moment my feeling is that 16 bit VMIDs ought to be enough for
anybody ... ;)
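
Should that assumption ever need enforcing, a one-line clamp in
rme_vmid_init() would do it (a sketch):

	/* RmiRealmParams only carries a 16-bit vmid field */
	unsigned int vmid_count = 1 << min(kvm_get_vmid_bits(), 16U);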

>> +
>> +    rme_vmid_bitmap = bitmap_zalloc(vmid_count, GFP_KERNEL);
>> +    if (!rme_vmid_bitmap) {
>> +        kvm_err("%s: Couldn't allocate rme vmid bitmap\n", __func__);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int rme_vmid_reserve(void)
>> +{
>> +    int ret;
>> +    unsigned int vmid_count = 1 << kvm_get_vmid_bits();
>> +
>> +    spin_lock(&rme_vmid_lock);
>> +    ret = bitmap_find_free_region(rme_vmid_bitmap, vmid_count, 0);
>> +    spin_unlock(&rme_vmid_lock);
>> +
>> +    return ret;
>> +}
>> +
>> +static void rme_vmid_release(unsigned int vmid)
>> +{
>> +    spin_lock(&rme_vmid_lock);
>> +    bitmap_release_region(rme_vmid_bitmap, vmid, 0);
>> +    spin_unlock(&rme_vmid_lock);
>> +}
>> +
>> +static int kvm_create_realm(struct kvm *kvm)
>> +{
>> +    struct realm *realm = &kvm->arch.realm;
>> +    int ret;
>> +
>> +    if (!kvm_is_realm(kvm) || kvm_realm_is_created(kvm))
>> +        return -EEXIST;
> 
> Minor nit:
> 
>     if (!kvm_is_realm(kvm)
>         return -EIO or even -EINVAL ?
> 
>     if (kvm_realm_is_created(kvm))
>         return -EEXIST;

Indeed EEXIST is a terrible error code for !realm. My preference is EIO.

>> +
>> +    ret = rme_vmid_reserve();
>> +    if (ret < 0)
>> +        return ret;
>> +    realm->vmid = ret;
>> +
>> +    ret = realm_create_rd(kvm);
>> +    if (ret) {
>> +        rme_vmid_release(realm->vmid);
>> +        return ret;
>> +    }
>> +
>> +    WRITE_ONCE(realm->state, REALM_STATE_NEW);
>> +
>> +    /* The realm is up, free the parameters.  */
>> +    free_page((unsigned long)realm->params);
>> +    realm->params = NULL;
>> +
>> +    return 0;
>> +}
>> +
>> +static int config_realm_hash_algo(struct realm *realm,
>> +                  struct kvm_cap_arm_rme_config_item *cfg)
>> +{
>> +    switch (cfg->hash_algo) {
>> +    case KVM_CAP_ARM_RME_MEASUREMENT_ALGO_SHA256:
>> +        if (!rme_supports(RMI_FEATURE_REGISTER_0_HASH_SHA_256))
>> +            return -EINVAL;
>> +        break;
>> +    case KVM_CAP_ARM_RME_MEASUREMENT_ALGO_SHA512:
>> +        if (!rme_supports(RMI_FEATURE_REGISTER_0_HASH_SHA_512))
>> +            return -EINVAL;
> 
> Do we need to add a comment here on why we don't expose the supported
> "hash" algo as part of the UABI ? Something like :
> 
>     /*
>      * The hash algorithm for the measurements is chosen by
>      * the Realm owner (since it affects the attestation), so we
>      * would like the owner to get what they want.
>      */

I'm not really sure here's the right place for such a comment. Clearly
here we're just doing whatever the VMM requested and returning -EINVAL
if it's not supported (by KVM or the RMM).

There is the bigger question on whether discovery of the RMM's feature
set is necessary and easy enough with the current uAPI. As you say there
is a reasonable argument that the VMM doesn't have a choice so there's
no need for discovery. Equally adding an extra CAP ioctl for discovery
later is easy enough.
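
If such a discovery interface is added later, one plausible shape is an
extra sub-command in kvm_realm_enable_cap() that copies the supported
features out to the VMM. Everything below is hypothetical, including
the command name:

	case KVM_CAP_ARM_RME_GET_SUPPORTED_FEATURES: {
		/* Hypothetical: expose the RMM feature bits to the VMM */
		u64 feats = rmm_feat_reg0;

		if (copy_to_user(u64_to_user_ptr(cap->args[1]),
				 &feats, sizeof(feats)))
			r = -EFAULT;
		break;
	}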

>> +        break;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +    realm->params->hash_algo = cfg->hash_algo;
>> +    return 0;
>> +}
>> +
>> +static int kvm_rme_config_realm(struct kvm *kvm, struct
>> kvm_enable_cap *cap)
>> +{
>> +    struct kvm_cap_arm_rme_config_item cfg;
>> +    struct realm *realm = &kvm->arch.realm;
>> +    int r = 0;
>> +
>> +    if (kvm_realm_is_created(kvm))
>> +        return -EINVAL;
> 
> minor nit: Maybe return -EEXIST or -EIO rather than "Invalid
> (parameter)"?

Yep, although EBUSY seems like the most descriptive to me.

> 
>> +
>> +    if (copy_from_user(&cfg, (void __user *)cap->args[1], sizeof(cfg)))
>> +        return -EFAULT;
>> +
>> +    switch (cfg.cfg) {
>> +    case KVM_CAP_ARM_RME_CFG_RPV:
>> +        memcpy(&realm->params->rpv, &cfg.rpv, sizeof(cfg.rpv));
>> +        break;
>> +    case KVM_CAP_ARM_RME_CFG_HASH_ALGO:
>> +        r = config_realm_hash_algo(realm, &cfg);
>> +        break;
>> +    default:
>> +        r = -EINVAL;
>> +    }
>> +
>> +    return r;
>> +}
>> +
>> +int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>> +{
>> +    int r = 0;
>> +
>> +    if (!kvm_is_realm(kvm))
>> +        return -EINVAL;
>> +
>> +    switch (cap->args[0]) {
>> +    case KVM_CAP_ARM_RME_CONFIG_REALM:
>> +        r = kvm_rme_config_realm(kvm, cap);
>> +        break;
>> +    case KVM_CAP_ARM_RME_CREATE_RD:
>> +        r = kvm_create_realm(kvm);
>> +        break;
>> +    default:
>> +        r = -EINVAL;
>> +        break;
>> +    }
>> +
>> +    return r;
>> +}
>> +
>> +void kvm_destroy_realm(struct kvm *kvm)
>> +{
>> +    struct realm *realm = &kvm->arch.realm;
>> +    struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
>> +    int i;
>> +
>> +    if (realm->params) {
>> +        free_page((unsigned long)realm->params);
>> +        realm->params = NULL;
>> +    }
>> +
>> +    if (!kvm_realm_is_created(kvm))
>> +        return;
>> +
>> +    WRITE_ONCE(realm->state, REALM_STATE_DYING);
>> +
>> +    if (realm->rd) {
>> +        phys_addr_t rd_phys = virt_to_phys(realm->rd);
>> +
>> +        if (WARN_ON(rmi_realm_destroy(rd_phys)))
>> +            return;
>> +        if (WARN_ON(rmi_granule_undelegate(rd_phys)))
>> +            return;
>> +        free_page((unsigned long)realm->rd);
>> +        realm->rd = NULL;
>> +    }
>> +
>> +    rme_vmid_release(realm->vmid);
>> +
>> +    for (i = 0; i < pgt->pgd_pages; i++) {
>> +        phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i * PAGE_SIZE;
>> +
>> +        if (WARN_ON(rmi_granule_undelegate(pgd_phys)))
>> +            return;
>> +    }
>> +
>> +    WRITE_ONCE(realm->state, REALM_STATE_DEAD);
>> +
> 
> May be add in a comment here:
> 
>      /* Now that the Realm is destroyed, free the entry level RTTs */
>> +    kvm_free_stage2_pgd(&kvm->arch.mmu);
> 
> 
> 
>> +}
>> +
>> +int kvm_init_realm_vm(struct kvm *kvm)
>> +{
>> +    struct realm_params *params;
>> +
>> +    params = (struct realm_params *)get_zeroed_page(GFP_KERNEL);
>> +    if (!params)
>> +        return -ENOMEM;
>> +
>> +    /* Default parameters, not exposed to user space */
> 
> This is a bit misleading. The value comes from the userspace and...
> 
>> +    params->s2sz = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
> 
> (minor nit) we initialise most of the params, including those that come
> from KVM, later on. So it may be a good idea to move this initialisation
> together into create_realm, unless we need it earlier.

Indeed I'll move this to create_realm.

Thanks,

Steve

>> +    kvm->arch.realm.params = params;
>> +    return 0;
>> +}
>> +
>>   int kvm_init_rme(void)
>>   {
>> +    int ret;
>> +
>>       if (PAGE_SIZE != SZ_4K)
>>           /* Only 4k page size on the host is supported */
>>           return 0;
>> @@ -46,6 +321,13 @@ int kvm_init_rme(void)
>>           /* Continue without realm support */
>>           return 0;
>>   +    if (WARN_ON(rmi_features(0, &rmm_feat_reg0)))
>> +        return 0;
>> +
>> +    ret = rme_vmid_init();
>> +    if (ret)
>> +        return ret;
>> +
>>       /* Future patch will enable static branch kvm_rme_is_available */
>>         return 0;
> 
> 
> Suzuki
> 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 13/43] arm64: RME: RTT handling
  2024-04-17 13:37     ` Suzuki K Poulose
@ 2024-04-24 10:59       ` Steven Price
  0 siblings, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-24 10:59 UTC (permalink / raw)
  To: Suzuki K Poulose, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 17/04/2024 14:37, Suzuki K Poulose wrote:
> Hi Steven
> 
> minor nit, Subject: arm64: RME: RTT tear down
> 
> This patch is all about tearing the RTTs, so may be the subject could
> be adjusted accordingly.

Good point, this patch has evolved and is now all about tearing down.

> On 12/04/2024 09:42, Steven Price wrote:
>> The RMM owns the stage 2 page tables for a realm, and KVM must request
>> that the RMM creates/destroys entries as necessary. The physical pages
>> to store the page tables are delegated to the realm as required, and can
>> be undelegated when no longer used.
>>
>> Creating new RTTs is the easy part, tearing down is a little more
>> tricky. The result of realm_rtt_destroy() can be used to effectively
>> walk the tree and destroy the entries (undelegating pages that were
>> given to the realm).
> 
> The patch looks functionally correct to me. Some minor style related
> comments below.
> 
>> Signed-off-by: Steven Price <steven.price@arm.com>
> 
>> ---
>>   arch/arm64/include/asm/kvm_rme.h |  19 ++++
>>   arch/arm64/kvm/mmu.c             |   6 +-
>>   arch/arm64/kvm/rme.c             | 171 +++++++++++++++++++++++++++++++
>>   3 files changed, 193 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_rme.h
>> b/arch/arm64/include/asm/kvm_rme.h
>> index fba85e9ce3ae..4ab5cb5e91b3 100644
>> --- a/arch/arm64/include/asm/kvm_rme.h
>> +++ b/arch/arm64/include/asm/kvm_rme.h
>> @@ -76,5 +76,24 @@ u32 kvm_realm_ipa_limit(void);
>>   int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
>>   int kvm_init_realm_vm(struct kvm *kvm);
>>   void kvm_destroy_realm(struct kvm *kvm);
>> +void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits);
>> +
>> +#define RME_RTT_BLOCK_LEVEL    2
>> +#define RME_RTT_MAX_LEVEL    3
>> +
>> +#define RME_PAGE_SHIFT        12
>> +#define RME_PAGE_SIZE        BIT(RME_PAGE_SHIFT)
>> +/* See ARM64_HW_PGTABLE_LEVEL_SHIFT() */
>> +#define RME_RTT_LEVEL_SHIFT(l)    \
>> +    ((RME_PAGE_SHIFT - 3) * (4 - (l)) + 3)
>> +#define RME_L2_BLOCK_SIZE    BIT(RME_RTT_LEVEL_SHIFT(2))
>> +
>> +static inline unsigned long rme_rtt_level_mapsize(int level)
>> +{
>> +    if (WARN_ON(level > RME_RTT_MAX_LEVEL))
>> +        return RME_PAGE_SIZE;
>> +
>> +    return (1UL << RME_RTT_LEVEL_SHIFT(level));
>> +}
>>
> 
> super minor nit: We only support 4K for now, so may be could reuse
> the ARM64 generic macro helpers. I am fine either way.

Given we'll likely want to support host granules other than 4k in the
future I'd like to avoid using the generic ones. It's also a clear
signal that the code is referring to the RTTs rather than the host's
page tables.

> 
>>   #endif
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index af4564f3add5..46f0c4e80ace 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1012,17 +1012,17 @@ void stage2_unmap_vm(struct kvm *kvm)
>>   void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>>   {
>>       struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>> -    struct kvm_pgtable *pgt = NULL;
>> +    struct kvm_pgtable *pgt;
>>         write_lock(&kvm->mmu_lock);
>> +    pgt = mmu->pgt;
>>       if (kvm_is_realm(kvm) &&
>>           (kvm_realm_state(kvm) != REALM_STATE_DEAD &&
>>            kvm_realm_state(kvm) != REALM_STATE_NONE)) {
>> -        /* TODO: teardown rtts */
>>           write_unlock(&kvm->mmu_lock);
>> +        kvm_realm_destroy_rtts(kvm, pgt->ia_bits);
>>           return;
>>       }
>> -    pgt = mmu->pgt;
>>       if (pgt) {
>>           mmu->pgd_phys = 0;
>>           mmu->pgt = NULL;
>> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
>> index 9652ec6ab2fd..09b59bcad8b6 100644
>> --- a/arch/arm64/kvm/rme.c
>> +++ b/arch/arm64/kvm/rme.c
>> @@ -47,6 +47,53 @@ static int rmi_check_version(void)
>>       return 0;
>>   }
>>   +static phys_addr_t __alloc_delegated_page(struct realm *realm,
>> +                      struct kvm_mmu_memory_cache *mc,
>> +                      gfp_t flags)
> 
> minor nit: Do we need "__" here? The counterpart is plain
> free_delegated_page without the "__". We could drop the prefix.
> 
> Or we could split the function as:
> 
> alloc_delegated_page()
> {
>   if (spare_page_available)
>     return spare_page;
>   return __alloc_delegated_page(); /* Alloc and delegate a page */
> }

I'm not really sure I follow. The reason for the 'wrapper' function
alloc_delegated_page() is because most call sites don't care about the
GFP flags (defaults to GFP_KERNEL), but for ensure_spare_page() we need
to pass GFP_ATOMIC.

Admittedly there are only 3 call sites in total and the wrapper isn't
added yet. I'll tidy this up by simply adding the GFP_KERNEL flag onto
the two call sites.
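
In other words, the tidied-up callers would presumably read (a sketch;
'memcache' stands in for whatever cache the caller holds):

	/* Table allocation path: allowed to sleep */
	phys = __alloc_delegated_page(realm, memcache, GFP_KERNEL);

	/* ensure_spare_page(): runs under kvm->mmu_lock */
	tmp_rtt = __alloc_delegated_page(realm, NULL, GFP_ATOMIC);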

> 
>> +{
>> +    phys_addr_t phys = PHYS_ADDR_MAX;
>> +    void *virt;
>> +
>> +    if (realm->spare_page != PHYS_ADDR_MAX) {
>> +        swap(realm->spare_page, phys);
>> +        goto out;
>> +    }
>> +
>> +    if (mc)
>> +        virt = kvm_mmu_memory_cache_alloc(mc);
>> +    else
>> +        virt = (void *)__get_free_page(flags);
>> +
>> +    if (!virt)
>> +        goto out;
>> +
>> +    phys = virt_to_phys(virt);
>> +
>> +    if (rmi_granule_delegate(phys)) {
>> +        free_page((unsigned long)virt);
>> +
>> +        phys = PHYS_ADDR_MAX;
>> +    }
>> +
>> +out:
>> +    return phys;
>> +}
>> +
>> +static void free_delegated_page(struct realm *realm, phys_addr_t phys)
>> +{
>> +    if (realm->spare_page == PHYS_ADDR_MAX) {
>> +        realm->spare_page = phys;
>> +        return;
>> +    }
>> +
>> +    if (WARN_ON(rmi_granule_undelegate(phys))) {
>> +        /* Undelegate failed: leak the page */
>> +        return;
>> +    }
>> +
>> +    free_page((unsigned long)phys_to_virt(phys));
>> +}
>> +
>>   u32 kvm_realm_ipa_limit(void)
>>   {
>>       return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
>> @@ -124,6 +171,130 @@ static int realm_create_rd(struct kvm *kvm)
>>       return r;
>>   }
>>   +static int realm_rtt_destroy(struct realm *realm, unsigned long addr,
>> +                 int level, phys_addr_t *rtt_granule,
>> +                 unsigned long *next_addr)
>> +{
>> +    unsigned long out_rtt;
>> +    unsigned long out_top;
>> +    int ret;
>> +
>> +    ret = rmi_rtt_destroy(virt_to_phys(realm->rd), addr, level,
>> +                  &out_rtt, &out_top);
>> +
>> +    if (rtt_granule)
>> +        *rtt_granule = out_rtt;
>> +    if (next_addr)
>> +        *next_addr = out_top;
> 
> minor nit: As mentioned in the previous patch, we could move this check
> to the rmi_rtt_destroy().

Done, I've also dropped the check for rtt_granule - it's a bug to be
passing that as NULL.

>> +
>> +    return ret;
>> +}
>> +
>> +static int realm_tear_down_rtt_level(struct realm *realm, int level,
>> +                     unsigned long start, unsigned long end)
>> +{
>> +    ssize_t map_size;
>> +    unsigned long addr, next_addr;
>> +
>> +    if (WARN_ON(level > RME_RTT_MAX_LEVEL))
>> +        return -EINVAL;
>> +
>> +    map_size = rme_rtt_level_mapsize(level - 1);
>> +
>> +    for (addr = start; addr < end; addr = next_addr) {
>> +        phys_addr_t rtt_granule;
>> +        int ret;
>> +        unsigned long align_addr = ALIGN(addr, map_size);
>> +
>> +        next_addr = ALIGN(addr + 1, map_size);
>> +
>> +        if (next_addr <= end && align_addr == addr) {
>> +            ret = realm_rtt_destroy(realm, addr, level,
>> +                        &rtt_granule, &next_addr);
>> +        } else {
>> +            /* Recurse a level deeper */
>> +            ret = realm_tear_down_rtt_level(realm,
>> +                            level + 1,
>> +                            addr,
>> +                            min(next_addr, end));
>> +            if (ret)
>> +                return ret;
>> +            continue;
>> +        }
> 
> I think it would be more readable if we did something like:
> 
>         /*
>          * The target range is smaller than what this level
>          * covers. Go deeper.
>          */
>         if (next_addr > end || align_addr != addr) {
>             ret = realm_tear_down_rtt_level(realm,
>                             level + 1, addr,
>                             min(next_addr, end));
>             if (ret)
>                 return ret;
>             continue;
>         }
> 
>         ret = realm_rtt_destroy(realm, addr, level,
>                     &rtt_granule, &next_addr);

Yes, that seems clearer - thanks for the suggestion.

>> +
>> +        switch (RMI_RETURN_STATUS(ret)) {
>> +        case RMI_SUCCESS:
>> +            if (!WARN_ON(rmi_granule_undelegate(rtt_granule)))
>> +                free_page((unsigned long)phys_to_virt(rtt_granule));
>> +            break;
>> +        case RMI_ERROR_RTT:
>> +            if (next_addr > addr) {
>> +                /* unassigned or destroyed */
> 
> minor nit:
>                 /* RTT doesn't exist, skip */

Indeed, this comment is out of date - the spec now calls this condition
"Missing RTT" so I'll use that wording.

> 
>> +                break;
>> +            }
> 
>> +            if (WARN_ON(RMI_RETURN_INDEX(ret) != level))
>> +                return -EBUSY;
> 
> In practice, we only call this for the full IPA range and we wouldn't
> go deeper if the top level entry was missing. So there is no reason why
> the RMM couldn't walk to the requested level. Maybe we could add a
> comment here:
>             /*
>              * We tear down the RTT range for the full IPA
>              * space, after everything is unmapped. Also we
>              * descend down only if we cannot tear down a
>              * top level RTT. Thus RMM must be able to walk
>              * to the requested level. e.g., a block mapping
>              * exists at L1 or L2.
>              */

Sure, will add.

>> +            if (WARN_ON(level == RME_RTT_MAX_LEVEL)) {
>> +                // Live entry
>> +                return -EBUSY;
> 
> 
> The first part of the comment above applies to this. So maybe it is
> good to have it.
> 
> 
>> +            }
> 
>> +            /* Recurse a level deeper */
> 
> minor nit:
>             /*
>              * The table has active entries in it, recurse
>              * deeper and tear down the RTTs.
>              */

Sure

>> +            next_addr = ALIGN(addr + 1, map_size);
>> +            ret = realm_tear_down_rtt_level(realm,
>> +                            level + 1,
>> +                            addr,
>> +                            next_addr);
>> +            if (ret)
>> +                return ret;
>> +            /* Try again at this level */
> 
>             /*
>              * Now that the children RTTs are destroyed,
>              * retry at this level.
>              */

Sure

>> +            next_addr = addr;
>> +            break;
>> +        default:
>> +            WARN_ON(1);
>> +            return -ENXIO;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int realm_tear_down_rtt_range(struct realm *realm,
>> +                     unsigned long start, unsigned long end)
>> +{
>> +    return realm_tear_down_rtt_level(realm, get_start_level(realm) + 1,
>> +                     start, end);
>> +}
>> +
>> +static void ensure_spare_page(struct realm *realm)
>> +{
>> +    phys_addr_t tmp_rtt;
>> +
>> +    /*
>> +     * Make sure we have a spare delegated page for tearing down the
>> +     * block mappings. We do this by allocating then freeing a page.
>> +     * We must use Atomic allocations as we are called with
>> kvm->mmu_lock
>> +     * held.
>> +     */
>> +    tmp_rtt = __alloc_delegated_page(realm, NULL, GFP_ATOMIC);
>> +
>> +    /*
>> +     * If the allocation failed, continue as we may not have a block
>> level
>> +     * mapping so it may not be fatal, otherwise free it to assign it
>> +     * to the spare page.
>> +     */
>> +    if (tmp_rtt != PHYS_ADDR_MAX)
>> +        free_delegated_page(realm, tmp_rtt);
>> +}
>> +
>> +void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits)
>> +{
>> +    struct realm *realm = &kvm->arch.realm;
>> +
>> +    ensure_spare_page(realm);
>> +
>> +    WARN_ON(realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits)));
>> +}
> 
> minor nit: We don't seem to be using the "spare_page" yet in this patch.
> Maybe it would be a good idea to move all the related changes
> (alloc_delegated_page() / free_delegated_page(), ensure_spare_page() etc)
> to the patch where they are better suited?

Good point - I think the calls get added in "arm64: RME: Allow VMM to
set RIPAS". I'll try to move them there.

Thanks,

Steve

> Suzuki
> 
>> +
>>   /* Protects access to rme_vmid_bitmap */
>>   static DEFINE_SPINLOCK(rme_vmid_lock);
>>   static unsigned long *rme_vmid_bitmap;
> 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 14/14] virt: arm-cca-guest: TSM_REPORT support for realms
  2024-04-12  8:42   ` [PATCH v2 14/14] virt: arm-cca-guest: TSM_REPORT support for realms Steven Price
@ 2024-04-24 13:06     ` Thomas Fossati
  2024-04-24 13:27       ` Suzuki K Poulose
  2024-04-24 13:19     ` Suzuki K Poulose
  1 sibling, 1 reply; 104+ messages in thread
From: Thomas Fossati @ 2024-04-24 13:06 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Sami Mujawar, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Suzuki K Poulose,
	Zenghui Yu, linux-arm-kernel, linux-kernel, Joey Gouly,
	Alexandru Elisei, Christoffer Dall, Fuad Tabba, linux-coco,
	Ganapatrao Kulkarni

Hi Steven, Sami,

On Fri, 12 Apr 2024 at 10:47, Steven Price <steven.price@arm.com> wrote:
> +/**
> + * arm_cca_report_new - Generate a new attestation token.
> + *
> + * @report: pointer to the TSM report context information.
> + * @data:  pointer to the context specific data for this module.
> + *
> + * Initialise the attestation token generation using the challenge data
> + * passed in the TSM descriptor.

Here, it'd be good to document two interesting facts about challenge data:
1. It must be at least 32 bytes, and
2. If its size is less than 64 bytes, it will be zero-padded.
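
A sketch of how the kernel-doc could capture both points (wording
illustrative):

 * Initialise the attestation token generation using the challenge data
 * passed in the TSM descriptor. The challenge must be at least 32 bytes
 * long; a challenge shorter than 64 bytes is zero-padded to 64 bytes
 * before being passed to the RSI attestation call.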

cheers!

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 14/14] virt: arm-cca-guest: TSM_REPORT support for realms
  2024-04-12  8:42   ` [PATCH v2 14/14] virt: arm-cca-guest: TSM_REPORT support for realms Steven Price
  2024-04-24 13:06     ` Thomas Fossati
@ 2024-04-24 13:19     ` Suzuki K Poulose
  1 sibling, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-24 13:19 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Sami Mujawar, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni

On 12/04/2024 09:42, Steven Price wrote:
> From: Sami Mujawar <sami.mujawar@arm.com>
> 
> Introduce an arm-cca-guest driver that registers with
> the configfs-tsm module to provide user interfaces for
> retrieving an attestation token.
> 
> When a new report is requested the arm-cca-guest driver
> invokes the appropriate RSI interfaces to query an
> attestation token.
> 
> The steps to retrieve an attestation token are as follows:
>    1. Mount the configfs filesystem if not already mounted
>       mount -t configfs none /sys/kernel/config
>    2. Generate an attestation token
>       report=/sys/kernel/config/tsm/report/report0
>       mkdir $report
>       dd if=/dev/urandom bs=64 count=1 > $report/inblob
>       hexdump -C $report/outblob
>       rmdir $report
> 
> Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   drivers/virt/coco/Kconfig                     |   2 +
>   drivers/virt/coco/Makefile                    |   1 +
>   drivers/virt/coco/arm-cca-guest/Kconfig       |  11 +
>   drivers/virt/coco/arm-cca-guest/Makefile      |   2 +
>   .../virt/coco/arm-cca-guest/arm-cca-guest.c   | 208 ++++++++++++++++++
>   5 files changed, 224 insertions(+)
>   create mode 100644 drivers/virt/coco/arm-cca-guest/Kconfig
>   create mode 100644 drivers/virt/coco/arm-cca-guest/Makefile
>   create mode 100644 drivers/virt/coco/arm-cca-guest/arm-cca-guest.c
> 
> diff --git a/drivers/virt/coco/Kconfig b/drivers/virt/coco/Kconfig
> index 87d142c1f932..4fb69804b622 100644
> --- a/drivers/virt/coco/Kconfig
> +++ b/drivers/virt/coco/Kconfig
> @@ -12,3 +12,5 @@ source "drivers/virt/coco/efi_secret/Kconfig"
>   source "drivers/virt/coco/sev-guest/Kconfig"
>   
>   source "drivers/virt/coco/tdx-guest/Kconfig"
> +
> +source "drivers/virt/coco/arm-cca-guest/Kconfig"
> diff --git a/drivers/virt/coco/Makefile b/drivers/virt/coco/Makefile
> index 18c1aba5edb7..a6228a1bf992 100644
> --- a/drivers/virt/coco/Makefile
> +++ b/drivers/virt/coco/Makefile
> @@ -6,3 +6,4 @@ obj-$(CONFIG_TSM_REPORTS)	+= tsm.o
>   obj-$(CONFIG_EFI_SECRET)	+= efi_secret/
>   obj-$(CONFIG_SEV_GUEST)		+= sev-guest/
>   obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx-guest/
> +obj-$(CONFIG_ARM_CCA_GUEST)	+= arm-cca-guest/
> diff --git a/drivers/virt/coco/arm-cca-guest/Kconfig b/drivers/virt/coco/arm-cca-guest/Kconfig
> new file mode 100644
> index 000000000000..c4039c10dce2
> --- /dev/null
> +++ b/drivers/virt/coco/arm-cca-guest/Kconfig
> @@ -0,0 +1,11 @@
> +config ARM_CCA_GUEST
> +	tristate "Arm CCA Guest driver"
> +	depends on ARM64
> +	default m
> +	select TSM_REPORTS
> +	help
> +	  The driver provides a userspace interface to request an
> +	  attestation report from the Realm Management Monitor (RMM).
> +
> +	  If you choose 'M' here, this module will be called
> +	  realm-guest.

This needs to be updated to arm-cca-guest.
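
That is, presumably something like:

	  If you choose 'M' here, this module will be called
	  arm-cca-guest.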

Suzuki


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 14/14] virt: arm-cca-guest: TSM_REPORT support for realms
  2024-04-24 13:06     ` Thomas Fossati
@ 2024-04-24 13:27       ` Suzuki K Poulose
  0 siblings, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-24 13:27 UTC (permalink / raw)
  To: Thomas Fossati, Steven Price
  Cc: kvm, kvmarm, Sami Mujawar, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

On 24/04/2024 14:06, Thomas Fossati wrote:
> Hi Steven, Sami,
> 
> On Fri, 12 Apr 2024 at 10:47, Steven Price <steven.price@arm.com> wrote:
>> +/**
>> + * arm_cca_report_new - Generate a new attestation token.
>> + *
>> + * @report: pointer to the TSM report context information.
>> + * @data:  pointer to the context specific data for this module.
>> + *
>> + * Initialise the attestation token generation using the challenge data
>> + * passed in the TSM descriptor.
> 
> Here, it'd be good to document two interesting facts about challenge data:
> 1. It must be at least 32 bytes, and
> 2. If its size is less than 64 bytes, it will be zero-padded.

Agreed!

Suzuki

> 
> cheers!


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 01/43] KVM: Prepare for handling only shared mappings in mmu_notifier events
  2024-04-12  8:42   ` [PATCH v2 01/43] KVM: Prepare for handling only shared mappings in mmu_notifier events Steven Price
@ 2024-04-25  9:48     ` Fuad Tabba
  2024-04-25 15:58       ` Steven Price
  0 siblings, 1 reply; 104+ messages in thread
From: Fuad Tabba @ 2024-04-25  9:48 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Sean Christopherson, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Suzuki K Poulose,
	Zenghui Yu, linux-arm-kernel, linux-kernel, Joey Gouly,
	Alexandru Elisei, Christoffer Dall, linux-coco,
	Ganapatrao Kulkarni

Hi,

On Fri, Apr 12, 2024 at 9:43 AM Steven Price <steven.price@arm.com> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Add flags to "struct kvm_gfn_range" to let notifier events target only
> shared and only private mappings, and wire up the existing mmu_notifier
> events to be shared-only (private memory is never associated with a
> userspace virtual address, i.e. can't be reached via mmu_notifiers).
>
> Add two flags so that KVM can handle the three possibilities (shared,
> private, and shared+private) without needing something like a tri-state
> enum.
>
> Link: https://lore.kernel.org/all/ZJX0hk+KpQP0KUyB@google.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  include/linux/kvm_host.h | 2 ++
>  virt/kvm/kvm_main.c      | 7 +++++++
>  2 files changed, 9 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 48f31dcd318a..c7581360fd88 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -268,6 +268,8 @@ struct kvm_gfn_range {
>         gfn_t start;
>         gfn_t end;
>         union kvm_mmu_notifier_arg arg;
> +       bool only_private;
> +       bool only_shared;
>         bool may_block;
>  };
>  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fb49c2a60200..3486ceef6f4e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -633,6 +633,13 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>                          * the second or later invocation of the handler).
>                          */
>                         gfn_range.arg = range->arg;
> +
> +                       /*
> +                        * HVA-based notifications aren't relevant to private
> +                        * mappings as they don't have a userspace mapping.
> +                        */
> +                       gfn_range.only_private = false;
> +                       gfn_range.only_shared = true;
>                         gfn_range.may_block = range->may_block;

I'd discussed this with Sean when he posted this earlier. Having two
booleans to encode three valid states could be confusing. In response,
Sean suggested using an enum instead:
https://lore.kernel.org/all/ZUO1Giju0GkUdF0o@google.com/
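
As a rough illustration of that suggestion (the names below are
hypothetical, not taken from the linked thread), a single enum field
makes the three valid states explicit:

  enum kvm_gfn_range_filter {
          KVM_FILTER_SHARED_AND_PRIVATE,
          KVM_FILTER_SHARED_ONLY,
          KVM_FILTER_PRIVATE_ONLY,
  };

  struct kvm_gfn_range {
          ...
          enum kvm_gfn_range_filter filter;
          bool may_block;
  };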

Cheers,
/fuad

>
>                         /*


> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS
  2024-04-12  8:42   ` [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS Steven Price
  2024-04-19  9:34     ` Suzuki K Poulose
@ 2024-04-25  9:53     ` Fuad Tabba
  2024-05-01 14:27     ` Jean-Philippe Brucker
  2 siblings, 0 replies; 104+ messages in thread
From: Fuad Tabba @ 2024-04-25  9:53 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, linux-coco, Ganapatrao Kulkarni

Hi,

On Fri, Apr 12, 2024 at 9:43 AM Steven Price <steven.price@arm.com> wrote:
>
> Each page within the protected region of the realm guest can be marked
> as either RAM or EMPTY. Allow the VMM to control this before the guest
> has started and provide the equivalent functions to change this (with
> the guest's approval) at runtime.
>
> When transitioning from RIPAS RAM (1) to RIPAS EMPTY (0) the memory is
> unmapped from the guest and undelegated, allowing the memory to be
> reused by the host. When transitioning to RIPAS RAM the actual
> population of the leaf RTTs is done later on stage 2 fault; however, it
> may be necessary to allocate additional RTTs to represent the range
> requested.
>
> When freeing a block mapping it is necessary to temporarily unfold the
> RTT, which requires delegating an extra page to the RMM; this page can
> then be recovered once the contents of the block mapping have been
> freed. A spare, delegated page (spare_page) is used for this purpose.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  arch/arm64/include/asm/kvm_rme.h |  16 ++
>  arch/arm64/kvm/mmu.c             |   8 +-
>  arch/arm64/kvm/rme.c             | 390 +++++++++++++++++++++++++++++++
>  3 files changed, 411 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
> index 915e76068b00..cc8f81cfc3c0 100644
> --- a/arch/arm64/include/asm/kvm_rme.h
> +++ b/arch/arm64/include/asm/kvm_rme.h
> @@ -96,6 +96,14 @@ void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits);
>  int kvm_create_rec(struct kvm_vcpu *vcpu);
>  void kvm_destroy_rec(struct kvm_vcpu *vcpu);
>
> +void kvm_realm_unmap_range(struct kvm *kvm,
> +                          unsigned long ipa,
> +                          u64 size,
> +                          bool unmap_private);
> +int realm_set_ipa_state(struct kvm_vcpu *vcpu,
> +                       unsigned long addr, unsigned long end,
> +                       unsigned long ripas);
> +
>  #define RME_RTT_BLOCK_LEVEL    2
>  #define RME_RTT_MAX_LEVEL      3
>
> @@ -114,4 +122,12 @@ static inline unsigned long rme_rtt_level_mapsize(int level)
>         return (1UL << RME_RTT_LEVEL_SHIFT(level));
>  }
>
> +static inline bool realm_is_addr_protected(struct realm *realm,
> +                                          unsigned long addr)
> +{
> +       unsigned int ia_bits = realm->ia_bits;
> +
> +       return !(addr & ~(BIT(ia_bits - 1) - 1));
> +}
> +
>  #endif
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 46f0c4e80ace..8a7b5449697f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -310,6 +310,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
>   * @start: The intermediate physical base address of the range to unmap
>   * @size:  The size of the area to unmap
>   * @may_block: Whether or not we are permitted to block
> + * @only_shared: If true then protected mappings should not be unmapped
>   *
>   * Clear a range of stage-2 mappings, lowering the various ref-counts.  Must
>   * be called while holding mmu_lock (unless for freeing the stage2 pgd before
> @@ -317,7 +318,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
>   * with things behind our backs.
>   */
>  static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size,
> -                                bool may_block)
> +                                bool may_block, bool only_shared)

I found the newly added only_shared parameter to be a bit confusing,
since this patch also introduces kvm_realm_unmap_range(..., bool
unmap_private), where unmap_private has a different meaning: I think
it is really unmap_all. It might be better if the parameter meant
the same thing everywhere. Having helpers to sort this out, similar
to what you have with realm_unmap_range_*(), might be even clearer
(see the sketch below).
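
For example, thin wrappers along these lines (hypothetical names,
built on the realm_unmap_range() added by this patch) would keep the
ambiguous boolean out of the callers:

  /* Unmap only the shared (unprotected) half of the IPA space. */
  static void realm_unmap_shared(struct kvm *kvm,
                                 unsigned long start, unsigned long end)
  {
          realm_unmap_range(kvm, start, end, false);
  }

  /* Unmap everything: both the shared and private mappings. */
  static void realm_unmap_all(struct kvm *kvm,
                              unsigned long start, unsigned long end)
  {
          realm_unmap_range(kvm, start, end, true);
  }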

Thanks,
/fuad

>  {
>         struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>         phys_addr_t end = start + size;
> @@ -330,7 +331,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
>
>  static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>  {
> -       __unmap_stage2_range(mmu, start, size, true);
> +       __unmap_stage2_range(mmu, start, size, true, false);
>  }
>
>  static void stage2_flush_memslot(struct kvm *kvm,
> @@ -1771,7 +1772,8 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>
>         __unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
>                              (range->end - range->start) << PAGE_SHIFT,
> -                            range->may_block);
> +                            range->may_block,
> +                            range->only_shared);
>
>         return false;
>  }
> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
> index 629a095bea61..9e5983c51393 100644
> --- a/arch/arm64/kvm/rme.c
> +++ b/arch/arm64/kvm/rme.c
> @@ -79,6 +79,12 @@ static phys_addr_t __alloc_delegated_page(struct realm *realm,
>         return phys;
>  }
>
> +static phys_addr_t alloc_delegated_page(struct realm *realm,
> +                                       struct kvm_mmu_memory_cache *mc)
> +{
> +       return __alloc_delegated_page(realm, mc, GFP_KERNEL);
> +}
> +
>  static void free_delegated_page(struct realm *realm, phys_addr_t phys)
>  {
>         if (realm->spare_page == PHYS_ADDR_MAX) {
> @@ -94,6 +100,151 @@ static void free_delegated_page(struct realm *realm, phys_addr_t phys)
>         free_page((unsigned long)phys_to_virt(phys));
>  }
>
> +static int realm_rtt_create(struct realm *realm,
> +                           unsigned long addr,
> +                           int level,
> +                           phys_addr_t phys)
> +{
> +       addr = ALIGN_DOWN(addr, rme_rtt_level_mapsize(level - 1));
> +       return rmi_rtt_create(virt_to_phys(realm->rd), phys, addr, level);
> +}
> +
> +static int realm_rtt_fold(struct realm *realm,
> +                         unsigned long addr,
> +                         int level,
> +                         phys_addr_t *rtt_granule)
> +{
> +       unsigned long out_rtt;
> +       int ret;
> +
> +       ret = rmi_rtt_fold(virt_to_phys(realm->rd), addr, level, &out_rtt);
> +
> +       if (RMI_RETURN_STATUS(ret) == RMI_SUCCESS && rtt_granule)
> +               *rtt_granule = out_rtt;
> +
> +       return ret;
> +}
> +
> +static int realm_destroy_protected(struct realm *realm,
> +                                  unsigned long ipa,
> +                                  unsigned long *next_addr)
> +{
> +       unsigned long rd = virt_to_phys(realm->rd);
> +       unsigned long addr;
> +       phys_addr_t rtt;
> +       int ret;
> +
> +loop:
> +       ret = rmi_data_destroy(rd, ipa, &addr, next_addr);
> +       if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
> +               if (*next_addr > ipa)
> +                       return 0; /* UNASSIGNED */
> +               rtt = alloc_delegated_page(realm, NULL);
> +               if (WARN_ON(rtt == PHYS_ADDR_MAX))
> +                       return -1;
> +               /* ASSIGNED - ipa is mapped as a block, so split */
> +               ret = realm_rtt_create(realm, ipa,
> +                                      RMI_RETURN_INDEX(ret) + 1, rtt);
> +               if (WARN_ON(ret)) {
> +                       free_delegated_page(realm, rtt);
> +                       return -1;
> +               }
> +               /* retry */
> +               goto loop;
> +       } else if (WARN_ON(ret)) {
> +               return -1;
> +       }
> +       ret = rmi_granule_undelegate(addr);
> +
> +       /*
> +        * If the undelegate fails then something has gone seriously
> +        * wrong: take an extra reference to just leak the page
> +        */
> +       if (WARN_ON(ret))
> +               get_page(phys_to_page(addr));
> +
> +       return 0;
> +}
> +
> +static void realm_unmap_range_shared(struct kvm *kvm,
> +                                    int level,
> +                                    unsigned long start,
> +                                    unsigned long end)
> +{
> +       struct realm *realm = &kvm->arch.realm;
> +       unsigned long rd = virt_to_phys(realm->rd);
> +       ssize_t map_size = rme_rtt_level_mapsize(level);
> +       unsigned long next_addr, addr;
> +       unsigned long shared_bit = BIT(realm->ia_bits - 1);
> +
> +       if (WARN_ON(level > RME_RTT_MAX_LEVEL))
> +               return;
> +
> +       start |= shared_bit;
> +       end |= shared_bit;
> +
> +       for (addr = start; addr < end; addr = next_addr) {
> +               unsigned long align_addr = ALIGN(addr, map_size);
> +               int ret;
> +
> +               next_addr = ALIGN(addr + 1, map_size);
> +
> +               if (align_addr != addr || next_addr > end) {
> +                       /* Need to recurse deeper */
> +                       if (addr < align_addr)
> +                               next_addr = align_addr;
> +                       realm_unmap_range_shared(kvm, level + 1, addr,
> +                                                min(next_addr, end));
> +                       continue;
> +               }
> +
> +               ret = rmi_rtt_unmap_unprotected(rd, addr, level, &next_addr);
> +               switch (RMI_RETURN_STATUS(ret)) {
> +               case RMI_SUCCESS:
> +                       break;
> +               case RMI_ERROR_RTT:
> +                       if (next_addr == addr) {
> +                               next_addr = ALIGN(addr + 1, map_size);
> +                               realm_unmap_range_shared(kvm, level + 1, addr,
> +                                                        next_addr);
> +                       }
> +                       break;
> +               default:
> +                       WARN_ON(1);
> +               }
> +       }
> +}
> +
> +static void realm_unmap_range_private(struct kvm *kvm,
> +                                     unsigned long start,
> +                                     unsigned long end)
> +{
> +       struct realm *realm = &kvm->arch.realm;
> +       ssize_t map_size = RME_PAGE_SIZE;
> +       unsigned long next_addr, addr;
> +
> +       for (addr = start; addr < end; addr = next_addr) {
> +               int ret;
> +
> +               next_addr = ALIGN(addr + 1, map_size);
> +
> +               ret = realm_destroy_protected(realm, addr, &next_addr);
> +
> +               if (WARN_ON(ret))
> +                       break;
> +       }
> +}
> +
> +static void realm_unmap_range(struct kvm *kvm,
> +                             unsigned long start,
> +                             unsigned long end,
> +                             bool unmap_private)
> +{
> +       realm_unmap_range_shared(kvm, RME_RTT_MAX_LEVEL - 1, start, end);
> +       if (unmap_private)
> +               realm_unmap_range_private(kvm, start, end);
> +}
> +
>  u32 kvm_realm_ipa_limit(void)
>  {
>         return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
> @@ -190,6 +341,30 @@ static int realm_rtt_destroy(struct realm *realm, unsigned long addr,
>         return ret;
>  }
>
> +static int realm_create_rtt_levels(struct realm *realm,
> +                                  unsigned long ipa,
> +                                  int level,
> +                                  int max_level,
> +                                  struct kvm_mmu_memory_cache *mc)
> +{
> +       if (WARN_ON(level == max_level))
> +               return 0;
> +
> +       while (level++ < max_level) {
> +               phys_addr_t rtt = alloc_delegated_page(realm, mc);
> +
> +               if (rtt == PHYS_ADDR_MAX)
> +                       return -ENOMEM;
> +
> +               if (realm_rtt_create(realm, ipa, level, rtt)) {
> +                       free_delegated_page(realm, rtt);
> +                       return -ENXIO;
> +               }
> +       }
> +
> +       return 0;
> +}
> +
>  static int realm_tear_down_rtt_level(struct realm *realm, int level,
>                                      unsigned long start, unsigned long end)
>  {
> @@ -265,6 +440,68 @@ static int realm_tear_down_rtt_range(struct realm *realm,
>                                          start, end);
>  }
>
> +/*
> + * Returns 0 on successful fold, a negative value on error, a positive value if
> + * we were not able to fold all tables at this level.
> + */
> +static int realm_fold_rtt_level(struct realm *realm, int level,
> +                               unsigned long start, unsigned long end)
> +{
> +       int not_folded = 0;
> +       ssize_t map_size;
> +       unsigned long addr, next_addr;
> +
> +       if (WARN_ON(level > RME_RTT_MAX_LEVEL))
> +               return -EINVAL;
> +
> +       map_size = rme_rtt_level_mapsize(level - 1);
> +
> +       for (addr = start; addr < end; addr = next_addr) {
> +               phys_addr_t rtt_granule;
> +               int ret;
> +               unsigned long align_addr = ALIGN(addr, map_size);
> +
> +               next_addr = ALIGN(addr + 1, map_size);
> +
> +               ret = realm_rtt_fold(realm, align_addr, level, &rtt_granule);
> +
> +               switch (RMI_RETURN_STATUS(ret)) {
> +               case RMI_SUCCESS:
> +                       if (!WARN_ON(rmi_granule_undelegate(rtt_granule)))
> +                               free_page((unsigned long)phys_to_virt(rtt_granule));
> +                       break;
> +               case RMI_ERROR_RTT:
> +                       if (level == RME_RTT_MAX_LEVEL ||
> +                           RMI_RETURN_INDEX(ret) < level) {
> +                               not_folded++;
> +                               break;
> +                       }
> +                       /* Recurse a level deeper */
> +                       ret = realm_fold_rtt_level(realm,
> +                                                  level + 1,
> +                                                  addr,
> +                                                  next_addr);
> +                       if (ret < 0)
> +                               return ret;
> +                       else if (ret == 0)
> +                               /* Try again at this level */
> +                               next_addr = addr;
> +                       break;
> +               default:
> +                       return -ENXIO;
> +               }
> +       }
> +
> +       return not_folded;
> +}
> +
> +static int realm_fold_rtt_range(struct realm *realm,
> +                               unsigned long start, unsigned long end)
> +{
> +       return realm_fold_rtt_level(realm, get_start_level(realm) + 1,
> +                                   start, end);
> +}
> +
>  static void ensure_spare_page(struct realm *realm)
>  {
>         phys_addr_t tmp_rtt;
> @@ -295,6 +532,147 @@ void kvm_realm_destroy_rtts(struct kvm *kvm, u32 ia_bits)
>         WARN_ON(realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits)));
>  }
>
> +void kvm_realm_unmap_range(struct kvm *kvm, unsigned long ipa, u64 size,
> +                          bool unmap_private)
> +{
> +       unsigned long end = ipa + size;
> +       struct realm *realm = &kvm->arch.realm;
> +
> +       end = min(BIT(realm->ia_bits - 1), end);
> +
> +       ensure_spare_page(realm);
> +
> +       realm_unmap_range(kvm, ipa, end, unmap_private);
> +
> +       realm_fold_rtt_range(realm, ipa, end);
> +}
> +
> +static int find_map_level(struct realm *realm,
> +                         unsigned long start,
> +                         unsigned long end)
> +{
> +       int level = RME_RTT_MAX_LEVEL;
> +
> +       while (level > get_start_level(realm)) {
> +               unsigned long map_size = rme_rtt_level_mapsize(level - 1);
> +
> +               if (!IS_ALIGNED(start, map_size) ||
> +                   (start + map_size) > end)
> +                       break;
> +
> +               level--;
> +       }
> +
> +       return level;
> +}
> +
> +int realm_set_ipa_state(struct kvm_vcpu *vcpu,
> +                       unsigned long start,
> +                       unsigned long end,
> +                       unsigned long ripas)
> +{
> +       struct kvm *kvm = vcpu->kvm;
> +       struct realm *realm = &kvm->arch.realm;
> +       struct realm_rec *rec = &vcpu->arch.rec;
> +       phys_addr_t rd_phys = virt_to_phys(realm->rd);
> +       phys_addr_t rec_phys = virt_to_phys(rec->rec_page);
> +       struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> +       unsigned long ipa = start;
> +       int ret = 0;
> +
> +       while (ipa < end) {
> +               unsigned long next;
> +
> +               ret = rmi_rtt_set_ripas(rd_phys, rec_phys, ipa, end, &next);
> +
> +               if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
> +                       int walk_level = RMI_RETURN_INDEX(ret);
> +                       int level = find_map_level(realm, ipa, end);
> +
> +                       if (walk_level < level) {
> +                               ret = realm_create_rtt_levels(realm, ipa,
> +                                                             walk_level,
> +                                                             level,
> +                                                             memcache);
> +                               if (!ret)
> +                                       continue;
> +                       } else {
> +                               ret = -EINVAL;
> +                       }
> +
> +                       break;
> +               } else if (RMI_RETURN_STATUS(ret) != RMI_SUCCESS) {
> +                       WARN(1, "Unexpected error in %s: %#x\n", __func__,
> +                            ret);
> +                       ret = -EINVAL;
> +                       break;
> +               }
> +               ipa = next;
> +       }
> +
> +       if (ripas == RMI_EMPTY && ipa != start)
> +               kvm_realm_unmap_range(kvm, start, ipa - start, true);
> +
> +       return ret;
> +}
> +
> +static int realm_init_ipa_state(struct realm *realm,
> +                               unsigned long ipa,
> +                               unsigned long end)
> +{
> +       phys_addr_t rd_phys = virt_to_phys(realm->rd);
> +       int ret;
> +
> +       while (ipa < end) {
> +               unsigned long next;
> +
> +               ret = rmi_rtt_init_ripas(rd_phys, ipa, end, &next);
> +
> +               if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
> +                       int err_level = RMI_RETURN_INDEX(ret);
> +                       int level = find_map_level(realm, ipa, end);
> +
> +                       if (WARN_ON(err_level >= level))
> +                               return -ENXIO;
> +
> +                       ret = realm_create_rtt_levels(realm, ipa,
> +                                                     err_level,
> +                                                     level, NULL);
> +                       if (ret)
> +                               return ret;
> +                       /* Retry with the RTT levels in place */
> +                       continue;
> +               } else if (WARN_ON(ret)) {
> +                       return -ENXIO;
> +               }
> +
> +               ipa = next;
> +       }
> +
> +       return 0;
> +}
> +
> +static int kvm_init_ipa_range_realm(struct kvm *kvm,
> +                                   struct kvm_cap_arm_rme_init_ipa_args *args)
> +{
> +       int ret = 0;
> +       gpa_t addr, end;
> +       struct realm *realm = &kvm->arch.realm;
> +
> +       addr = args->init_ipa_base;
> +       end = addr + args->init_ipa_size;
> +
> +       if (end < addr)
> +               return -EINVAL;
> +
> +       if (kvm_realm_state(kvm) != REALM_STATE_NEW)
> +               return -EINVAL;
> +
> +       ret = realm_init_ipa_state(realm, addr, end);
> +
> +       return ret;
> +}
> +
>  /* Protects access to rme_vmid_bitmap */
>  static DEFINE_SPINLOCK(rme_vmid_lock);
>  static unsigned long *rme_vmid_bitmap;
> @@ -418,6 +796,18 @@ int kvm_realm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>         case KVM_CAP_ARM_RME_CREATE_RD:
>                 r = kvm_create_realm(kvm);
>                 break;
> +       case KVM_CAP_ARM_RME_INIT_IPA_REALM: {
> +               struct kvm_cap_arm_rme_init_ipa_args args;
> +               void __user *argp = u64_to_user_ptr(cap->args[1]);
> +
> +               if (copy_from_user(&args, argp, sizeof(args))) {
> +                       r = -EFAULT;
> +                       break;
> +               }
> +
> +               r = kvm_init_ipa_range_realm(kvm, &args);
> +               break;
> +       }
>         default:
>                 r = -EINVAL;
>                 break;
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 21/43] arm64: RME: Runtime faulting of memory
  2024-04-12  8:42   ` [PATCH v2 21/43] arm64: RME: Runtime faulting of memory Steven Price
@ 2024-04-25 10:43     ` Fuad Tabba
  0 siblings, 0 replies; 104+ messages in thread
From: Fuad Tabba @ 2024-04-25 10:43 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, linux-coco, Ganapatrao Kulkarni

Hi,

On Fri, Apr 12, 2024 at 9:44 AM Steven Price <steven.price@arm.com> wrote:
>
> At runtime, if the realm guest accesses memory which hasn't yet been
> mapped, KVM needs to either populate the region or fault the guest.
>
> For memory in the lower (protected) region of IPA, a fresh page is
> provided to the RMM, which will zero the contents. For memory in the
> upper (shared) region of IPA, the memory from the memslot is mapped
> into the realm VM as non-secure.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  arch/arm64/include/asm/kvm_emulate.h |  10 ++
>  arch/arm64/include/asm/kvm_rme.h     |  10 ++
>  arch/arm64/kvm/mmu.c                 | 119 +++++++++++++++-
>  arch/arm64/kvm/rme.c                 | 199 ++++++++++++++++++++++++---
>  4 files changed, 316 insertions(+), 22 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 2209a7c6267f..d40d998d9be2 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -629,6 +629,16 @@ static inline bool kvm_realm_is_created(struct kvm *kvm)
>         return kvm_is_realm(kvm) && kvm_realm_state(kvm) != REALM_STATE_NONE;
>  }
>
> +static inline gpa_t kvm_gpa_stolen_bits(struct kvm *kvm)
> +{
> +       if (kvm_is_realm(kvm)) {
> +               struct realm *realm = &kvm->arch.realm;
> +
> +               return BIT(realm->ia_bits - 1);
> +       }
> +       return 0;
> +}
> +
>  static inline bool vcpu_is_rec(struct kvm_vcpu *vcpu)
>  {
>         if (static_branch_unlikely(&kvm_rme_is_available))
> diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/asm/kvm_rme.h
> index 749f2eb97bd4..48c7766fadeb 100644
> --- a/arch/arm64/include/asm/kvm_rme.h
> +++ b/arch/arm64/include/asm/kvm_rme.h
> @@ -103,6 +103,16 @@ void kvm_realm_unmap_range(struct kvm *kvm,
>                            unsigned long ipa,
>                            u64 size,
>                            bool unmap_private);
> +int realm_map_protected(struct realm *realm,
> +                       unsigned long base_ipa,
> +                       struct page *dst_page,
> +                       unsigned long map_size,
> +                       struct kvm_mmu_memory_cache *memcache);
> +int realm_map_non_secure(struct realm *realm,
> +                        unsigned long ipa,
> +                        struct page *page,
> +                        unsigned long map_size,
> +                        struct kvm_mmu_memory_cache *memcache);
>  int realm_set_ipa_state(struct kvm_vcpu *vcpu,
>                         unsigned long addr, unsigned long end,
>                         unsigned long ripas);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 8a7b5449697f..50a49e4e2020 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -325,8 +325,13 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
>
>         lockdep_assert_held_write(&kvm->mmu_lock);
>         WARN_ON(size & ~PAGE_MASK);
> -       WARN_ON(stage2_apply_range(mmu, start, end, kvm_pgtable_stage2_unmap,
> -                                  may_block));
> +
> +       if (kvm_is_realm(kvm))
> +               kvm_realm_unmap_range(kvm, start, size, !only_shared);
> +       else
> +               WARN_ON(stage2_apply_range(mmu, start, end,
> +                                          kvm_pgtable_stage2_unmap,
> +                                          may_block));
>  }
>
>  static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> @@ -340,7 +345,11 @@ static void stage2_flush_memslot(struct kvm *kvm,
>         phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
>         phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
>
> -       stage2_apply_range_resched(&kvm->arch.mmu, addr, end, kvm_pgtable_stage2_flush);
> +       if (kvm_is_realm(kvm))
> +               kvm_realm_unmap_range(kvm, addr, end - addr, false);
> +       else
> +               stage2_apply_range_resched(&kvm->arch.mmu, addr, end,
> +                                          kvm_pgtable_stage2_flush);
>  }
>
>  /**
> @@ -997,6 +1006,10 @@ void stage2_unmap_vm(struct kvm *kvm)
>         struct kvm_memory_slot *memslot;
>         int idx, bkt;
>
> +       /* For realms this is handled by the RMM so nothing to do here */
> +       if (kvm_is_realm(kvm))
> +               return;
> +
>         idx = srcu_read_lock(&kvm->srcu);
>         mmap_read_lock(current->mm);
>         write_lock(&kvm->mmu_lock);
> @@ -1020,6 +1033,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>         if (kvm_is_realm(kvm) &&
>             (kvm_realm_state(kvm) != REALM_STATE_DEAD &&
>              kvm_realm_state(kvm) != REALM_STATE_NONE)) {
> +               unmap_stage2_range(mmu, 0, (~0ULL) & PAGE_MASK);
>                 write_unlock(&kvm->mmu_lock);
>                 kvm_realm_destroy_rtts(kvm, pgt->ia_bits);
>                 return;
> @@ -1383,6 +1397,69 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
>         return vma->vm_flags & VM_MTE_ALLOWED;
>  }
>
> +static int realm_map_ipa(struct kvm *kvm, phys_addr_t ipa,
> +                        kvm_pfn_t pfn, unsigned long map_size,
> +                        enum kvm_pgtable_prot prot,
> +                        struct kvm_mmu_memory_cache *memcache)
> +{
> +       struct realm *realm = &kvm->arch.realm;
> +       struct page *page = pfn_to_page(pfn);
> +
> +       if (WARN_ON(!(prot & KVM_PGTABLE_PROT_W)))
> +               return -EFAULT;
> +
> +       if (!realm_is_addr_protected(realm, ipa))
> +               return realm_map_non_secure(realm, ipa, page, map_size,
> +                                           memcache);
> +
> +       return realm_map_protected(realm, ipa, page, map_size, memcache);
> +}
> +
> +static int private_memslot_fault(struct kvm_vcpu *vcpu,
> +                                phys_addr_t fault_ipa,
> +                                struct kvm_memory_slot *memslot)
> +{
> +       struct kvm *kvm = vcpu->kvm;
> +       gpa_t gpa_stolen_mask = kvm_gpa_stolen_bits(kvm);
> +       gfn_t gfn = (fault_ipa & ~gpa_stolen_mask) >> PAGE_SHIFT;
> +       bool is_priv_gfn = !((fault_ipa & gpa_stolen_mask) == gpa_stolen_mask);
> +       bool priv_exists = kvm_mem_is_private(kvm, gfn);
> +       struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> +       int order;
> +       kvm_pfn_t pfn;
> +       int ret;
> +
> +       if (priv_exists != is_priv_gfn) {
> +               kvm_prepare_memory_fault_exit(vcpu,
> +                                             fault_ipa & ~gpa_stolen_mask,
> +                                             PAGE_SIZE,
> +                                             kvm_is_write_fault(vcpu),
> +                                             false, is_priv_gfn);
> +
> +               return 0;
> +       }
> +
> +       if (!is_priv_gfn) {
> +               /* Not a private mapping, handling normally */
> +               return -EAGAIN;
> +       }
> +
> +       if (kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &order))
> +               return 1; /* Retry */

You don't need to pass a variable to hold the order if you don't use
it; you can pass NULL instead.

I am also confused about the return: why do you return 1 regardless of
the reason kvm_gmem_get_pfn() fails?

> +       ret = kvm_mmu_topup_memory_cache(memcache,
> +                                        kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
> +       if (ret)
> +               return ret;

If this fails you should release the page you got earlier (e.g.,
kvm_release_pfn_clean()), or you could move it before
kvm_gmem_get_pfn().

> +       /* FIXME: Should be able to use bigger than PAGE_SIZE mappings */
> +       ret = realm_map_ipa(kvm, fault_ipa, pfn, PAGE_SIZE, KVM_PGTABLE_PROT_W,
> +                            memcache);
> +       if (!ret)
> +               return 1; /* Handled */

Should also release the page if it fails. Speaking of which,
where/when do you eventually release the page?
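
Taken together, the error paths might look something like this (just a
sketch based on the comments above, assuming NULL is accepted for the
order argument and that kvm_release_pfn_clean() is the right way to
drop the reference):

  /* Order is unused here, so don't bother holding it. */
  ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, NULL);
  if (ret)
          return ret;

  ret = kvm_mmu_topup_memory_cache(memcache,
                                   kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
  if (ret) {
          kvm_release_pfn_clean(pfn);
          return ret;
  }

  ret = realm_map_ipa(kvm, fault_ipa, pfn, PAGE_SIZE, KVM_PGTABLE_PROT_W,
                      memcache);
  if (ret) {
          kvm_release_pfn_clean(pfn);
          return ret;
  }

  return 1; /* Handled */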

Cheers,
/fuad

> +       return ret;
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                           struct kvm_memory_slot *memslot, unsigned long hva,
>                           bool fault_is_perm)
> @@ -1402,10 +1479,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>         long vma_pagesize, fault_granule;
>         enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>         struct kvm_pgtable *pgt;
> +       gpa_t gpa_stolen_mask = kvm_gpa_stolen_bits(vcpu->kvm);
>
>         if (fault_is_perm)
>                 fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
>         write_fault = kvm_is_write_fault(vcpu);
> +
> +       /*
> +        * Realms cannot map protected pages read-only
> +        * FIXME: It should be possible to map unprotected pages read-only
> +        */
> +       if (vcpu_is_rec(vcpu))
> +               write_fault = true;
> +
>         exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
>         VM_BUG_ON(write_fault && exec_fault);
>
> @@ -1478,7 +1564,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>         if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
>                 fault_ipa &= ~(vma_pagesize - 1);
>
> -       gfn = fault_ipa >> PAGE_SHIFT;
> +       gfn = (fault_ipa & ~gpa_stolen_mask) >> PAGE_SHIFT;
>         mte_allowed = kvm_vma_mte_allowed(vma);
>
>         vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> @@ -1538,7 +1624,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>          * If we are not forced to use page mapping, check if we are
>          * backed by a THP and thus use block mapping if possible.
>          */
> -       if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
> +       /* FIXME: We shouldn't need to disable this for realms */
> +       if (vma_pagesize == PAGE_SIZE && !(force_pte || device || kvm_is_realm(kvm))) {
>                 if (fault_is_perm && fault_granule > PAGE_SIZE)
>                         vma_pagesize = fault_granule;
>                 else
> @@ -1584,6 +1671,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>          */
>         if (fault_is_perm && vma_pagesize == fault_granule)
>                 ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
> +       else if (kvm_is_realm(kvm))
> +               ret = realm_map_ipa(kvm, fault_ipa, pfn, vma_pagesize,
> +                                   prot, memcache);
>         else
>                 ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
>                                              __pfn_to_phys(pfn), prot,
> @@ -1638,6 +1728,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>         struct kvm_memory_slot *memslot;
>         unsigned long hva;
>         bool is_iabt, write_fault, writable;
> +       gpa_t gpa_stolen_mask = kvm_gpa_stolen_bits(vcpu->kvm);
>         gfn_t gfn;
>         int ret, idx;
>
> @@ -1693,8 +1784,15 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>
>         idx = srcu_read_lock(&vcpu->kvm->srcu);
>
> -       gfn = fault_ipa >> PAGE_SHIFT;
> +       gfn = (fault_ipa & ~gpa_stolen_mask) >> PAGE_SHIFT;
>         memslot = gfn_to_memslot(vcpu->kvm, gfn);
> +
> +       if (kvm_slot_can_be_private(memslot)) {
> +               ret = private_memslot_fault(vcpu, fault_ipa, memslot);
> +               if (ret != -EAGAIN)
> +                       goto out;
> +       }
> +
>         hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
>         write_fault = kvm_is_write_fault(vcpu);
>         if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
> @@ -1738,6 +1836,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>                  * of the page size.
>                  */
>                 fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
> +               fault_ipa &= ~gpa_stolen_mask;
>                 ret = io_mem_abort(vcpu, fault_ipa);
>                 goto out_unlock;
>         }
> @@ -1819,6 +1918,10 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>         if (!kvm->arch.mmu.pgt)
>                 return false;
>
> +       /* We don't support aging for Realms */
> +       if (kvm_is_realm(kvm))
> +               return true;
> +
>         return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
>                                                    range->start << PAGE_SHIFT,
>                                                    size, true);
> @@ -1831,6 +1934,10 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>         if (!kvm->arch.mmu.pgt)
>                 return false;
>
> +       /* We don't support aging for Realms */
> +       if (kvm_is_realm(kvm))
> +               return true;
> +
>         return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
>                                                    range->start << PAGE_SHIFT,
>                                                    size, false);
> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
> index 4aab507f896e..72f6f5f542c4 100644
> --- a/arch/arm64/kvm/rme.c
> +++ b/arch/arm64/kvm/rme.c
> @@ -606,6 +606,170 @@ static int fold_rtt(struct realm *realm, unsigned long addr, int level)
>         return 0;
>  }
>
> +int realm_map_protected(struct realm *realm,
> +                       unsigned long base_ipa,
> +                       struct page *dst_page,
> +                       unsigned long map_size,
> +                       struct kvm_mmu_memory_cache *memcache)
> +{
> +       phys_addr_t dst_phys = page_to_phys(dst_page);
> +       phys_addr_t rd = virt_to_phys(realm->rd);
> +       unsigned long phys = dst_phys;
> +       unsigned long ipa = base_ipa;
> +       unsigned long size;
> +       int map_level;
> +       int ret = 0;
> +
> +       if (WARN_ON(!IS_ALIGNED(ipa, map_size)))
> +               return -EINVAL;
> +
> +       switch (map_size) {
> +       case PAGE_SIZE:
> +               map_level = 3;
> +               break;
> +       case RME_L2_BLOCK_SIZE:
> +               map_level = 2;
> +               break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       if (map_level < RME_RTT_MAX_LEVEL) {
> +               /*
> +                * A temporary RTT is needed during the map; precreate it.
> +                * However, if there is an error (e.g. missing parent tables)
> +                * this will be handled below.
> +                */
> +               realm_create_rtt_levels(realm, ipa, map_level,
> +                                       RME_RTT_MAX_LEVEL, memcache);
> +       }
> +
> +       for (size = 0; size < map_size; size += PAGE_SIZE) {
> +               if (rmi_granule_delegate(phys)) {
> +                       struct rtt_entry rtt;
> +
> +                       /*
> +                        * It's possible we raced with another VCPU on the same
> +                        * fault. If the entry exists and matches then exit
> +                        * early and assume the other VCPU will handle the
> +                        * mapping.
> +                        */
> +                       if (rmi_rtt_read_entry(rd, ipa, RME_RTT_MAX_LEVEL, &rtt))
> +                               goto err;
> +
> +                       // FIXME: For a block mapping this could race at level
> +                       // 2 or 3...
> +                       if (WARN_ON((rtt.walk_level != RME_RTT_MAX_LEVEL ||
> +                                    rtt.state != RMI_ASSIGNED ||
> +                                    rtt.desc != phys))) {
> +                               goto err;
> +                       }
> +
> +                       return 0;
> +               }
> +
> +               ret = rmi_data_create_unknown(rd, phys, ipa);
> +
> +               if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
> +                       /* Create missing RTTs and retry */
> +                       int level = RMI_RETURN_INDEX(ret);
> +
> +                       ret = realm_create_rtt_levels(realm, ipa, level,
> +                                                     RME_RTT_MAX_LEVEL,
> +                                                     memcache);
> +                       WARN_ON(ret);
> +                       if (ret)
> +                               goto err_undelegate;
> +
> +                       ret = rmi_data_create_unknown(rd, phys, ipa);
> +               }
> +               WARN_ON(ret);
> +
> +               if (ret)
> +                       goto err_undelegate;
> +
> +               phys += PAGE_SIZE;
> +               ipa += PAGE_SIZE;
> +       }
> +
> +       if (map_size == RME_L2_BLOCK_SIZE)
> +               ret = fold_rtt(realm, base_ipa, map_level);
> +       if (WARN_ON(ret))
> +               goto err;
> +
> +       return 0;
> +
> +err_undelegate:
> +       if (WARN_ON(rmi_granule_undelegate(phys))) {
> +               /* Page can't be returned to NS world so is lost */
> +               get_page(phys_to_page(phys));
> +       }
> +err:
> +       while (size > 0) {
> +               unsigned long data, top;
> +
> +               phys -= PAGE_SIZE;
> +               size -= PAGE_SIZE;
> +               ipa -= PAGE_SIZE;
> +
> +               WARN_ON(rmi_data_destroy(rd, ipa, &data, &top));
> +
> +               if (WARN_ON(rmi_granule_undelegate(phys))) {
> +                       /* Page can't be returned to NS world so is lost */
> +                       get_page(phys_to_page(phys));
> +               }
> +       }
> +       return -ENXIO;
> +}
> +
> +int realm_map_non_secure(struct realm *realm,
> +                        unsigned long ipa,
> +                        struct page *page,
> +                        unsigned long map_size,
> +                        struct kvm_mmu_memory_cache *memcache)
> +{
> +       phys_addr_t rd = virt_to_phys(realm->rd);
> +       int map_level;
> +       int ret = 0;
> +       unsigned long desc = page_to_phys(page) |
> +                            PTE_S2_MEMATTR(MT_S2_FWB_NORMAL) |
> +                            /* FIXME: Read+Write permissions for now */
> +                            (3 << 6) |
> +                            PTE_SHARED;
> +
> +       if (WARN_ON(!IS_ALIGNED(ipa, map_size)))
> +               return -EINVAL;
> +
> +       switch (map_size) {
> +       case PAGE_SIZE:
> +               map_level = 3;
> +               break;
> +       case RME_L2_BLOCK_SIZE:
> +               map_level = 2;
> +               break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       ret = rmi_rtt_map_unprotected(rd, ipa, map_level, desc);
> +
> +       if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
> +               /* Create missing RTTs and retry */
> +               int level = RMI_RETURN_INDEX(ret);
> +
> +               ret = realm_create_rtt_levels(realm, ipa, level, map_level,
> +                                             memcache);
> +               if (WARN_ON(ret))
> +                       return -ENXIO;
> +
> +               ret = rmi_rtt_map_unprotected(rd, ipa, map_level, desc);
> +       }
> +       if (WARN_ON(ret))
> +               return -ENXIO;
> +
> +       return 0;
> +}
> +
>  static int populate_par_region(struct kvm *kvm,
>                                phys_addr_t ipa_base,
>                                phys_addr_t ipa_end,
> @@ -617,7 +781,6 @@ static int populate_par_region(struct kvm *kvm,
>         int idx;
>         phys_addr_t ipa;
>         int ret = 0;
> -       struct page *tmp_page;
>         unsigned long data_flags = 0;
>
>         base_gfn = gpa_to_gfn(ipa_base);
> @@ -639,9 +802,8 @@ static int populate_par_region(struct kvm *kvm,
>                 goto out;
>         }
>
> -       tmp_page = alloc_page(GFP_KERNEL);
> -       if (!tmp_page) {
> -               ret = -ENOMEM;
> +       if (!kvm_slot_can_be_private(memslot)) {
> +               ret = -EINVAL;
>                 goto out;
>         }
>
> @@ -714,31 +876,36 @@ static int populate_par_region(struct kvm *kvm,
>                 for (offset = 0; offset < map_size && !ret;
>                      offset += PAGE_SIZE, page++) {
>                         phys_addr_t page_ipa = ipa + offset;
> +                       kvm_pfn_t priv_pfn;
> +                       int order;
>
> -                       ret = realm_create_protected_data_page(realm, page_ipa,
> -                                                              page, tmp_page,
> -                                                              data_flags);
> +                       ret = kvm_gmem_get_pfn(kvm, memslot,
> +                                              page_ipa >> PAGE_SHIFT,
> +                                              &priv_pfn, &order);
> +                       if (ret)
> +                               break;
> +
> +                       ret = realm_create_protected_data_page(
> +                                       realm, page_ipa,
> +                                       pfn_to_page(priv_pfn),
> +                                       page, data_flags);
>                 }
> +
> +               kvm_release_pfn_clean(pfn);
> +
>                 if (ret)
> -                       goto err_release_pfn;
> +                       break;
>
>                 if (level == 2) {
>                         ret = fold_rtt(realm, ipa, level);
>                         if (ret)
> -                               goto err_release_pfn;
> +                               break;
>                 }
>
>                 ipa += map_size;
> -               kvm_release_pfn_dirty(pfn);
> -err_release_pfn:
> -               if (ret) {
> -                       kvm_release_pfn_clean(pfn);
> -                       break;
> -               }
>         }
>
>         mmap_read_unlock(current->mm);
> -       __free_page(tmp_page);
>
>  out:
>         srcu_read_unlock(&kvm->srcu, idx);
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 09/14] arm64: Enable memory encrypt for Realms
  2024-04-15  3:13     ` kernel test robot
@ 2024-04-25 13:42       ` Suzuki K Poulose
  2024-04-25 15:52         ` Steven Price
  2024-04-25 16:29         ` Suzuki K Poulose
  0 siblings, 2 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-25 13:42 UTC (permalink / raw)
  To: kernel test robot, Steven Price, kvm, kvmarm
  Cc: llvm, oe-kbuild-all, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Emanuele.Rocca

On 15/04/2024 04:13, kernel test robot wrote:
> Hi Steven,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on arm64/for-next/core]
> [also build test ERROR on kvmarm/next efi/next tip/irq/core linus/master v6.9-rc3 next-20240412]
> [cannot apply to arnd-asm-generic/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Steven-Price/arm64-rsi-Add-RSI-definitions/20240412-164852
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
> patch link:    https://lore.kernel.org/r/20240412084213.1733764-10-steven.price%40arm.com
> patch subject: [PATCH v2 09/14] arm64: Enable memory encrypt for Realms
> config: arm64-allyesconfig (https://download.01.org/0day-ci/archive/20240415/202404151003.vkNApJiS-lkp@intel.com/config)
> compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 8b3b4a92adee40483c27f26c478a384cd69c6f05)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240415/202404151003.vkNApJiS-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202404151003.vkNApJiS-lkp@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
>     In file included from drivers/hv/hv.c:13:
>     In file included from include/linux/mm.h:2208:
>     include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
>       508 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~ ^
>       509 |                            item];
>           |                            ~~~~
>     include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
>       515 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~ ^
>       516 |                            NR_VM_NUMA_EVENT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~~
>     include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
>       522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
>           |                               ~~~~~~~~~~~ ^ ~~~
>     include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
>       527 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~ ^
>       528 |                            NR_VM_NUMA_EVENT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~~
>     include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
>       536 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~ ^
>       537 |                            NR_VM_NUMA_EVENT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~~
>>> drivers/hv/hv.c:132:10: error: call to undeclared function 'set_memory_decrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>       132 |                         ret = set_memory_decrypted((unsigned long)hv_cpu->post_msg_page, 1);
>           |                               ^
>     drivers/hv/hv.c:168:10: error: call to undeclared function 'set_memory_decrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>       168 |                         ret = set_memory_decrypted((unsigned long)
>           |                               ^
>>> drivers/hv/hv.c:218:11: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>       218 |                                 ret = set_memory_encrypted((unsigned long)
>           |                                       ^
>     drivers/hv/hv.c:230:11: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>       230 |                                 ret = set_memory_encrypted((unsigned long)
>           |                                       ^
>     drivers/hv/hv.c:239:11: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>       239 |                                 ret = set_memory_encrypted((unsigned long)
>           |                                       ^
>     5 warnings and 5 errors generated.
> --
>     In file included from drivers/hv/connection.c:16:
>     In file included from include/linux/mm.h:2208:
>     include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
>       508 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~ ^
>       509 |                            item];
>           |                            ~~~~
>     include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
>       515 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~ ^
>       516 |                            NR_VM_NUMA_EVENT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~~
>     include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
>       522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
>           |                               ~~~~~~~~~~~ ^ ~~~
>     include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
>       527 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~ ^
>       528 |                            NR_VM_NUMA_EVENT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~~
>     include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
>       536 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~ ^
>       537 |                            NR_VM_NUMA_EVENT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~~
>>> drivers/hv/connection.c:236:8: error: call to undeclared function 'set_memory_decrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>       236 |         ret = set_memory_decrypted((unsigned long)
>           |               ^
>>> drivers/hv/connection.c:340:2: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>       340 |         set_memory_encrypted((unsigned long)vmbus_connection.monitor_pages[0], 1);
>           |         ^
>     5 warnings and 2 errors generated.
> --
>     In file included from drivers/hv/channel.c:14:
>     In file included from include/linux/mm.h:2208:
>     include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
>       508 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~ ^
>       509 |                            item];
>           |                            ~~~~
>     include/linux/vmstat.h:515:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
>       515 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~ ^
>       516 |                            NR_VM_NUMA_EVENT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~~
>     include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
>       522 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
>           |                               ~~~~~~~~~~~ ^ ~~~
>     include/linux/vmstat.h:527:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
>       527 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~ ^
>       528 |                            NR_VM_NUMA_EVENT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~~
>     include/linux/vmstat.h:536:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
>       536 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~ ^
>       537 |                            NR_VM_NUMA_EVENT_ITEMS +
>           |                            ~~~~~~~~~~~~~~~~~~~~~~
>>> drivers/hv/channel.c:442:8: error: call to undeclared function 'set_memory_decrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>       442 |         ret = set_memory_decrypted((unsigned long)kbuffer,
>           |               ^
>>> drivers/hv/channel.c:531:3: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>       531 |                 set_memory_encrypted((unsigned long)kbuffer,
>           |                 ^
>     drivers/hv/channel.c:848:8: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>       848 |         ret = set_memory_encrypted((unsigned long)gpadl->buffer,
>           |               ^
>     5 warnings and 3 errors generated.

That's my mistake. The correct place for declaring set_memory_*crypted()
is asm/set_memory.h, not asm/mem_encrypt.h.

Steven, could you please fold in the patch below:


diff --git a/arch/arm64/include/asm/mem_encrypt.h b/arch/arm64/include/asm/mem_encrypt.h
index 7381f9585321..e47265cd180a 100644
--- a/arch/arm64/include/asm/mem_encrypt.h
+++ b/arch/arm64/include/asm/mem_encrypt.h
@@ -14,6 +14,4 @@ static inline bool force_dma_unencrypted(struct device *dev)
 	return is_realm_world();
 }
 
-int set_memory_encrypted(unsigned long addr, int numpages);
-int set_memory_decrypted(unsigned long addr, int numpages);
 #endif
diff --git a/arch/arm64/include/asm/set_memory.h b/arch/arm64/include/asm/set_memory.h
index 0f740b781187..9561b90fb43c 100644
--- a/arch/arm64/include/asm/set_memory.h
+++ b/arch/arm64/include/asm/set_memory.h
@@ -14,4 +14,6 @@ int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
 bool kernel_page_present(struct page *page);
 
+int set_memory_encrypted(unsigned long addr, int numpages);
+int set_memory_decrypted(unsigned long addr, int numpages);
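
For reference, the include-path reasoning: generic code reaches these
prototypes through <linux/set_memory.h>, which includes
<asm/set_memory.h>; nothing on that path pulls in <asm/mem_encrypt.h>,
which is why drivers/hv ended up with implicit declarations. x86
likewise declares set_memory_encrypted()/set_memory_decrypted() in its
asm/set_memory.h.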



Suzuki
> 
> 
> vim +/set_memory_decrypted +132 drivers/hv/hv.c
> 
> 3e7ee4902fe699 drivers/staging/hv/Hv.c Hank Janssen      2009-07-13   96
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19   97  int hv_synic_alloc(void)
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19   98  {
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18   99  	int cpu, ret = -ENOMEM;
> f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  100  	struct hv_per_cpu_context *hv_cpu;
> f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  101
> f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  102  	/*
> f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  103  	 * First, zero all per-cpu memory areas so hv_synic_free() can
> f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  104  	 * detect what memory has been allocated and cleanup properly
> f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  105  	 * after any failures.
> f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  106  	 */
> f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  107  	for_each_present_cpu(cpu) {
> f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  108  		hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
> f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  109  		memset(hv_cpu, 0, sizeof(*hv_cpu));
> f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  110  	}
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  111
> 6396bb221514d2 drivers/hv/hv.c         Kees Cook         2018-06-12  112  	hv_context.hv_numa_map = kcalloc(nr_node_ids, sizeof(struct cpumask),
> 597ff72f3de850 drivers/hv/hv.c         Jia-Ju Bai        2018-03-04  113  					 GFP_KERNEL);
> 9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  114  	if (hv_context.hv_numa_map == NULL) {
> 9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  115  		pr_err("Unable to allocate NUMA map\n");
> 9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  116  		goto err;
> 9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  117  	}
> 9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  118
> 421b8f20d3c381 drivers/hv/hv.c         Vitaly Kuznetsov  2016-12-07  119  	for_each_present_cpu(cpu) {
> f25a7ece08bdb1 drivers/hv/hv.c         Michael Kelley    2018-08-10  120  		hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  121
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  122  		tasklet_init(&hv_cpu->msg_dpc,
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  123  			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  124
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  125  		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  126  			hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  127  			if (hv_cpu->post_msg_page == NULL) {
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  128  				pr_err("Unable to allocate post msg page\n");
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  129  				goto err;
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  130  			}
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  131
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24 @132  			ret = set_memory_decrypted((unsigned long)hv_cpu->post_msg_page, 1);
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  133  			if (ret) {
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  134  				pr_err("Failed to decrypt post msg page: %d\n", ret);
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  135  				/* Just leak the page, as it's unsafe to free the page. */
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  136  				hv_cpu->post_msg_page = NULL;
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  137  				goto err;
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  138  			}
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  139
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  140  			memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  141  		}
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  142
> faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  143  		/*
> faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  144  		 * Synic message and event pages are allocated by paravisor.
> faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  145  		 * Skip these pages allocation here.
> faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  146  		 */
> d3a9d7e49d1531 drivers/hv/hv.c         Dexuan Cui        2023-08-24  147  		if (!ms_hyperv.paravisor_present && !hv_root_partition) {
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  148  			hv_cpu->synic_message_page =
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  149  				(void *)get_zeroed_page(GFP_ATOMIC);
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  150  			if (hv_cpu->synic_message_page == NULL) {
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  151  				pr_err("Unable to allocate SYNIC message page\n");
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  152  				goto err;
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  153  			}
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  154
> faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  155  			hv_cpu->synic_event_page =
> faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  156  				(void *)get_zeroed_page(GFP_ATOMIC);
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  157  			if (hv_cpu->synic_event_page == NULL) {
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  158  				pr_err("Unable to allocate SYNIC event page\n");
> 68f2f2bc163d44 drivers/hv/hv.c         Dexuan Cui        2023-08-24  159
> 68f2f2bc163d44 drivers/hv/hv.c         Dexuan Cui        2023-08-24  160  				free_page((unsigned long)hv_cpu->synic_message_page);
> 68f2f2bc163d44 drivers/hv/hv.c         Dexuan Cui        2023-08-24  161  				hv_cpu->synic_message_page = NULL;
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  162  				goto err;
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  163  			}
> faff44069ff538 drivers/hv/hv.c         Tianyu Lan        2021-10-25  164  		}
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  165
> 68f2f2bc163d44 drivers/hv/hv.c         Dexuan Cui        2023-08-24  166  		if (!ms_hyperv.paravisor_present &&
> e3131f1c81448a drivers/hv/hv.c         Dexuan Cui        2023-08-24  167  		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  168  			ret = set_memory_decrypted((unsigned long)
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  169  				hv_cpu->synic_message_page, 1);
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  170  			if (ret) {
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  171  				pr_err("Failed to decrypt SYNIC msg page: %d\n", ret);
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  172  				hv_cpu->synic_message_page = NULL;
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  173
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  174  				/*
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  175  				 * Free the event page here so that hv_synic_free()
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  176  				 * won't later try to re-encrypt it.
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  177  				 */
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  178  				free_page((unsigned long)hv_cpu->synic_event_page);
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  179  				hv_cpu->synic_event_page = NULL;
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  180  				goto err;
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  181  			}
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  182
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  183  			ret = set_memory_decrypted((unsigned long)
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  184  				hv_cpu->synic_event_page, 1);
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  185  			if (ret) {
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  186  				pr_err("Failed to decrypt SYNIC event page: %d\n", ret);
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  187  				hv_cpu->synic_event_page = NULL;
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  188  				goto err;
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  189  			}
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  190
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  191  			memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  192  			memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  193  		}
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  194  	}
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  195
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  196  	return 0;
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  197
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  198  err:
> 572086325ce9a9 drivers/hv/hv.c         Michael Kelley    2018-08-02  199  	/*
> 572086325ce9a9 drivers/hv/hv.c         Michael Kelley    2018-08-02  200  	 * Any memory allocations that succeeded will be freed when
> 572086325ce9a9 drivers/hv/hv.c         Michael Kelley    2018-08-02  201  	 * the caller cleans up by calling hv_synic_free()
> 572086325ce9a9 drivers/hv/hv.c         Michael Kelley    2018-08-02  202  	 */
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  203  	return ret;
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  204  }
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  205
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  206
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  207  void hv_synic_free(void)
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  208  {
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  209  	int cpu, ret;
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  210
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  211  	for_each_present_cpu(cpu) {
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  212  		struct hv_per_cpu_context *hv_cpu
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  213  			= per_cpu_ptr(hv_context.cpu_context, cpu);
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  214
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  215  		/* It's better to leak the page if the encryption fails. */
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  216  		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  217  			if (hv_cpu->post_msg_page) {
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24 @218  				ret = set_memory_encrypted((unsigned long)
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  219  					hv_cpu->post_msg_page, 1);
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  220  				if (ret) {
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  221  					pr_err("Failed to encrypt post msg page: %d\n", ret);
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  222  					hv_cpu->post_msg_page = NULL;
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  223  				}
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  224  			}
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  225  		}
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  226
> 68f2f2bc163d44 drivers/hv/hv.c         Dexuan Cui        2023-08-24  227  		if (!ms_hyperv.paravisor_present &&
> e3131f1c81448a drivers/hv/hv.c         Dexuan Cui        2023-08-24  228  		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  229  			if (hv_cpu->synic_message_page) {
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  230  				ret = set_memory_encrypted((unsigned long)
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  231  					hv_cpu->synic_message_page, 1);
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  232  				if (ret) {
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  233  					pr_err("Failed to encrypt SYNIC msg page: %d\n", ret);
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  234  					hv_cpu->synic_message_page = NULL;
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  235  				}
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  236  			}
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  237
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  238  			if (hv_cpu->synic_event_page) {
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  239  				ret = set_memory_encrypted((unsigned long)
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  240  					hv_cpu->synic_event_page, 1);
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  241  				if (ret) {
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  242  					pr_err("Failed to encrypt SYNIC event page: %d\n", ret);
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  243  					hv_cpu->synic_event_page = NULL;
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  244  				}
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  245  			}
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  246  		}
> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  247
> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  248  		free_page((unsigned long)hv_cpu->post_msg_page);
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  249  		free_page((unsigned long)hv_cpu->synic_event_page);
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  250  		free_page((unsigned long)hv_cpu->synic_message_page);
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  251  	}
> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  252
> 9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  253  	kfree(hv_context.hv_numa_map);
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  254  }
> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  255
> 


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 42/43] arm64: kvm: Expose support for private memory
  2024-04-12  8:43   ` [PATCH v2 42/43] arm64: kvm: Expose support for private memory Steven Price
@ 2024-04-25 14:44     ` Fuad Tabba
  0 siblings, 0 replies; 104+ messages in thread
From: Fuad Tabba @ 2024-04-25 14:44 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, linux-coco, Ganapatrao Kulkarni

Hi,

On Fri, Apr 12, 2024 at 9:44 AM Steven Price <steven.price@arm.com> wrote:
>
> Select KVM_GENERIC_PRIVATE_MEM and provide the necessary support
> functions.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  4 ++++
>  arch/arm64/kvm/Kconfig            |  1 +
>  arch/arm64/kvm/arm.c              |  5 +++++
>  arch/arm64/kvm/mmu.c              | 19 +++++++++++++++++++
>  4 files changed, 29 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 902923402f6e..93de7f5009fe 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1259,6 +1259,10 @@ static inline bool kvm_vm_is_protected(struct kvm *kvm)
>         return false;
>  }
>
> +#ifdef CONFIG_KVM_PRIVATE_MEM
> +bool kvm_arch_has_private_mem(struct kvm *kvm);
> +#endif
> +

I think it might be better to define kvm_arch_has_private_mem() for
both cases, whether KVM_PRIVATE_MEM is enabled or not, similar to the
way it's defined in arch/x86/include/asm/kvm_host.h
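
For illustration, a minimal sketch of what that could look like here,
mirroring the x86 pattern (CONFIG_KVM_PRIVATE_MEM and kvm_is_realm()
are from this series; treat this as a sketch rather than the final
form):

  #ifdef CONFIG_KVM_PRIVATE_MEM
  /* Realm VMs are the only arm64 users of private memory. */
  #define kvm_arch_has_private_mem(kvm)	kvm_is_realm(kvm)
  #else
  #define kvm_arch_has_private_mem(kvm)	false
  #endif

This keeps the !CONFIG_KVM_PRIVATE_MEM case compiling and avoids an
out-of-line call on every invocation.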

>  int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature);
>  bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
>
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 58f09370d17e..8da57e74c86a 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -37,6 +37,7 @@ menuconfig KVM
>         select HAVE_KVM_VCPU_RUN_PID_CHANGE
>         select SCHED_INFO
>         select GUEST_PERF_EVENTS if PERF_EVENTS
> +       select KVM_GENERIC_PRIVATE_MEM

I don't think this should be enabled by default, but should depend on
whether RME is configured. That said, I can't find the config option
for RME...
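(If the series grows a dedicated Kconfig symbol for RME, the select
could be made conditional on it, e.g. "select KVM_GENERIC_PRIVATE_MEM
if ARM64_RME", where ARM64_RME is a hypothetical option name used here
purely for illustration.)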

>         help
>           Support hosting virtualized guest machines.
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 2dd014d3c366..a66d0a6eb4fa 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -89,6 +89,11 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
>         return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
>  }
>
> +bool kvm_arch_has_private_mem(struct kvm *kvm)
> +{
> +       return kvm_is_realm(kvm);
> +}
> +

Related to my earlier comment on kvm_arch_has_private_mem(), and
considering how often this function is called, wouldn't it be better
to define this in a way similar to arch/x86/include/asm/kvm_host.h?


>  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                             struct kvm_enable_cap *cap)
>  {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 48c957e21c83..808bceebad4d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -2171,6 +2171,25 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>         return ret;
>  }

The following two functions should be gated by

#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES


> +bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
> +                                       struct kvm_gfn_range *range)
> +{
> +       WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm));
> +       return false;
> +}
> +
> +bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> +                                        struct kvm_gfn_range *range)
> +{
> +       WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm));

I think this should return here, not just warn.
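
A sketch of the shape that suggestion implies, using the return value
of WARN_ON_ONCE() (a common kernel idiom); illustrative only, not a
tested patch:

  bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
					   struct kvm_gfn_range *range)
  {
	/* Bail out rather than proceeding on a VM without private memory. */
	if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
		return false;

	if (range->arg.attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)
		range->only_shared = true;
	kvm_unmap_gfn_range(kvm, range);

	return false;
  }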

Cheers,
/fuad

> +
> +       if (range->arg.attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)
> +               range->only_shared = true;
> +       kvm_unmap_gfn_range(kvm, range);
> +
> +       return false;
> +}
> +
>  void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
>  {
>  }
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 09/14] arm64: Enable memory encrypt for Realms
  2024-04-25 13:42       ` Suzuki K Poulose
@ 2024-04-25 15:52         ` Steven Price
  2024-04-25 16:29         ` Suzuki K Poulose
  1 sibling, 0 replies; 104+ messages in thread
From: Steven Price @ 2024-04-25 15:52 UTC (permalink / raw)
  To: Suzuki K Poulose, kernel test robot, kvm, kvmarm
  Cc: llvm, oe-kbuild-all, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Emanuele.Rocca

On 25/04/2024 14:42, Suzuki K Poulose wrote:
> On 15/04/2024 04:13, kernel test robot wrote:
>> Hi Steven,
>>
>> kernel test robot noticed the following build errors:
>>

<snip>

>>>> drivers/hv/channel.c:442:8: error: call to undeclared function 'set_memory_decrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>>       442 |         ret = set_memory_decrypted((unsigned long)kbuffer,
>>           |               ^
>>>> drivers/hv/channel.c:531:3: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>>       531 |                 set_memory_encrypted((unsigned long)kbuffer,
>>           |                 ^
>>     drivers/hv/channel.c:848:8: error: call to undeclared function 'set_memory_encrypted'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>>       848 |         ret = set_memory_encrypted((unsigned long)gpadl->buffer,
>>           |               ^
>>     5 warnings and 3 errors generated.
> 
> That's my mistake. The correct place for declaring set_memory_*crypted()
> is asm/set_memory.h, not asm/mem_encrypt.h.
> 
> Steven, could you please fold in the patch below:

Sure, I've folded it into my local branch. Thanks for looking into the error.

Steve

> 
> diff --git a/arch/arm64/include/asm/mem_encrypt.h b/arch/arm64/include/asm/mem_encrypt.h
> index 7381f9585321..e47265cd180a 100644
> --- a/arch/arm64/include/asm/mem_encrypt.h
> +++ b/arch/arm64/include/asm/mem_encrypt.h
> @@ -14,6 +14,4 @@ static inline bool force_dma_unencrypted(struct device *dev)
>  	return is_realm_world();
>  }
> 
> -int set_memory_encrypted(unsigned long addr, int numpages);
> -int set_memory_decrypted(unsigned long addr, int numpages);
>  #endif
> diff --git a/arch/arm64/include/asm/set_memory.h b/arch/arm64/include/asm/set_memory.h
> index 0f740b781187..9561b90fb43c 100644
> --- a/arch/arm64/include/asm/set_memory.h
> +++ b/arch/arm64/include/asm/set_memory.h
> @@ -14,4 +14,6 @@ int set_direct_map_invalid_noflush(struct page *page);
>  int set_direct_map_default_noflush(struct page *page);
>  bool kernel_page_present(struct page *page);
> 
> +int set_memory_encrypted(unsigned long addr, int numpages);
> +int set_memory_decrypted(unsigned long addr, int numpages);
> 
> 
> 
> Suzuki


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 01/43] KVM: Prepare for handling only shared mappings in mmu_notifier events
  2024-04-25  9:48     ` Fuad Tabba
@ 2024-04-25 15:58       ` Steven Price
  2024-04-25 22:56         ` Sean Christopherson
  0 siblings, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-04-25 15:58 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, kvmarm, Sean Christopherson, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Suzuki K Poulose,
	Zenghui Yu, linux-arm-kernel, linux-kernel, Joey Gouly,
	Alexandru Elisei, Christoffer Dall, linux-coco,
	Ganapatrao Kulkarni

On 25/04/2024 10:48, Fuad Tabba wrote:
> Hi,
> 
> On Fri, Apr 12, 2024 at 9:43 AM Steven Price <steven.price@arm.com> wrote:
>>
>> From: Sean Christopherson <seanjc@google.com>
>>
>> Add flags to "struct kvm_gfn_range" to let notifier events target only
>> shared and only private mappings, and wire up the existing mmu_notifier
>> events to be shared-only (private memory is never associated with a
>> userspace virtual address, i.e. can't be reached via mmu_notifiers).
>>
>> Add two flags so that KVM can handle the three possibilities (shared,
>> private, and shared+private) without needing something like a tri-state
>> enum.
>>
>> Link: https://lore.kernel.org/all/ZJX0hk+KpQP0KUyB@google.com
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>  include/linux/kvm_host.h | 2 ++
>>  virt/kvm/kvm_main.c      | 7 +++++++
>>  2 files changed, 9 insertions(+)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 48f31dcd318a..c7581360fd88 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -268,6 +268,8 @@ struct kvm_gfn_range {
>>         gfn_t start;
>>         gfn_t end;
>>         union kvm_mmu_notifier_arg arg;
>> +       bool only_private;
>> +       bool only_shared;
>>         bool may_block;
>>  };
>>  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index fb49c2a60200..3486ceef6f4e 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -633,6 +633,13 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
>>                          * the second or later invocation of the handler).
>>                          */
>>                         gfn_range.arg = range->arg;
>> +
>> +                       /*
>> +                        * HVA-based notifications aren't relevant to private
>> +                        * mappings as they don't have a userspace mapping.
>> +                        */
>> +                       gfn_range.only_private = false;
>> +                       gfn_range.only_shared = true;
>>                         gfn_range.may_block = range->may_block;
> 
> I'd discussed this with Sean when he posted this earlier. Having two
> booleans to encode three valid states could be confusing. In response,
> Sean suggested using an enum instead:
> https://lore.kernel.org/all/ZUO1Giju0GkUdF0o@google.com/

That would work fine too! Unless I've missed it, Sean hasn't posted an
updated patch. My assumption is that this will get merged (in whatever
form) ahead of the rest of this series, as part of that other series. It
shouldn't be too hard to adapt.
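
For reference, one possible shape of the enum approach (names here are
illustrative, not taken from the linked thread):

  /* Sketch: an explicit filter instead of two independent bools. */
  enum kvm_gfn_range_filter {
	KVM_FILTER_SHARED	= BIT(0),
	KVM_FILTER_PRIVATE	= BIT(1),
  };

  /* In struct kvm_gfn_range, replacing only_private/only_shared: */
	enum kvm_gfn_range_filter attr_filter;

  /* HVA-based notifiers would then set: */
	gfn_range.attr_filter = KVM_FILTER_SHARED;

A shared+private invalidation sets both bits, so all three valid states
are spelled out rather than implied by flag combinations.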

Thanks,

Steve

> Cheers,
> /fuad
> 
>>
>>                         /*
> 
> 
>> --
>> 2.34.1
>>
> 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 09/14] arm64: Enable memory encrypt for Realms
  2024-04-25 13:42       ` Suzuki K Poulose
  2024-04-25 15:52         ` Steven Price
@ 2024-04-25 16:29         ` Suzuki K Poulose
  2024-04-25 18:16           ` Emanuele Rocca
  1 sibling, 1 reply; 104+ messages in thread
From: Suzuki K Poulose @ 2024-04-25 16:29 UTC (permalink / raw)
  To: kernel test robot, Steven Price, kvm, kvmarm
  Cc: llvm, oe-kbuild-all, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Emanuele.Rocca

On 25/04/2024 14:42, Suzuki K Poulose wrote:
> On 15/04/2024 04:13, kernel test robot wrote:
>> Hi Steven,
>>
>> kernel test robot noticed the following build errors:
>>
>> [auto build test ERROR on arm64/for-next/core]
>> [also build test ERROR on kvmarm/next efi/next tip/irq/core linus/master v6.9-rc3 next-20240412]
>> [cannot apply to arnd-asm-generic/master]
>> [If your patch is applied to the wrong git tree, kindly drop us a note.
>> And when submitting patch, we suggest to use '--base' as documented in
>> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>>
>> url:    https://github.com/intel-lab-lkp/linux/commits/Steven-Price/arm64-rsi-Add-RSI-definitions/20240412-164852
>> base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
>> patch link:    https://lore.kernel.org/r/20240412084213.1733764-10-steven.price%40arm.com
>> patch subject: [PATCH v2 09/14] arm64: Enable memory encrypt for Realms
>> config: arm64-allyesconfig (https://download.01.org/0day-ci/archive/20240415/202404151003.vkNApJiS-lkp@intel.com/config)
>> compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 8b3b4a92adee40483c27f26c478a384cd69c6f05)
>> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240415/202404151003.vkNApJiS-lkp@intel.com/reproduce)
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <lkp@intel.com>
>> | Closes: https://lore.kernel.org/oe-kbuild-all/202404151003.vkNApJiS-lkp@intel.com/
>>
>> All errors (new ones prefixed by >>):
>>
>> <snip>
> 
> That's my mistake. The correct place for declaring set_memory_*crypted()
> is asm/set_memory.h, not asm/mem_encrypt.h.
> 
> Steven, could you please fold in the patch below:
> 
> 
> diff --git a/arch/arm64/include/asm/mem_encrypt.h b/arch/arm64/include/asm/mem_encrypt.h
> index 7381f9585321..e47265cd180a 100644
> --- a/arch/arm64/include/asm/mem_encrypt.h
> +++ b/arch/arm64/include/asm/mem_encrypt.h
> @@ -14,6 +14,4 @@ static inline bool force_dma_unencrypted(struct device *dev)
>  	return is_realm_world();
>  }
> 
> -int set_memory_encrypted(unsigned long addr, int numpages);
> -int set_memory_decrypted(unsigned long addr, int numpages);
>  #endif
> diff --git a/arch/arm64/include/asm/set_memory.h b/arch/arm64/include/asm/set_memory.h
> index 0f740b781187..9561b90fb43c 100644
> --- a/arch/arm64/include/asm/set_memory.h
> +++ b/arch/arm64/include/asm/set_memory.h
> @@ -14,4 +14,6 @@ int set_direct_map_invalid_noflush(struct page *page);
>  int set_direct_map_default_noflush(struct page *page);
>  bool kernel_page_present(struct page *page);
> 
> +int set_memory_encrypted(unsigned long addr, int numpages);
> +int set_memory_decrypted(unsigned long addr, int numpages);
> 
> 

Emanuele reports that these need to be exported as well, something
like:


diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 229b6d9990f5..de3843ce2aea 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -228,11 +228,13 @@ int set_memory_encrypted(unsigned long addr, int numpages)
 {
 	return __set_memory_encrypted(addr, numpages, true);
 }
+EXPORT_SYMBOL_GPL(set_memory_encrypted);
 
 int set_memory_decrypted(unsigned long addr, int numpages)
 {
 	return __set_memory_encrypted(addr, numpages, false);
 }
+EXPORT_SYMBOL_GPL(set_memory_decrypted);
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
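
(The exports matter for builds where VMBus is modular (CONFIG_HYPERV=m,
e.g. allmodconfig): without EXPORT_SYMBOL_GPL() those configurations
would still fail, at module link time rather than at compile time.)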


> 
> Suzuki



>>
>>
>> vim +/set_memory_decrypted +132 drivers/hv/hv.c
>>
>> <snip>
>> 208  {
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 209      int cpu, ret;
>> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  210
>> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  
>> 211      for_each_present_cpu(cpu) {
>> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  
>> 212          struct hv_per_cpu_context *hv_cpu
>> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  
>> 213              = per_cpu_ptr(hv_context.cpu_context, cpu);
>> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  214
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 215          /* It's better to leak the page if the encryption fails. */
>> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  
>> 216          if (ms_hyperv.paravisor_present && 
>> hv_isolation_type_tdx()) {
>> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  
>> 217              if (hv_cpu->post_msg_page) {
>> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24 
>> @218                  ret = set_memory_encrypted((unsigned long)
>> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  
>> 219                      hv_cpu->post_msg_page, 1);
>> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  
>> 220                  if (ret) {
>> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  
>> 221                      pr_err("Failed to encrypt post msg page: 
>> %d\n", ret);
>> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  
>> 222                      hv_cpu->post_msg_page = NULL;
>> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  
>> 223                  }
>> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  
>> 224              }
>> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  
>> 225          }
>> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  226
>> 68f2f2bc163d44 drivers/hv/hv.c         Dexuan Cui        2023-08-24  
>> 227          if (!ms_hyperv.paravisor_present &&
>> e3131f1c81448a drivers/hv/hv.c         Dexuan Cui        2023-08-24  
>> 228              (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 229              if (hv_cpu->synic_message_page) {
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 230                  ret = set_memory_encrypted((unsigned long)
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 231                      hv_cpu->synic_message_page, 1);
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 232                  if (ret) {
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 233                      pr_err("Failed to encrypt SYNIC msg page: 
>> %d\n", ret);
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 234                      hv_cpu->synic_message_page = NULL;
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 235                  }
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 236              }
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  237
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 238              if (hv_cpu->synic_event_page) {
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 239                  ret = set_memory_encrypted((unsigned long)
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 240                      hv_cpu->synic_event_page, 1);
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 241                  if (ret) {
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 242                      pr_err("Failed to encrypt SYNIC event page: 
>> %d\n", ret);
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 243                      hv_cpu->synic_event_page = NULL;
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 244                  }
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 245              }
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  
>> 246          }
>> 193061ea0a50c1 drivers/hv/hv.c         Tianyu Lan        2023-08-18  247
>> 23378295042a4b drivers/hv/hv.c         Dexuan Cui        2023-08-24  
>> 248          free_page((unsigned long)hv_cpu->post_msg_page);
>> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  
>> 249          free_page((unsigned long)hv_cpu->synic_event_page);
>> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  
>> 250          free_page((unsigned long)hv_cpu->synic_message_page);
>> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  
>> 251      }
>> 37cdd991fac810 drivers/hv/hv.c         Stephen Hemminger 2017-02-11  252
>> 9f01ec53458d9e drivers/hv/hv.c         K. Y. Srinivasan  2015-08-05  
>> 253      kfree(hv_context.hv_numa_map);
>> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  
>> 254  }
>> 2608fb65310341 drivers/hv/hv.c         Jason Wang        2013-06-19  255
>>
> 


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 09/14] arm64: Enable memory encrypt for Realms
  2024-04-25 16:29         ` Suzuki K Poulose
@ 2024-04-25 18:16           ` Emanuele Rocca
  0 siblings, 0 replies; 104+ messages in thread
From: Emanuele Rocca @ 2024-04-25 18:16 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: kernel test robot, Steven Price, kvm, kvmarm, llvm,
	oe-kbuild-all, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni

Hi,

On 2024-04-25 05:29, Suzuki K Poulose wrote:
> Emanuele reports that these need to be exported as well, something
> like:
> 
> 
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index 229b6d9990f5..de3843ce2aea 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -228,11 +228,13 @@ int set_memory_encrypted(unsigned long addr, int
> numpages)
>  {
>         return __set_memory_encrypted(addr, numpages, true);
>  }
> +EXPORT_SYMBOL_GPL(set_memory_encrypted);
> 
>  int set_memory_decrypted(unsigned long addr, int numpages)
>  {
>         return __set_memory_encrypted(addr, numpages, false);
>  }
> +EXPORT_SYMBOL_GPL(set_memory_decrypted);
> 
>  #ifdef CONFIG_DEBUG_PAGEALLOC
>  void __kernel_map_pages(struct page *page, int numpages, int enable

Indeed, without exporting the symbols I was getting this build failure:

 ERROR: modpost: "set_memory_encrypted" [drivers/hv/hv_vmbus.ko] undefined!
 ERROR: modpost: "set_memory_decrypted" [drivers/hv/hv_vmbus.ko] undefined!

I can now build 6.9-rc1 w/ CCA guest patches if I apply Suzuki's
changes:

1) move set_memory_encrypted/decrypted from asm/mem_encrypt.h to
   asm/set_memory.h
2) export both symbols in mm/pageattr.c

See diff below.

Thanks,
  Emanuele

diff --git a/arch/arm64/include/asm/mem_encrypt.h b/arch/arm64/include/asm/mem_encrypt.h
index 7381f9585321..e47265cd180a 100644
--- a/arch/arm64/include/asm/mem_encrypt.h
+++ b/arch/arm64/include/asm/mem_encrypt.h
@@ -14,6 +14,4 @@ static inline bool force_dma_unencrypted(struct device *dev)
        return is_realm_world();
 }

-int set_memory_encrypted(unsigned long addr, int numpages);
-int set_memory_decrypted(unsigned long addr, int numpages);
 #endif
diff --git a/arch/arm64/include/asm/set_memory.h b/arch/arm64/include/asm/set_memory.h
index 0f740b781187..9561b90fb43c 100644
--- a/arch/arm64/include/asm/set_memory.h
+++ b/arch/arm64/include/asm/set_memory.h
@@ -14,4 +14,6 @@ int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
 bool kernel_page_present(struct page *page);

+int set_memory_encrypted(unsigned long addr, int numpages);
+int set_memory_decrypted(unsigned long addr, int numpages);
 #endif /* _ASM_ARM64_SET_MEMORY_H */
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 229b6d9990f5..de3843ce2aea 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -228,11 +228,13 @@ int set_memory_encrypted(unsigned long addr, int numpages)
 {
        return __set_memory_encrypted(addr, numpages, true);
 }
+EXPORT_SYMBOL_GPL(set_memory_encrypted);

 int set_memory_decrypted(unsigned long addr, int numpages)
 {
        return __set_memory_encrypted(addr, numpages, false);
 }
+EXPORT_SYMBOL_GPL(set_memory_decrypted);

 #ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)

^ permalink raw reply related	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 01/43] KVM: Prepare for handling only shared mappings in mmu_notifier events
  2024-04-25 15:58       ` Steven Price
@ 2024-04-25 22:56         ` Sean Christopherson
  0 siblings, 0 replies; 104+ messages in thread
From: Sean Christopherson @ 2024-04-25 22:56 UTC (permalink / raw)
  To: Steven Price
  Cc: Fuad Tabba, kvm, kvmarm, Catalin Marinas, Marc Zyngier,
	Will Deacon, James Morse, Oliver Upton, Suzuki K Poulose,
	Zenghui Yu, linux-arm-kernel, linux-kernel, Joey Gouly,
	Alexandru Elisei, Christoffer Dall, linux-coco,
	Ganapatrao Kulkarni

On Thu, Apr 25, 2024, Steven Price wrote:
> On 25/04/2024 10:48, Fuad Tabba wrote:
> > On Fri, Apr 12, 2024 at 9:43 AM Steven Price <steven.price@arm.com> wrote:
> >>  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >> index fb49c2a60200..3486ceef6f4e 100644
> >> --- a/virt/kvm/kvm_main.c
> >> +++ b/virt/kvm/kvm_main.c
> >> @@ -633,6 +633,13 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
> >>                          * the second or later invocation of the handler).
> >>                          */
> >>                         gfn_range.arg = range->arg;
> >> +
> >> +                       /*
> >> +                        * HVA-based notifications aren't relevant to private
> >> +                        * mappings as they don't have a userspace mapping.
> >> +                        */
> >> +                       gfn_range.only_private = false;
> >> +                       gfn_range.only_shared = true;
> >>                         gfn_range.may_block = range->may_block;
> > 
> > I'd discussed this with Sean when he posted this earlier. Having two
> > booleans to encode three valid states could be confusing. In response,
> > Sean suggested using an enum instead:
> > https://lore.kernel.org/all/ZUO1Giju0GkUdF0o@google.com/
> 
That would work fine too! Unless I've missed it, Sean hasn't posted an
> updated patch. My assumption is that this will get merged (in whatever
> form) before the rest of the series as part of that other series. It
> shouldn't be too hard to adapt.

Yeah, there's no updated patch.

Fuad, if you have a strong preference, I recommend chiming in on the TDX series[*],
as that is the series that's likely going to be the first user, and I don't have
a strong preference on bools versus an enum.

[*] https://lore.kernel.org/all/e324ff5e47e07505648c0092a5370ac9ddd72f0b.1708933498.git.isaku.yamahata@intel.com
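
For illustration only, the enum approach could look something like the
below - a hypothetical sketch, the exact names and form being up to
whatever lands with that series:

  enum kvm_gfn_range_filter {
  	KVM_FILTER_SHARED	= BIT(0),
  	KVM_FILTER_PRIVATE	= BIT(1),
  };

  struct kvm_gfn_range {
  	/* ... */
  	enum kvm_gfn_range_filter attr_filter;
  	bool may_block;
  };

  /* HVA-based notifiers would then only select shared mappings: */
  gfn_range.attr_filter = KVM_FILTER_SHARED;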

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS
  2024-04-12  8:42   ` [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS Steven Price
  2024-04-19  9:34     ` Suzuki K Poulose
  2024-04-25  9:53     ` Fuad Tabba
@ 2024-05-01 14:27     ` Jean-Philippe Brucker
  2024-05-01 14:56       ` Suzuki K Poulose
  2 siblings, 1 reply; 104+ messages in thread
From: Jean-Philippe Brucker @ 2024-05-01 14:27 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni

On Fri, Apr 12, 2024 at 09:42:43AM +0100, Steven Price wrote:
> +static inline bool realm_is_addr_protected(struct realm *realm,
> +					   unsigned long addr)
> +{
> +	unsigned int ia_bits = realm->ia_bits;
> +
> +	return !(addr & ~(BIT(ia_bits - 1) - 1));

Is it enough to return !(addr & BIT(realm->ia_bits - 1))?

> +static void realm_unmap_range_shared(struct kvm *kvm,
> +				     int level,
> +				     unsigned long start,
> +				     unsigned long end)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +	unsigned long rd = virt_to_phys(realm->rd);
> +	ssize_t map_size = rme_rtt_level_mapsize(level);
> +	unsigned long next_addr, addr;
> +	unsigned long shared_bit = BIT(realm->ia_bits - 1);
> +
> +	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
> +		return;
> +
> +	start |= shared_bit;
> +	end |= shared_bit;
> +
> +	for (addr = start; addr < end; addr = next_addr) {
> +		unsigned long align_addr = ALIGN(addr, map_size);
> +		int ret;
> +
> +		next_addr = ALIGN(addr + 1, map_size);
> +
> +		if (align_addr != addr || next_addr > end) {
> +			/* Need to recurse deeper */
> +			if (addr < align_addr)
> +				next_addr = align_addr;
> +			realm_unmap_range_shared(kvm, level + 1, addr,
> +						 min(next_addr, end));
> +			continue;
> +		}
> +
> +		ret = rmi_rtt_unmap_unprotected(rd, addr, level, &next_addr);
> +		switch (RMI_RETURN_STATUS(ret)) {
> +		case RMI_SUCCESS:
> +			break;
> +		case RMI_ERROR_RTT:
> +			if (next_addr == addr) {
> +				next_addr = ALIGN(addr + 1, map_size);
> +				realm_unmap_range_shared(kvm, level + 1, addr,
> +							 next_addr);
> +			}
> +			break;
> +		default:
> +			WARN_ON(1);

In this case we also need to return, because RMM returns with next_addr ==
0, causing an infinite loop. At the moment a VMM can trigger this easily
by creating guest memfd before creating a RD, see below
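
A minimal sketch of the fix (untested):

  		default:
  			WARN_ON(1);
  			/* next_addr may be 0 here, bail out to avoid looping forever */
  			return;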

> +		}
> +	}
> +}
> +
> +static void realm_unmap_range_private(struct kvm *kvm,
> +				      unsigned long start,
> +				      unsigned long end)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +	ssize_t map_size = RME_PAGE_SIZE;
> +	unsigned long next_addr, addr;
> +
> +	for (addr = start; addr < end; addr = next_addr) {
> +		int ret;
> +
> +		next_addr = ALIGN(addr + 1, map_size);
> +
> +		ret = realm_destroy_protected(realm, addr, &next_addr);
> +
> +		if (WARN_ON(ret))
> +			break;
> +	}
> +}
> +
> +static void realm_unmap_range(struct kvm *kvm,
> +			      unsigned long start,
> +			      unsigned long end,
> +			      bool unmap_private)
> +{

Should this check for a valid kvm->arch.realm.rd, or a valid realm state?
I'm not sure what the best place is but none of the RMM calls will succeed
if the RD is NULL, causing some WARNs.

I can trigger this with set_memory_attributes() ioctls before creating a
RD for example.

> +	realm_unmap_range_shared(kvm, RME_RTT_MAX_LEVEL - 1, start, end);
> +	if (unmap_private)
> +		realm_unmap_range_private(kvm, start, end);
> +}
> +
>  u32 kvm_realm_ipa_limit(void)
>  {
>  	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
> @@ -190,6 +341,30 @@ static int realm_rtt_destroy(struct realm *realm, unsigned long addr,
>  	return ret;
>  }
>  
> +static int realm_create_rtt_levels(struct realm *realm,
> +				   unsigned long ipa,
> +				   int level,
> +				   int max_level,
> +				   struct kvm_mmu_memory_cache *mc)
> +{
> +	if (WARN_ON(level == max_level))
> +		return 0;
> +
> +	while (level++ < max_level) {
> +		phys_addr_t rtt = alloc_delegated_page(realm, mc);
> +
> +		if (rtt == PHYS_ADDR_MAX)
> +			return -ENOMEM;
> +
> +		if (realm_rtt_create(realm, ipa, level, rtt)) {
> +			free_delegated_page(realm, rtt);
> +			return -ENXIO;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static int realm_tear_down_rtt_level(struct realm *realm, int level,
>  				     unsigned long start, unsigned long end)
>  {
> @@ -265,6 +440,68 @@ static int realm_tear_down_rtt_range(struct realm *realm,
>  					 start, end);
>  }
>  
> +/*
> + * Returns 0 on successful fold, a negative value on error, a positive value if
> + * we were not able to fold all tables at this level.
> + */
> +static int realm_fold_rtt_level(struct realm *realm, int level,
> +				unsigned long start, unsigned long end)
> +{
> +	int not_folded = 0;
> +	ssize_t map_size;
> +	unsigned long addr, next_addr;
> +
> +	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
> +		return -EINVAL;
> +
> +	map_size = rme_rtt_level_mapsize(level - 1);
> +
> +	for (addr = start; addr < end; addr = next_addr) {
> +		phys_addr_t rtt_granule;
> +		int ret;
> +		unsigned long align_addr = ALIGN(addr, map_size);
> +
> +		next_addr = ALIGN(addr + 1, map_size);
> +
> +		ret = realm_rtt_fold(realm, align_addr, level, &rtt_granule);
> +
> +		switch (RMI_RETURN_STATUS(ret)) {
> +		case RMI_SUCCESS:
> +			if (!WARN_ON(rmi_granule_undelegate(rtt_granule)))
> +				free_page((unsigned long)phys_to_virt(rtt_granule));
> +			break;
> +		case RMI_ERROR_RTT:
> +			if (level == RME_RTT_MAX_LEVEL ||
> +			    RMI_RETURN_INDEX(ret) < level) {
> +				not_folded++;
> +				break;
> +			}
> +			/* Recurse a level deeper */
> +			ret = realm_fold_rtt_level(realm,
> +						   level + 1,
> +						   addr,
> +						   next_addr);
> +			if (ret < 0)
> +				return ret;
> +			else if (ret == 0)
> +				/* Try again at this level */
> +				next_addr = addr;
> +			break;
> +		default:

Maybe this also deserves a WARN() to be consistent with the other RMI
calls
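
For example (sketch):

  		default:
  			WARN_ON(1);
  			return -ENXIO;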

Thanks,
Jean

> +			return -ENXIO;
> +		}
> +	}
> +
> +	return not_folded;
> +}

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS
  2024-05-01 14:27     ` Jean-Philippe Brucker
@ 2024-05-01 14:56       ` Suzuki K Poulose
  0 siblings, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-05-01 14:56 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni

On 01/05/2024 15:27, Jean-Philippe Brucker wrote:
> On Fri, Apr 12, 2024 at 09:42:43AM +0100, Steven Price wrote:
>> +static inline bool realm_is_addr_protected(struct realm *realm,
>> +					   unsigned long addr)
>> +{
>> +	unsigned int ia_bits = realm->ia_bits;
>> +
>> +	return !(addr & ~(BIT(ia_bits - 1) - 1));
> 
> Is it enough to return !(addr & BIT(realm->ia_bits - 1))?

I thought about that too. But if we are dealing with an IPA
that is > (BIT(realm->ia_bits)), we don't want to be treating
that as a protected address. This could only happen if the Realm
is buggy (or the VMM has tricked it). So the existing check
looks safer.
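
As a worked example, take a hypothetical ia_bits of 40 (shared bit at
BIT(39)) and a bogus addr of BIT(44):

  addr & ~(BIT(39) - 1)	/* == BIT(44), non-zero: current check says unprotected */
  addr & BIT(39)	/* == 0: the simplified check would say protected */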

> 
>> +static void realm_unmap_range_shared(struct kvm *kvm,
>> +				     int level,
>> +				     unsigned long start,
>> +				     unsigned long end)
>> +{
>> +	struct realm *realm = &kvm->arch.realm;
>> +	unsigned long rd = virt_to_phys(realm->rd);
>> +	ssize_t map_size = rme_rtt_level_mapsize(level);
>> +	unsigned long next_addr, addr;
>> +	unsigned long shared_bit = BIT(realm->ia_bits - 1);
>> +
>> +	if (WARN_ON(level > RME_RTT_MAX_LEVEL))
>> +		return;
>> +
>> +	start |= shared_bit;
>> +	end |= shared_bit;
>> +
>> +	for (addr = start; addr < end; addr = next_addr) {
>> +		unsigned long align_addr = ALIGN(addr, map_size);
>> +		int ret;
>> +
>> +		next_addr = ALIGN(addr + 1, map_size);
>> +
>> +		if (align_addr != addr || next_addr > end) {
>> +			/* Need to recurse deeper */
>> +			if (addr < align_addr)
>> +				next_addr = align_addr;
>> +			realm_unmap_range_shared(kvm, level + 1, addr,
>> +						 min(next_addr, end));
>> +			continue;
>> +		}
>> +
>> +		ret = rmi_rtt_unmap_unprotected(rd, addr, level, &next_addr);
>> +		switch (RMI_RETURN_STATUS(ret)) {
>> +		case RMI_SUCCESS:
>> +			break;
>> +		case RMI_ERROR_RTT:
>> +			if (next_addr == addr) {
>> +				next_addr = ALIGN(addr + 1, map_size);
>> +				realm_unmap_range_shared(kvm, level + 1, addr,
>> +							 next_addr);
>> +			}
>> +			break;
>> +		default:
>> +			WARN_ON(1);
> 
> In this case we also need to return, because RMM returns with next_addr ==
> 0, causing an infinite loop. At the moment a VMM can trigger this easily
> by creating guest memfd before creating a RD, see below

That's a good point. I agree.

> 
>> +		}
>> +	}
>> +}
>> +
>> +static void realm_unmap_range_private(struct kvm *kvm,
>> +				      unsigned long start,
>> +				      unsigned long end)
>> +{
>> +	struct realm *realm = &kvm->arch.realm;
>> +	ssize_t map_size = RME_PAGE_SIZE;
>> +	unsigned long next_addr, addr;
>> +
>> +	for (addr = start; addr < end; addr = next_addr) {
>> +		int ret;
>> +
>> +		next_addr = ALIGN(addr + 1, map_size);
>> +
>> +		ret = realm_destroy_protected(realm, addr, &next_addr);
>> +
>> +		if (WARN_ON(ret))
>> +			break;
>> +	}
>> +}
>> +
>> +static void realm_unmap_range(struct kvm *kvm,
>> +			      unsigned long start,
>> +			      unsigned long end,
>> +			      bool unmap_private)
>> +{
> 
> Should this check for a valid kvm->arch.realm.rd, or a valid realm state?
> I'm not sure what the best place is but none of the RMM calls will succeed
> if the RD is NULL, causing some WARNs.
> 
> I can trigger this with set_memory_attributes() ioctls before creating a
> RD for example.
> 

True, this could be triggered by a buggy VMM in other ways, and we could
easily gate it on the Realm state >= NEW.
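
Something along these lines, perhaps (sketch, assuming the state enum
orders NONE before NEW):

  static void realm_unmap_range(struct kvm *kvm,
  			      unsigned long start,
  			      unsigned long end,
  			      bool unmap_private)
  {
  	/* Nothing to do before the RD has been created */
  	if (kvm_realm_state(kvm) < REALM_STATE_NEW)
  		return;
  	...
  }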

Suzuki



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS
  2024-04-19  9:34     ` Suzuki K Poulose
  2024-04-19 10:20       ` Suzuki K Poulose
@ 2024-05-01 15:47       ` Steven Price
  2024-05-02 10:16         ` Suzuki K Poulose
  1 sibling, 1 reply; 104+ messages in thread
From: Steven Price @ 2024-05-01 15:47 UTC (permalink / raw)
  To: Suzuki K Poulose, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 19/04/2024 10:34, Suzuki K Poulose wrote:
> On 12/04/2024 09:42, Steven Price wrote:
>> Each page within the protected region of the realm guest can be marked
>> as either RAM or EMPTY. Allow the VMM to control this before the guest
>> has started and provide the equivalent functions to change this (with
>> the guest's approval) at runtime.
>>
>> When transitioning from RIPAS RAM (1) to RIPAS EMPTY (0) the memory is
>> unmapped from the guest and undelegated allowing the memory to be reused
>> by the host. When transitioning to RIPAS RAM the actual population of
>> the leaf RTTs is done later on stage 2 fault, however it may be
>> necessary to allocate additional RTTs to represent the range requested.
> 
> minor nit: To give a bit more context:
> 
> "however it may be necessary to allocate additional RTTs in order for
> the RMM to track the RIPAS for the requested range".

That is what I meant - but your wording is probably clearer ;)
Technically there's also the case where a RIPAS change will cause a
block mapping to be split, which isn't just about tracking RIPAS but also
about breaking up the block mapping for the pages that remain.

>>
>> When freeing a block mapping it is necessary to temporarily unfold the
>> RTT which requires delegating an extra page to the RMM, this page can
>> then be recovered once the contents of the block mapping have been
>> freed. A spare, delegated page (spare_page) is used for this purpose.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>   arch/arm64/include/asm/kvm_rme.h |  16 ++
>>   arch/arm64/kvm/mmu.c             |   8 +-
>>   arch/arm64/kvm/rme.c             | 390 +++++++++++++++++++++++++++++++
>>   3 files changed, 411 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_rme.h
>> b/arch/arm64/include/asm/kvm_rme.h
>> index 915e76068b00..cc8f81cfc3c0 100644
>> --- a/arch/arm64/include/asm/kvm_rme.h
>> +++ b/arch/arm64/include/asm/kvm_rme.h
>> @@ -96,6 +96,14 @@ void kvm_realm_destroy_rtts(struct kvm *kvm, u32
>> ia_bits);
>>   int kvm_create_rec(struct kvm_vcpu *vcpu);
>>   void kvm_destroy_rec(struct kvm_vcpu *vcpu);
>>   +void kvm_realm_unmap_range(struct kvm *kvm,
>> +               unsigned long ipa,
>> +               u64 size,
>> +               bool unmap_private);
>> +int realm_set_ipa_state(struct kvm_vcpu *vcpu,
>> +            unsigned long addr, unsigned long end,
>> +            unsigned long ripas);
>> +
>>   #define RME_RTT_BLOCK_LEVEL    2
>>   #define RME_RTT_MAX_LEVEL    3
>>   @@ -114,4 +122,12 @@ static inline unsigned long
>> rme_rtt_level_mapsize(int level)
>>       return (1UL << RME_RTT_LEVEL_SHIFT(level));
>>   }
>>   +static inline bool realm_is_addr_protected(struct realm *realm,
>> +                       unsigned long addr)
>> +{
>> +    unsigned int ia_bits = realm->ia_bits;
>> +
>> +    return !(addr & ~(BIT(ia_bits - 1) - 1));
>> +}
>> +
>>   #endif
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 46f0c4e80ace..8a7b5449697f 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -310,6 +310,7 @@ static void invalidate_icache_guest_page(void *va,
>> size_t size)
>>    * @start: The intermediate physical base address of the range to unmap
>>    * @size:  The size of the area to unmap
>>    * @may_block: Whether or not we are permitted to block
>> + * @only_shared: If true then protected mappings should not be unmapped
>>    *
>>    * Clear a range of stage-2 mappings, lowering the various
>> ref-counts.  Must
>>    * be called while holding mmu_lock (unless for freeing the stage2
>> pgd before
>> @@ -317,7 +318,7 @@ static void invalidate_icache_guest_page(void *va,
>> size_t size)
>>    * with things behind our backs.
>>    */
>>   static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t
>> start, u64 size,
>> -                 bool may_block)
>> +                 bool may_block, bool only_shared)
>>   {
>>       struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>>       phys_addr_t end = start + size;
>> @@ -330,7 +331,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu
>> *mmu, phys_addr_t start, u64
>>     static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t
>> start, u64 size)
>>   {
>> -    __unmap_stage2_range(mmu, start, size, true);
>> +    __unmap_stage2_range(mmu, start, size, true, false);
>>   }
>>     static void stage2_flush_memslot(struct kvm *kvm,
>> @@ -1771,7 +1772,8 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct
>> kvm_gfn_range *range)
>>         __unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
>>                    (range->end - range->start) << PAGE_SHIFT,
>> -                 range->may_block);
>> +                 range->may_block,
>> +                 range->only_shared);
>>         return false;
>>   }
>> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
>> index 629a095bea61..9e5983c51393 100644
>> --- a/arch/arm64/kvm/rme.c
>> +++ b/arch/arm64/kvm/rme.c
>> @@ -79,6 +79,12 @@ static phys_addr_t __alloc_delegated_page(struct
>> realm *realm,
>>       return phys;
>>   }
>>   +static phys_addr_t alloc_delegated_page(struct realm *realm,
>> +                    struct kvm_mmu_memory_cache *mc)
>> +{
>> +    return __alloc_delegated_page(realm, mc, GFP_KERNEL);
>> +}
>> +
>>   static void free_delegated_page(struct realm *realm, phys_addr_t phys)
>>   {
>>       if (realm->spare_page == PHYS_ADDR_MAX) {
>> @@ -94,6 +100,151 @@ static void free_delegated_page(struct realm
>> *realm, phys_addr_t phys)
>>       free_page((unsigned long)phys_to_virt(phys));
>>   }
>>   +static int realm_rtt_create(struct realm *realm,
>> +                unsigned long addr,
>> +                int level,
>> +                phys_addr_t phys)
>> +{
>> +    addr = ALIGN_DOWN(addr, rme_rtt_level_mapsize(level - 1));
>> +    return rmi_rtt_create(virt_to_phys(realm->rd), phys, addr, level);
>> +}
>> +
>> +static int realm_rtt_fold(struct realm *realm,
>> +              unsigned long addr,
>> +              int level,
>> +              phys_addr_t *rtt_granule)
>> +{
>> +    unsigned long out_rtt;
>> +    int ret;
>> +
>> +    ret = rmi_rtt_fold(virt_to_phys(realm->rd), addr, level, &out_rtt);
>> +
>> +    if (RMI_RETURN_STATUS(ret) == RMI_SUCCESS && rtt_granule)
>> +        *rtt_granule = out_rtt;
>> +
>> +    return ret;
>> +}
>> +
>> +static int realm_destroy_protected(struct realm *realm,
>> +                   unsigned long ipa,
>> +                   unsigned long *next_addr)
>> +{
>> +    unsigned long rd = virt_to_phys(realm->rd);
>> +    unsigned long addr;
>> +    phys_addr_t rtt;
>> +    int ret;
>> +
>> +loop:
>> +    ret = rmi_data_destroy(rd, ipa, &addr, next_addr);
>> +    if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
>> +        if (*next_addr > ipa)
>> +            return 0; /* UNASSIGNED */
>> +        rtt = alloc_delegated_page(realm, NULL);
>> +        if (WARN_ON(rtt == PHYS_ADDR_MAX))
>> +            return -1;
>> +        /* ASSIGNED - ipa is mapped as a block, so split */
>> +        ret = realm_rtt_create(realm, ipa,
>> +                       RMI_RETURN_INDEX(ret) + 1, rtt);
> 
> Could we not go all the way to L3 (rather than 1 level deeper) and try
> again ? That way, we are covered for block mappings at L1 (1G).

I don't think this situation can happen. The spec states that for
RMI_ERROR_RTT either top>ipa (UNASSIGNED, in which case we've bailed out
above) or top==ipa (ASSIGNED) in which case RMI_RETURN_INDEX(ret) must
equal 2.

So in this case realm_rtt_create() will always be called with level=3.

I can simplify this by explicitly passing 3 and adding a WARN_ON() for
RMI_RETURN_INDEX(ret) != 2.
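
Roughly (untested sketch):

  	/* ASSIGNED - ipa is mapped as a block at level 2, so split */
  	if (WARN_ON(RMI_RETURN_INDEX(ret) != RME_RTT_MAX_LEVEL - 1))
  		return -1;
  	ret = realm_rtt_create(realm, ipa, RME_RTT_MAX_LEVEL, rtt);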

>> +        if (WARN_ON(ret)) {
>> +            free_delegated_page(realm, rtt);
>> +            return -1;
>> +        }
>> +        /* retry */
>> +        goto loop;
>> +    } else if (WARN_ON(ret)) {
>> +        return -1;
>> +    }
>> +    ret = rmi_granule_undelegate(addr);
>> +
>> +    /*
>> +     * If the undelegate fails then something has gone seriously
>> +     * wrong: take an extra reference to just leak the page
>> +     */
>> +    if (WARN_ON(ret))
>> +        get_page(phys_to_page(addr));
>> +
>> +    return 0;
>> +}
>> +
>> +static void realm_unmap_range_shared(struct kvm *kvm,
>> +                     int level,
>> +                     unsigned long start,
>> +                     unsigned long end)
>> +{
>> +    struct realm *realm = &kvm->arch.realm;
>> +    unsigned long rd = virt_to_phys(realm->rd);
>> +    ssize_t map_size = rme_rtt_level_mapsize(level);
>> +    unsigned long next_addr, addr;
>> +    unsigned long shared_bit = BIT(realm->ia_bits - 1);
>> +
>> +    if (WARN_ON(level > RME_RTT_MAX_LEVEL))
>> +        return;
>> +
>> +    start |= shared_bit;
>> +    end |= shared_bit;
>> +
>> +    for (addr = start; addr < end; addr = next_addr) {
>> +        unsigned long align_addr = ALIGN(addr, map_size);
>> +        int ret;
>> +
>> +        next_addr = ALIGN(addr + 1, map_size);
>> +
>> +        if (align_addr != addr || next_addr > end) {
>> +            /* Need to recurse deeper */
>> +            if (addr < align_addr)
>> +                next_addr = align_addr;
>> +            realm_unmap_range_shared(kvm, level + 1, addr,
>> +                         min(next_addr, end));
>> +            continue;
>> +        }
>> +
>> +        ret = rmi_rtt_unmap_unprotected(rd, addr, level, &next_addr);
> 
> minor nit: We could potentially use rmi_rtt_destroy() to tear down
> shared mappings without unmapping them individually, if the range
> is big enough. All such optimisations could come later though.

Apparently so. I do feel the "liveness" concept in the RMM spec is very
confusing. But it looks like an RTT in the unprotected IPA range isn't
considered 'live' if it only contains ASSIGNED_NS entries. The wording
here isn't great though:

  An RTT is live if, for any of its entries, either of the following is
  true:
    • The RTTE state is ASSIGNED
    • The RTTE state is TABLE.

   Note that an RTT can be non-live, even if one of its entries is live.
   This would be the case for example if the RTT corresponds to an
   Unprotected IPA range and the state of one of its entries is
   ASSIGNED_NS.

Like you say, this is an optimisation, so I think we should leave it
until later for now.

>> +        switch (RMI_RETURN_STATUS(ret)) {
>> +        case RMI_SUCCESS:
>> +            break;
>> +        case RMI_ERROR_RTT:
>> +            if (next_addr == addr) {
> 
> At this point we have a block-aligned address, but the mapping is
> deeper still. Given that we walk from the top level down, we implicitly
> handle the case of block mappings. Not sure if that needs to be in a comment
> here.
> 
>> +                next_addr = ALIGN(addr + 1, map_size);
> 
> Reset to the "actual next" as it was overwritten by the RMI call.

I'll add a comment explaining what's happening here.
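
Something like this, perhaps (sketch):

  		case RMI_ERROR_RTT:
  			if (next_addr == addr) {
  				/*
  				 * There is a mapping at a deeper level. The
  				 * RMI call overwrote next_addr, so recompute
  				 * it and recurse one level down.
  				 */
  				next_addr = ALIGN(addr + 1, map_size);
  				realm_unmap_range_shared(kvm, level + 1, addr,
  							 next_addr);
  			}
  			break;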

>> +                realm_unmap_range_shared(kvm, level + 1, addr,
>> +                             next_addr);
>> +            }
>> +            break;
>> +        default:
>> +            WARN_ON(1);
>> +        }
>> +    }
>> +}
>> +
>> +static void realm_unmap_range_private(struct kvm *kvm,
>> +                      unsigned long start,
>> +                      unsigned long end)
>> +{
>> +    struct realm *realm = &kvm->arch.realm;
>> +    ssize_t map_size = RME_PAGE_SIZE;
>> +    unsigned long next_addr, addr;
>> +
>> +    for (addr = start; addr < end; addr = next_addr) {
>> +        int ret;
>> +
>> +        next_addr = ALIGN(addr + 1, map_size);
>> +
>> +        ret = realm_destroy_protected(realm, addr, &next_addr);
>> +
>> +        if (WARN_ON(ret))
>> +            break;
>> +    }
>> +}
>> +
>> +static void realm_unmap_range(struct kvm *kvm,
>> +                  unsigned long start,
>> +                  unsigned long end,
>> +                  bool unmap_private)
>> +{
>> +    realm_unmap_range_shared(kvm, RME_RTT_MAX_LEVEL - 1, start, end);
> 
> minor nit: We already have a helper to find a suitable start level
> (defined below), may be we could use that ? And even do the rtt_destroy
> optimisation for unprotected range.

Good point, that should work.

>> +    if (unmap_private)
>> +        realm_unmap_range_private(kvm, start, end);
>> +}
>> +
>>   u32 kvm_realm_ipa_limit(void)
>>   {
>>       return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
>> @@ -190,6 +341,30 @@ static int realm_rtt_destroy(struct realm *realm,
>> unsigned long addr,
>>       return ret;
>>   }
>>   +static int realm_create_rtt_levels(struct realm *realm,
>> +                   unsigned long ipa,
>> +                   int level,
>> +                   int max_level,
>> +                   struct kvm_mmu_memory_cache *mc)
>> +{
>> +    if (WARN_ON(level == max_level))
>> +        return 0;
>> +
>> +    while (level++ < max_level) {
>> +        phys_addr_t rtt = alloc_delegated_page(realm, mc);
>> +
>> +        if (rtt == PHYS_ADDR_MAX)
>> +            return -ENOMEM;
>> +
>> +        if (realm_rtt_create(realm, ipa, level, rtt)) {
>> +            free_delegated_page(realm, rtt);
>> +            return -ENXIO;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   static int realm_tear_down_rtt_level(struct realm *realm, int level,
>>                        unsigned long start, unsigned long end)
>>   {
>> @@ -265,6 +440,68 @@ static int realm_tear_down_rtt_range(struct realm
>> *realm,
>>                        start, end);
>>   }
>>   +/*
>> + * Returns 0 on successful fold, a negative value on error, a
>> positive value if
>> + * we were not able to fold all tables at this level.
>> + */
>> +static int realm_fold_rtt_level(struct realm *realm, int level,
>> +                unsigned long start, unsigned long end)
>> +{
>> +    int not_folded = 0;
>> +    ssize_t map_size;
>> +    unsigned long addr, next_addr;
>> +
>> +    if (WARN_ON(level > RME_RTT_MAX_LEVEL))
>> +        return -EINVAL;
>> +
>> +    map_size = rme_rtt_level_mapsize(level - 1);
>> +
>> +    for (addr = start; addr < end; addr = next_addr) {
>> +        phys_addr_t rtt_granule;
>> +        int ret;
>> +        unsigned long align_addr = ALIGN(addr, map_size);
>> +
>> +        next_addr = ALIGN(addr + 1, map_size);
>> +
>> +        ret = realm_rtt_fold(realm, align_addr, level, &rtt_granule);
>> +
>> +        switch (RMI_RETURN_STATUS(ret)) {
>> +        case RMI_SUCCESS:
>> +            if (!WARN_ON(rmi_granule_undelegate(rtt_granule)))
>> +                free_page((unsigned long)phys_to_virt(rtt_granule));
> 
> minor nit: Do we need a wrapper function for things like this, and
> leaking the page if undelegate fails, something like
> rme_reclaim_delegated_page()  ?

I'll take a look at that - most sites don't have the same argument
repeated (unlike this one), so I'll have to have a think about the best
form.
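
For this particular pattern it could be something like (sketch, using
the name Suzuki suggested):

  static void rme_reclaim_delegated_page(phys_addr_t phys)
  {
  	/* If undelegation fails the page is lost to the host: leak it */
  	if (WARN_ON(rmi_granule_undelegate(phys)))
  		return;
  	free_page((unsigned long)phys_to_virt(phys));
  }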

>> +            break;
>> +        case RMI_ERROR_RTT:
>> +            if (level == RME_RTT_MAX_LEVEL ||
>> +                RMI_RETURN_INDEX(ret) < level) {
>> +                not_folded++;
>> +                break;
>> +            }
>> +            /* Recurse a level deeper */
>> +            ret = realm_fold_rtt_level(realm,
>> +                           level + 1,
>> +                           addr,
>> +                           next_addr);
>> +            if (ret < 0)
>> +                return ret;
>> +            else if (ret == 0)
>> +                /* Try again at this level */
>> +                next_addr = addr;
>> +            break;
>> +        default:
>> +            return -ENXIO;
>> +        }
>> +    }
>> +
>> +    return not_folded;
>> +}
>> +
>> +static int realm_fold_rtt_range(struct realm *realm,
>> +                unsigned long start, unsigned long end)
>> +{
>> +    return realm_fold_rtt_level(realm, get_start_level(realm) + 1,
>> +                    start, end);
>> +}
>> +
>>   static void ensure_spare_page(struct realm *realm)
>>   {
>>       phys_addr_t tmp_rtt;
>> @@ -295,6 +532,147 @@ void kvm_realm_destroy_rtts(struct kvm *kvm, u32
>> ia_bits)
>>       WARN_ON(realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits)));
>>   }
>>   +void kvm_realm_unmap_range(struct kvm *kvm, unsigned long ipa, u64
>> size,
>> +               bool unmap_private)
>> +{
>> +    unsigned long end = ipa + size;
>> +    struct realm *realm = &kvm->arch.realm;
>> +
>> +    end = min(BIT(realm->ia_bits - 1), end);
>> +
>> +    ensure_spare_page(realm);
>> +
>> +    realm_unmap_range(kvm, ipa, end, unmap_private);
>> +
>> +    realm_fold_rtt_range(realm, ipa, end);
> 
> Shouldn't this be :
> 
>     if (unmap_private)
>         realm_fold_rtt_range(realm, ipa, end);

Indeed it's a little pointless folding if we haven't touched the private
mappings.

> Also it is fine to reclaim RTTs from the protected space, not the
> unprotected half, as long as we use RTT_DESTROY in the unmap_shared routine.
> 
>> +}
>> +
>> +static int find_map_level(struct realm *realm,
>> +              unsigned long start,
>> +              unsigned long end)
>> +{
>> +    int level = RME_RTT_MAX_LEVEL;
>> +
>> +    while (level > get_start_level(realm)) {
>> +        unsigned long map_size = rme_rtt_level_mapsize(level - 1);
>> +
>> +        if (!IS_ALIGNED(start, map_size) ||
>> +            (start + map_size) > end)
>> +            break;
>> +
>> +        level--;
>> +    }
>> +
>> +    return level;
>> +}
>> +
>> +int realm_set_ipa_state(struct kvm_vcpu *vcpu,
>> +            unsigned long start,
>> +            unsigned long end,
>> +            unsigned long ripas)
>> +{
>> +    struct kvm *kvm = vcpu->kvm;
>> +    struct realm *realm = &kvm->arch.realm;
>> +    struct realm_rec *rec = &vcpu->arch.rec;
>> +    phys_addr_t rd_phys = virt_to_phys(realm->rd);
>> +    phys_addr_t rec_phys = virt_to_phys(rec->rec_page);
>> +    struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>> +    unsigned long ipa = start;
>> +    int ret = 0;
>> +
>> +    while (ipa < end) {
>> +        unsigned long next;
>> +
>> +        ret = rmi_rtt_set_ripas(rd_phys, rec_phys, ipa, end, &next);
>> +
>> +        if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
>> +            int walk_level = RMI_RETURN_INDEX(ret);
>> +            int level = find_map_level(realm, ipa, end);
> 
> Might be worth adding a comment here. Check if the RMM needs tables to
> create deeper level tables.

Ok

>> +
>> +            if (walk_level < level) {
>> +                ret = realm_create_rtt_levels(realm, ipa,
>> +                                  walk_level,
>> +                                  level,
>> +                                  memcache);
>                 /* Retry with RTTs created */

Ok

>> +                if (!ret)
>> +                    continue;
>> +            } else {
>> +                ret = -EINVAL;
>> +            }
>> +
>> +            break;
>> +        } else if (RMI_RETURN_STATUS(ret) != RMI_SUCCESS) {
>> +            WARN(1, "Unexpected error in %s: %#x\n", __func__,
>> +                 ret);
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +        ipa = next;
>> +    }
>> +
>> +    if (ripas == RMI_EMPTY && ipa != start)
>> +        kvm_realm_unmap_range(kvm, start, ipa - start, true);
> 
> This triggers unmapping the "shared" aliases too, which is not necessary.

Yeah, the kvm_realm_unmap_range() function is a bit problematic - the
various "unmap_private" and "only_shared" flags need a bit of a rethink.

>> +
>> +    return ret;
>> +}
>> +
>> +static int realm_init_ipa_state(struct realm *realm,
>> +                unsigned long ipa,
>> +                unsigned long end)
>> +{
>> +    phys_addr_t rd_phys = virt_to_phys(realm->rd);
>> +    int ret;
>> +
>> +    while (ipa < end) {
>> +        unsigned long next;
>> +
>> +        ret = rmi_rtt_init_ripas(rd_phys, ipa, end, &next);
>> +
>> +        if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
>> +            int err_level = RMI_RETURN_INDEX(ret);
>> +            int level = find_map_level(realm, ipa, end);
>> +
>> +            if (WARN_ON(err_level >= level))
>> +                return -ENXIO;
>> +
>> +            ret = realm_create_rtt_levels(realm, ipa,
>> +                              err_level,
>> +                              level, NULL);
>> +            if (ret)
>> +                return ret;
>> +            /* Retry with the RTT levels in place */
>> +            continue;
>> +        } else if (WARN_ON(ret)) {
>> +            return -ENXIO;
>> +        }
>> +
>> +        ipa = next;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int kvm_init_ipa_range_realm(struct kvm *kvm,
>> +                    struct kvm_cap_arm_rme_init_ipa_args *args)
>> +{
>> +    int ret = 0;
>> +    gpa_t addr, end;
>> +    struct realm *realm = &kvm->arch.realm;
>> +
>> +    addr = args->init_ipa_base;
>> +    end = addr + args->init_ipa_size;
>> +
>> +    if (end < addr)
>> +        return -EINVAL;
>> +
>> +    if (kvm_realm_state(kvm) != REALM_STATE_NEW)
>> +        return -EINVAL;
>> +
>> +    ret = realm_init_ipa_state(realm, addr, end);
>> +
>> +    return ret;
> 
> super minor nit:
> 
>     return realm_init_ipa_state(realm, addr, end);

That was probably left over from some printk() debugging ;)

Thanks,

Steve

>> +}
>> +
>>   /* Protects access to rme_vmid_bitmap */
>>   static DEFINE_SPINLOCK(rme_vmid_lock);
>>   static unsigned long *rme_vmid_bitmap;
>> @@ -418,6 +796,18 @@ int kvm_realm_enable_cap(struct kvm *kvm, struct
>> kvm_enable_cap *cap)
>>       case KVM_CAP_ARM_RME_CREATE_RD:
>>           r = kvm_create_realm(kvm);
>>           break;
>> +    case KVM_CAP_ARM_RME_INIT_IPA_REALM: {
>> +        struct kvm_cap_arm_rme_init_ipa_args args;
>> +        void __user *argp = u64_to_user_ptr(cap->args[1]);
>> +
>> +        if (copy_from_user(&args, argp, sizeof(args))) {
>> +            r = -EFAULT;
>> +            break;
>> +        }
>> +
>> +        r = kvm_init_ipa_range_realm(kvm, &args);
>> +        break;
>> +    }
> 
> 
> Suzuki


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS
  2024-05-01 15:47       ` Steven Price
@ 2024-05-02 10:16         ` Suzuki K Poulose
  0 siblings, 0 replies; 104+ messages in thread
From: Suzuki K Poulose @ 2024-05-02 10:16 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni

On 01/05/2024 16:47, Steven Price wrote:
> On 19/04/2024 10:34, Suzuki K Poulose wrote:
>> On 12/04/2024 09:42, Steven Price wrote:
>>> Each page within the protected region of the realm guest can be marked
>>> as either RAM or EMPTY. Allow the VMM to control this before the guest
>>> has started and provide the equivalent functions to change this (with
>>> the guest's approval) at runtime.
>>>
>>> When transitioning from RIPAS RAM (1) to RIPAS EMPTY (0) the memory is
>>> unmapped from the guest and undelegated allowing the memory to be reused
>>> by the host. When transitioning to RIPAS RAM the actual population of
>>> the leaf RTTs is done later on stage 2 fault, however it may be
>>> necessary to allocate additional RTTs to represent the range requested.
>>
>> minor nit: To give a bit more context:
>>
>> "however it may be necessary to allocate additional RTTs in order for
>> the RMM to track the RIPAS for the requested range".
> 
> That is what I meant - but your wording is probably clearer ;)
> Technically there's also the case where a RIPAS change will cause a
> block mapping to be split which isn't just about tracking RIPAS but also
> about breaking up the block mapping for the pages that remain.
> 
>>>
>>> When freeing a block mapping it is necessary to temporarily unfold the
>>> RTT which requires delegating an extra page to the RMM, this page can
>>> then be recovered once the contents of the block mapping have been
>>> freed. A spare, delegated page (spare_page) is used for this purpose.
>>>
>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>> ---
>>>    arch/arm64/include/asm/kvm_rme.h |  16 ++
>>>    arch/arm64/kvm/mmu.c             |   8 +-
>>>    arch/arm64/kvm/rme.c             | 390 +++++++++++++++++++++++++++++++
>>>    3 files changed, 411 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_rme.h
>>> b/arch/arm64/include/asm/kvm_rme.h
>>> index 915e76068b00..cc8f81cfc3c0 100644
>>> --- a/arch/arm64/include/asm/kvm_rme.h
>>> +++ b/arch/arm64/include/asm/kvm_rme.h
>>> @@ -96,6 +96,14 @@ void kvm_realm_destroy_rtts(struct kvm *kvm, u32
>>> ia_bits);
>>>    int kvm_create_rec(struct kvm_vcpu *vcpu);
>>>    void kvm_destroy_rec(struct kvm_vcpu *vcpu);
>>>    +void kvm_realm_unmap_range(struct kvm *kvm,
>>> +               unsigned long ipa,
>>> +               u64 size,
>>> +               bool unmap_private);
>>> +int realm_set_ipa_state(struct kvm_vcpu *vcpu,
>>> +            unsigned long addr, unsigned long end,
>>> +            unsigned long ripas);
>>> +
>>>    #define RME_RTT_BLOCK_LEVEL    2
>>>    #define RME_RTT_MAX_LEVEL    3
>>>    @@ -114,4 +122,12 @@ static inline unsigned long
>>> rme_rtt_level_mapsize(int level)
>>>        return (1UL << RME_RTT_LEVEL_SHIFT(level));
>>>    }
>>>    +static inline bool realm_is_addr_protected(struct realm *realm,
>>> +                       unsigned long addr)
>>> +{
>>> +    unsigned int ia_bits = realm->ia_bits;
>>> +
>>> +    return !(addr & ~(BIT(ia_bits - 1) - 1));
>>> +}
>>> +
>>>    #endif
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 46f0c4e80ace..8a7b5449697f 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -310,6 +310,7 @@ static void invalidate_icache_guest_page(void *va,
>>> size_t size)
>>>     * @start: The intermediate physical base address of the range to unmap
>>>     * @size:  The size of the area to unmap
>>>     * @may_block: Whether or not we are permitted to block
>>> + * @only_shared: If true then protected mappings should not be unmapped
>>>     *
>>>     * Clear a range of stage-2 mappings, lowering the various
>>> ref-counts.  Must
>>>     * be called while holding mmu_lock (unless for freeing the stage2
>>> pgd before
>>> @@ -317,7 +318,7 @@ static void invalidate_icache_guest_page(void *va,
>>> size_t size)
>>>     * with things behind our backs.
>>>     */
>>>    static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t
>>> start, u64 size,
>>> -                 bool may_block)
>>> +                 bool may_block, bool only_shared)
>>>    {
>>>        struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>>>        phys_addr_t end = start + size;
>>> @@ -330,7 +331,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu
>>> *mmu, phys_addr_t start, u64
>>>      static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t
>>> start, u64 size)
>>>    {
>>> -    __unmap_stage2_range(mmu, start, size, true);
>>> +    __unmap_stage2_range(mmu, start, size, true, false);
>>>    }
>>>      static void stage2_flush_memslot(struct kvm *kvm,
>>> @@ -1771,7 +1772,8 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct
>>> kvm_gfn_range *range)
>>>          __unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
>>>                     (range->end - range->start) << PAGE_SHIFT,
>>> -                 range->may_block);
>>> +                 range->may_block,
>>> +                 range->only_shared);
>>>          return false;
>>>    }
>>> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
>>> index 629a095bea61..9e5983c51393 100644
>>> --- a/arch/arm64/kvm/rme.c
>>> +++ b/arch/arm64/kvm/rme.c
>>> @@ -79,6 +79,12 @@ static phys_addr_t __alloc_delegated_page(struct
>>> realm *realm,
>>>        return phys;
>>>    }
>>>    +static phys_addr_t alloc_delegated_page(struct realm *realm,
>>> +                    struct kvm_mmu_memory_cache *mc)
>>> +{
>>> +    return __alloc_delegated_page(realm, mc, GFP_KERNEL);
>>> +}
>>> +
>>>    static void free_delegated_page(struct realm *realm, phys_addr_t phys)
>>>    {
>>>        if (realm->spare_page == PHYS_ADDR_MAX) {
>>> @@ -94,6 +100,151 @@ static void free_delegated_page(struct realm
>>> *realm, phys_addr_t phys)
>>>        free_page((unsigned long)phys_to_virt(phys));
>>>    }
>>>    +static int realm_rtt_create(struct realm *realm,
>>> +                unsigned long addr,
>>> +                int level,
>>> +                phys_addr_t phys)
>>> +{
>>> +    addr = ALIGN_DOWN(addr, rme_rtt_level_mapsize(level - 1));
>>> +    return rmi_rtt_create(virt_to_phys(realm->rd), phys, addr, level);
>>> +}
>>> +
>>> +static int realm_rtt_fold(struct realm *realm,
>>> +              unsigned long addr,
>>> +              int level,
>>> +              phys_addr_t *rtt_granule)
>>> +{
>>> +    unsigned long out_rtt;
>>> +    int ret;
>>> +
>>> +    ret = rmi_rtt_fold(virt_to_phys(realm->rd), addr, level, &out_rtt);
>>> +
>>> +    if (RMI_RETURN_STATUS(ret) == RMI_SUCCESS && rtt_granule)
>>> +        *rtt_granule = out_rtt;
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +static int realm_destroy_protected(struct realm *realm,
>>> +                   unsigned long ipa,
>>> +                   unsigned long *next_addr)
>>> +{
>>> +    unsigned long rd = virt_to_phys(realm->rd);
>>> +    unsigned long addr;
>>> +    phys_addr_t rtt;
>>> +    int ret;
>>> +
>>> +loop:
>>> +    ret = rmi_data_destroy(rd, ipa, &addr, next_addr);
>>> +    if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
>>> +        if (*next_addr > ipa)
>>> +            return 0; /* UNASSIGNED */
>>> +        rtt = alloc_delegated_page(realm, NULL);
>>> +        if (WARN_ON(rtt == PHYS_ADDR_MAX))
>>> +            return -1;
>>> +        /* ASSIGNED - ipa is mapped as a block, so split */
>>> +        ret = realm_rtt_create(realm, ipa,
>>> +                       RMI_RETURN_INDEX(ret) + 1, rtt);
>>
>> Could we not go all the way to L3 (rather than 1 level deeper) and try
>> again ? That way, we are covered for block mappings at L1 (1G).
> 
> I don't think this situation can happen. The spec states that for
> RMI_ERROR_RTT either top>ipa (UNASSIGNED, in which case we've bailed out
> above) or top==ipa (ASSIGNED) in which case RMI_RETURN_INDEX(ret) must
> equal 2.

It could be 1, as the RMM spec allows Block mapping at L1. But if we don't
fold to that level in KVM, the current code may be fine.

> 
> So in this case realm_rtt_create() will always be called with level=3.
> 
> I can simplify this by explicitly passing 3 and adding a WARN_ON() for
> RMI_RETURN_INDEX(ret) != 2.

Yes please, if we don't fold all the way up to L1.

Thanks
Suzuki



Thread overview: 104+ messages
2024-04-12  8:40 [v2] Support for Arm CCA VMs on Linux Steven Price
2024-04-11 18:54 ` Itaru Kitayama
2024-04-15  8:14   ` Steven Price
2024-04-12  8:41 ` [PATCH v2 00/14] arm64: Support for running as a guest in Arm CCA Steven Price
2024-04-12  8:42   ` [PATCH v2 01/14] arm64: rsi: Add RSI definitions Steven Price
2024-04-12  8:42   ` [PATCH v2 02/14] arm64: Detect if in a realm and set RIPAS RAM Steven Price
2024-04-12  8:42   ` [PATCH v2 03/14] arm64: realm: Query IPA size from the RMM Steven Price
2024-04-12  8:42   ` [PATCH v2 04/14] arm64: Mark all I/O as non-secure shared Steven Price
2024-04-12  8:42   ` [PATCH v2 05/14] fixmap: Allow architecture overriding set_fixmap_io Steven Price
2024-04-12  8:42   ` [PATCH v2 06/14] arm64: Override set_fixmap_io Steven Price
2024-04-12  8:42   ` [PATCH v2 07/14] arm64: Make the PHYS_MASK_SHIFT dynamic Steven Price
2024-04-12  8:42   ` [PATCH v2 08/14] arm64: Enforce bounce buffers for realm DMA Steven Price
2024-04-12  8:42   ` [PATCH v2 09/14] arm64: Enable memory encrypt for Realms Steven Price
2024-04-15  3:13     ` kernel test robot
2024-04-25 13:42       ` Suzuki K Poulose
2024-04-25 15:52         ` Steven Price
2024-04-25 16:29         ` Suzuki K Poulose
2024-04-25 18:16           ` Emanuele Rocca
2024-04-12  8:42   ` [PATCH v2 10/14] arm64: Force device mappings to be non-secure shared Steven Price
2024-04-12  8:42   ` [PATCH v2 11/14] efi: arm64: Map Device with Prot Shared Steven Price
2024-04-12  8:42   ` [PATCH v2 12/14] arm64: realm: Support nonsecure ITS emulation shared Steven Price
2024-04-12  8:42   ` [PATCH v2 13/14] arm64: rsi: Interfaces to query attestation token Steven Price
2024-04-12  8:42   ` [PATCH v2 14/14] virt: arm-cca-guest: TSM_REPORT support for realms Steven Price
2024-04-24 13:06     ` Thomas Fossati
2024-04-24 13:27       ` Suzuki K Poulose
2024-04-24 13:19     ` Suzuki K Poulose
2024-04-12  8:42 ` [PATCH v2 00/43] arm64: Support for Arm CCA in KVM Steven Price
2024-04-12  8:42   ` [PATCH v2 01/43] KVM: Prepare for handling only shared mappings in mmu_notifier events Steven Price
2024-04-25  9:48     ` Fuad Tabba
2024-04-25 15:58       ` Steven Price
2024-04-25 22:56         ` Sean Christopherson
2024-04-12  8:42   ` [PATCH v2 02/43] kvm: arm64: pgtable: Track the number of pages in the entry level Steven Price
2024-04-12  8:42   ` [PATCH v2 03/43] kvm: arm64: Include kvm_emulate.h in kvm/arm_psci.h Steven Price
2024-04-12  8:42   ` [PATCH v2 04/43] arm64: RME: Handle Granule Protection Faults (GPFs) Steven Price
2024-04-16 11:17     ` Suzuki K Poulose
2024-04-18 13:17       ` Steven Price
2024-04-12  8:42   ` [PATCH v2 05/43] arm64: RME: Add SMC definitions for calling the RMM Steven Price
2024-04-16 12:38     ` Suzuki K Poulose
2024-04-18 13:17       ` Steven Price
2024-04-12  8:42   ` [PATCH v2 06/43] arm64: RME: Add wrappers for RMI calls Steven Price
2024-04-16 13:14     ` Suzuki K Poulose
2024-04-19 11:18       ` Steven Price
2024-04-12  8:42   ` [PATCH v2 07/43] arm64: RME: Check for RME support at KVM init Steven Price
2024-04-16 13:30     ` Suzuki K Poulose
2024-04-22 15:39       ` Steven Price
2024-04-12  8:42   ` [PATCH v2 08/43] arm64: RME: Define the user ABI Steven Price
2024-04-12  8:42   ` [PATCH v2 09/43] arm64: RME: ioctls to create and configure realms Steven Price
2024-04-17  9:51     ` Suzuki K Poulose
2024-04-22 16:33       ` Steven Price
2024-04-18 16:04     ` Suzuki K Poulose
2024-04-12  8:42   ` [PATCH v2 10/43] kvm: arm64: Expose debug HW register numbers for Realm Steven Price
2024-04-12  8:42   ` [PATCH v2 11/43] arm64: kvm: Allow passing machine type in KVM creation Steven Price
2024-04-17 10:20     ` Suzuki K Poulose
2024-04-12  8:42   ` [PATCH v2 12/43] arm64: RME: Keep a spare page delegated to the RMM Steven Price
2024-04-17 10:19     ` Suzuki K Poulose
2024-04-12  8:42   ` [PATCH v2 13/43] arm64: RME: RTT handling Steven Price
2024-04-17 13:37     ` Suzuki K Poulose
2024-04-24 10:59       ` Steven Price
2024-04-12  8:42   ` [PATCH v2 14/43] arm64: RME: Allocate/free RECs to match vCPUs Steven Price
2024-04-18  9:23     ` Suzuki K Poulose
2024-04-12  8:42   ` [PATCH v2 15/43] arm64: RME: Support for the VGIC in realms Steven Price
2024-04-12  8:42   ` [PATCH v2 16/43] KVM: arm64: Support timers in realm RECs Steven Price
2024-04-18  9:30     ` Suzuki K Poulose
2024-04-12  8:42   ` [PATCH v2 17/43] arm64: RME: Allow VMM to set RIPAS Steven Price
2024-04-19  9:34     ` Suzuki K Poulose
2024-04-19 10:20       ` Suzuki K Poulose
2024-05-01 15:47       ` Steven Price
2024-05-02 10:16         ` Suzuki K Poulose
2024-04-25  9:53     ` Fuad Tabba
2024-05-01 14:27     ` Jean-Philippe Brucker
2024-05-01 14:56       ` Suzuki K Poulose
2024-04-12  8:42   ` [PATCH v2 18/43] arm64: RME: Handle realm enter/exit Steven Price
2024-04-19 13:00     ` Suzuki K Poulose
2024-04-12  8:42   ` [PATCH v2 19/43] KVM: arm64: Handle realm MMIO emulation Steven Price
2024-04-12  8:42   ` [PATCH v2 20/43] arm64: RME: Allow populating initial contents Steven Price
2024-04-19 13:17     ` Suzuki K Poulose
2024-04-12  8:42   ` [PATCH v2 21/43] arm64: RME: Runtime faulting of memory Steven Price
2024-04-25 10:43     ` Fuad Tabba
2024-04-12  8:42   ` [PATCH v2 22/43] KVM: arm64: Handle realm VCPU load Steven Price
2024-04-12  8:42   ` [PATCH v2 23/43] KVM: arm64: Validate register access for a Realm VM Steven Price
2024-04-12  8:42   ` [PATCH v2 24/43] KVM: arm64: Handle Realm PSCI requests Steven Price
2024-04-12  8:42   ` [PATCH v2 25/43] KVM: arm64: WARN on injected undef exceptions Steven Price
2024-04-12  8:42   ` [PATCH v2 26/43] arm64: Don't expose stolen time for realm guests Steven Price
2024-04-12  8:42   ` [PATCH v2 27/43] arm64: rme: allow userspace to inject aborts Steven Price
2024-04-12  8:42   ` [PATCH v2 28/43] arm64: rme: support RSI_HOST_CALL Steven Price
2024-04-12  8:42   ` [PATCH v2 29/43] arm64: rme: Allow checking SVE on VM instance Steven Price
2024-04-12  8:42   ` [PATCH v2 30/43] arm64: RME: Always use 4k pages for realms Steven Price
2024-04-12  8:42   ` [PATCH v2 31/43] arm64: rme: Prevent Device mappings for Realms Steven Price
2024-04-12  8:42   ` [PATCH v2 32/43] arm_pmu: Provide a mechanism for disabling the physical IRQ Steven Price
2024-04-12  8:42   ` [PATCH v2 33/43] arm64: rme: Enable PMU support with a realm guest Steven Price
2024-04-13 23:44     ` kernel test robot
2024-04-18 16:06       ` Suzuki K Poulose
2024-04-12  8:43   ` [PATCH v2 34/43] kvm: rme: Hide KVM_CAP_READONLY_MEM for realm guests Steven Price
2024-04-12  8:43   ` [PATCH v2 35/43] arm64: RME: Propagate number of breakpoints and watchpoints to userspace Steven Price
2024-04-12  8:43   ` [PATCH v2 36/43] arm64: RME: Set breakpoint parameters through SET_ONE_REG Steven Price
2024-04-12  8:43   ` [PATCH v2 37/43] arm64: RME: Initialize PMCR.N with number counter supported by RMM Steven Price
2024-04-12  8:43   ` [PATCH v2 38/43] arm64: RME: Propagate max SVE vector length from RMM Steven Price
2024-04-12  8:43   ` [PATCH v2 39/43] arm64: RME: Configure max SVE vector length for a Realm Steven Price
2024-04-12  8:43   ` [PATCH v2 40/43] arm64: RME: Provide register list for unfinalized RME RECs Steven Price
2024-04-12  8:43   ` [PATCH v2 41/43] arm64: RME: Provide accurate register list Steven Price
2024-04-12  8:43   ` [PATCH v2 42/43] arm64: kvm: Expose support for private memory Steven Price
2024-04-25 14:44     ` Fuad Tabba
2024-04-12  8:43   ` [PATCH v2 43/43] KVM: arm64: Allow activating realms Steven Price
2024-04-12 16:52 ` [v2] Support for Arm CCA VMs on Linux Jean-Philippe Brucker
