linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/10] arm64: Stolen time support
@ 2019-08-21 15:36 Steven Price
  2019-08-21 15:36 ` [PATCH v3 01/10] KVM: arm64: Document PV-time interface Steven Price
                   ` (9 more replies)
  0 siblings, 10 replies; 36+ messages in thread
From: Steven Price @ 2019-08-21 15:36 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: Steven Price, Catalin Marinas, Paolo Bonzini,
	Radim Krčmář,
	Russell King, James Morse, Julien Thierry, Suzuki K Pouloze,
	Mark Rutland, kvm, linux-doc, linux-kernel

This series add support for paravirtualized time for arm64 guests and
KVM hosts following the specification in Arm's document DEN 0057A:

https://developer.arm.com/docs/den0057/a

It implements support for stolen time, allowing the guest to
identify time when it is forcibly not executing.

It doesn't implement support for Live Physical Time (LPT) as there are
some concerns about the overheads and approach in the above
specification, and I expect an updated version of the specification to
be released soon with just the stolen time parts.

NOTE: Patches 8 and 9 will conflict with Mark Rutland's series[1] cleaning
up the SMCCC conduit. I do feel that the addition of an _invoke() call
makes a number of call sites cleaner and it should be possible to
integrate both this and Mark's other cleanups.

[1] https://lore.kernel.org/linux-arm-kernel/20190809132245.43505-1-mark.rutland@arm.com/

Also available as a git tree:
git://linux-arm.org/linux-sp.git stolen_time/v3

Changes from v2:
https://lore.kernel.org/lkml/20190819140436.12207-1-steven.price@arm.com/
 * Switched from using gfn_to_hva_cache to a new macro kvm_put_guest()
   that can provide the single-copy atomicity required (on arm64). This
   macro is added in patch 4.
 * Tidied up the locking for kvm_update_stolen_time().
   pagefault_disable() was unnecessary and the caller didn't need to
   take kvm->srcu as the function does it itself.
 * Removed struct kvm_arch_pvtime from the arm implementation, replaced
   instead with inline static functions which are empty for arm.
 * Fixed a few checkpatch --strict warnings.

Changes from v1:
https://lore.kernel.org/lkml/20190802145017.42543-1-steven.price@arm.com/
 * Host kernel no longer allocates the stolen time structure, instead it
   is allocated by user space. This means the save/restore functionality
   can be removed.
 * Refactored the code so arm has stub implementations and to avoid
   initcall
 * Rebased to pick up Documentation/{virt->virtual} change
 * Bunch of typo fixes

Christoffer Dall (1):
  KVM: arm/arm64: Factor out hypercall handling from PSCI code

Steven Price (9):
  KVM: arm64: Document PV-time interface
  KVM: arm64: Implement PV_FEATURES call
  KVM: Implement kvm_put_guest()
  KVM: arm64: Support stolen time reporting via shared structure
  KVM: Allow kvm_device_ops to be const
  KVM: arm64: Provide a PV_TIME device to user space
  arm/arm64: Provide a wrapper for SMCCC 1.1 calls
  arm/arm64: Make use of the SMCCC 1.1 wrapper
  arm64: Retrieve stolen time as paravirtualized guest

 Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++
 arch/arm/include/asm/kvm_host.h       |  30 +++++
 arch/arm/kvm/Makefile                 |   2 +-
 arch/arm/kvm/handle_exit.c            |   2 +-
 arch/arm/mm/proc-v7-bugs.c            |  13 +-
 arch/arm64/include/asm/kvm_host.h     |  28 +++-
 arch/arm64/include/asm/paravirt.h     |   9 +-
 arch/arm64/include/asm/pvclock-abi.h  |  17 +++
 arch/arm64/include/uapi/asm/kvm.h     |   8 ++
 arch/arm64/kernel/cpu_errata.c        |  80 ++++-------
 arch/arm64/kernel/paravirt.c          | 148 +++++++++++++++++++++
 arch/arm64/kernel/time.c              |   3 +
 arch/arm64/kvm/Kconfig                |   1 +
 arch/arm64/kvm/Makefile               |   2 +
 arch/arm64/kvm/handle_exit.c          |   4 +-
 include/kvm/arm_hypercalls.h          |  43 ++++++
 include/kvm/arm_psci.h                |   2 +-
 include/linux/arm-smccc.h             |  58 ++++++++
 include/linux/cpuhotplug.h            |   1 +
 include/linux/kvm_host.h              |  28 +++-
 include/linux/kvm_types.h             |   2 +
 include/uapi/linux/kvm.h              |   2 +
 virt/kvm/arm/arm.c                    |  11 ++
 virt/kvm/arm/hypercalls.c             |  68 ++++++++++
 virt/kvm/arm/psci.c                   |  84 +-----------
 virt/kvm/arm/pvtime.c                 | 182 ++++++++++++++++++++++++++
 virt/kvm/kvm_main.c                   |   6 +-
 27 files changed, 780 insertions(+), 154 deletions(-)
 create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
 create mode 100644 arch/arm64/include/asm/pvclock-abi.h
 create mode 100644 include/kvm/arm_hypercalls.h
 create mode 100644 virt/kvm/arm/hypercalls.c
 create mode 100644 virt/kvm/arm/pvtime.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 01/10] KVM: arm64: Document PV-time interface
  2019-08-21 15:36 [PATCH v3 00/10] arm64: Stolen time support Steven Price
@ 2019-08-21 15:36 ` Steven Price
  2019-08-27  8:44   ` Christoffer Dall
                     ` (2 more replies)
  2019-08-21 15:36 ` [PATCH v3 02/10] KVM: arm/arm64: Factor out hypercall handling from PSCI code Steven Price
                   ` (8 subsequent siblings)
  9 siblings, 3 replies; 36+ messages in thread
From: Steven Price @ 2019-08-21 15:36 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: Steven Price, Catalin Marinas, Paolo Bonzini,
	Radim Krčmář,
	Russell King, James Morse, Julien Thierry, Suzuki K Pouloze,
	Mark Rutland, kvm, linux-doc, linux-kernel

Introduce a paravirtualization interface for KVM/arm64 based on the
"Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.

This only adds the details about "Stolen Time" as the details of "Live
Physical Time" have not been fully agreed.

User space can specify a reserved area of memory for the guest and
inform KVM to populate the memory with information on time that the host
kernel has stolen from the guest.

A hypercall interface is provided for the guest to interrogate the
hypervisor's support for this interface and the location of the shared
memory structures.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)
 create mode 100644 Documentation/virt/kvm/arm/pvtime.txt

diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
new file mode 100644
index 000000000000..1ceb118694e7
--- /dev/null
+++ b/Documentation/virt/kvm/arm/pvtime.txt
@@ -0,0 +1,100 @@
+Paravirtualized time support for arm64
+======================================
+
+Arm specification DEN0057/A defined a standard for paravirtualised time
+support for AArch64 guests:
+
+https://developer.arm.com/docs/den0057/a
+
+KVM/arm64 implements the stolen time part of this specification by providing
+some hypervisor service calls to support a paravirtualized guest obtaining a
+view of the amount of time stolen from its execution.
+
+Two new SMCCC compatible hypercalls are defined:
+
+PV_FEATURES 0xC5000020
+PV_TIME_ST  0xC5000022
+
+These are only available in the SMC64/HVC64 calling convention as
+paravirtualized time is not available to 32 bit Arm guests. The existence of
+the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
+mechanism before calling it.
+
+PV_FEATURES
+    Function ID:  (uint32)  : 0xC5000020
+    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
+    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
+                              PV-time feature is supported by the hypervisor.
+
+PV_TIME_ST
+    Function ID:  (uint32)  : 0xC5000022
+    Return value: (int64)   : IPA of the stolen time data structure for this
+                              (V)CPU. On failure:
+                              NOT_SUPPORTED (-1)
+
+The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
+with inner and outer write back caching attributes, in the inner shareable
+domain. A total of 16 bytes from the IPA returned are guaranteed to be
+meaningfully filled by the hypervisor (see structure below).
+
+PV_TIME_ST returns the structure for the calling VCPU.
+
+Stolen Time
+-----------
+
+The structure pointed to by the PV_TIME_ST hypercall is as follows:
+
+  Field       | Byte Length | Byte Offset | Description
+  ----------- | ----------- | ----------- | --------------------------
+  Revision    |      4      |      0      | Must be 0 for version 0.1
+  Attributes  |      4      |      4      | Must be 0
+  Stolen time |      8      |      8      | Stolen time in unsigned
+              |             |             | nanoseconds indicating how
+              |             |             | much time this VCPU thread
+              |             |             | was involuntarily not
+              |             |             | running on a physical CPU.
+
+The structure will be updated by the hypervisor prior to scheduling a VCPU. It
+will be present within a reserved region of the normal memory given to the
+guest. The guest should not attempt to write into this memory. There is a
+structure per VCPU of the guest.
+
+User space interface
+====================
+
+User space can request that KVM provide the paravirtualized time interface to
+a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
+
+    struct kvm_create_device pvtime_device = {
+            .type = KVM_DEV_TYPE_ARM_PV_TIME,
+            .attr = 0,
+            .flags = 0,
+    };
+
+    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
+
+Creation of the device should be done after creating the vCPUs of the virtual
+machine.
+
+The IPA of the structures must be given to KVM. This is the base address
+of an array of stolen time structures (one for each VCPU). The base address
+must be page aligned. The size must be at least 64 * number of VCPUs and be a
+multiple of PAGE_SIZE.
+
+The memory for these structures should be added to the guest in the usual
+manner (e.g. using KVM_SET_USER_MEMORY_REGION).
+
+For example:
+
+    struct kvm_dev_arm_st_region region = {
+            .gpa = <IPA of guest base address>,
+            .size = <size in bytes>
+    };
+
+    struct kvm_device_attr st_base = {
+            .group = KVM_DEV_ARM_PV_TIME_PADDR,
+            .attr = KVM_DEV_ARM_PV_TIME_ST,
+            .addr = (u64)&region
+    };
+
+    ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, &st_base);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 02/10] KVM: arm/arm64: Factor out hypercall handling from PSCI code
  2019-08-21 15:36 [PATCH v3 00/10] arm64: Stolen time support Steven Price
  2019-08-21 15:36 ` [PATCH v3 01/10] KVM: arm64: Document PV-time interface Steven Price
@ 2019-08-21 15:36 ` Steven Price
  2019-08-21 15:36 ` [PATCH v3 03/10] KVM: arm64: Implement PV_FEATURES call Steven Price
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2019-08-21 15:36 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: Steven Price, Catalin Marinas, Paolo Bonzini,
	Radim Krčmář,
	Russell King, James Morse, Julien Thierry, Suzuki K Pouloze,
	Mark Rutland, kvm, linux-doc, linux-kernel, Christoffer Dall

From: Christoffer Dall <christoffer.dall@arm.com>

We currently intertwine the KVM PSCI implementation with the general
dispatch of hypercall handling, which makes perfect sense because PSCI
is the only category of hypercalls we support.

However, as we are about to support additional hypercalls, factor out
this functionality into a separate hypercall handler file.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[steven.price@arm.com: rebased]
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm/kvm/Makefile        |  2 +-
 arch/arm/kvm/handle_exit.c   |  2 +-
 arch/arm64/kvm/Makefile      |  1 +
 arch/arm64/kvm/handle_exit.c |  4 +-
 include/kvm/arm_hypercalls.h | 43 ++++++++++++++++++
 include/kvm/arm_psci.h       |  2 +-
 virt/kvm/arm/hypercalls.c    | 59 +++++++++++++++++++++++++
 virt/kvm/arm/psci.c          | 84 +-----------------------------------
 8 files changed, 110 insertions(+), 87 deletions(-)
 create mode 100644 include/kvm/arm_hypercalls.h
 create mode 100644 virt/kvm/arm/hypercalls.c

diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 531e59f5be9c..ef4d01088efc 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -23,7 +23,7 @@ obj-y += kvm-arm.o init.o interrupts.o
 obj-y += handle_exit.o guest.o emulate.o reset.o
 obj-y += coproc.o coproc_a15.o coproc_a7.o   vgic-v3-coproc.o
 obj-y += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.o
-obj-y += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
+obj-y += $(KVM)/arm/psci.o $(KVM)/arm/perf.o $(KVM)/arm/hypercalls.o
 obj-y += $(KVM)/arm/aarch32.o
 
 obj-y += $(KVM)/arm/vgic/vgic.o
diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
index 2a6a1394d26e..e58a89d2f13f 100644
--- a/arch/arm/kvm/handle_exit.c
+++ b/arch/arm/kvm/handle_exit.c
@@ -9,7 +9,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_coproc.h>
 #include <asm/kvm_mmu.h>
-#include <kvm/arm_psci.h>
+#include <kvm/arm_hypercalls.h>
 #include <trace/events/kvm.h>
 
 #include "trace.h"
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 3ac1a64d2fb9..73dce4d47d47 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_KVM_ARM_HOST) += hyp/
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hypercalls.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 706cca23f0d2..aacfc55de44c 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -11,8 +11,6 @@
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 
-#include <kvm/arm_psci.h>
-
 #include <asm/esr.h>
 #include <asm/exception.h>
 #include <asm/kvm_asm.h>
@@ -22,6 +20,8 @@
 #include <asm/debug-monitors.h>
 #include <asm/traps.h>
 
+#include <kvm/arm_hypercalls.h>
+
 #define CREATE_TRACE_POINTS
 #include "trace.h"
 
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
new file mode 100644
index 000000000000..0e2509d27910
--- /dev/null
+++ b/include/kvm/arm_hypercalls.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2019 Arm Ltd. */
+
+#ifndef __KVM_ARM_HYPERCALLS_H
+#define __KVM_ARM_HYPERCALLS_H
+
+#include <asm/kvm_emulate.h>
+
+int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
+
+static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
+{
+	return vcpu_get_reg(vcpu, 0);
+}
+
+static inline unsigned long smccc_get_arg1(struct kvm_vcpu *vcpu)
+{
+	return vcpu_get_reg(vcpu, 1);
+}
+
+static inline unsigned long smccc_get_arg2(struct kvm_vcpu *vcpu)
+{
+	return vcpu_get_reg(vcpu, 2);
+}
+
+static inline unsigned long smccc_get_arg3(struct kvm_vcpu *vcpu)
+{
+	return vcpu_get_reg(vcpu, 3);
+}
+
+static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
+				    unsigned long a0,
+				    unsigned long a1,
+				    unsigned long a2,
+				    unsigned long a3)
+{
+	vcpu_set_reg(vcpu, 0, a0);
+	vcpu_set_reg(vcpu, 1, a1);
+	vcpu_set_reg(vcpu, 2, a2);
+	vcpu_set_reg(vcpu, 3, a3);
+}
+
+#endif
diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index 632e78bdef4d..5b58bd2fe088 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -40,7 +40,7 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu, struct kvm *kvm)
 }
 
 
-int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
+int kvm_psci_call(struct kvm_vcpu *vcpu);
 
 struct kvm_one_reg;
 
diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
new file mode 100644
index 000000000000..f875241bd030
--- /dev/null
+++ b/virt/kvm/arm/hypercalls.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2019 Arm Ltd.
+
+#include <linux/arm-smccc.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+
+#include <kvm/arm_hypercalls.h>
+#include <kvm/arm_psci.h>
+
+int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
+{
+	u32 func_id = smccc_get_function(vcpu);
+	u32 val = SMCCC_RET_NOT_SUPPORTED;
+	u32 feature;
+
+	switch (func_id) {
+	case ARM_SMCCC_VERSION_FUNC_ID:
+		val = ARM_SMCCC_VERSION_1_1;
+		break;
+	case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
+		feature = smccc_get_arg1(vcpu);
+		switch (feature) {
+		case ARM_SMCCC_ARCH_WORKAROUND_1:
+			switch (kvm_arm_harden_branch_predictor()) {
+			case KVM_BP_HARDEN_UNKNOWN:
+				break;
+			case KVM_BP_HARDEN_WA_NEEDED:
+				val = SMCCC_RET_SUCCESS;
+				break;
+			case KVM_BP_HARDEN_NOT_REQUIRED:
+				val = SMCCC_RET_NOT_REQUIRED;
+				break;
+			}
+			break;
+		case ARM_SMCCC_ARCH_WORKAROUND_2:
+			switch (kvm_arm_have_ssbd()) {
+			case KVM_SSBD_FORCE_DISABLE:
+			case KVM_SSBD_UNKNOWN:
+				break;
+			case KVM_SSBD_KERNEL:
+				val = SMCCC_RET_SUCCESS;
+				break;
+			case KVM_SSBD_FORCE_ENABLE:
+			case KVM_SSBD_MITIGATED:
+				val = SMCCC_RET_NOT_REQUIRED;
+				break;
+			}
+			break;
+		}
+		break;
+	default:
+		return kvm_psci_call(vcpu);
+	}
+
+	smccc_set_retval(vcpu, val, 0, 0, 0);
+	return 1;
+}
diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
index 87927f7e1ee7..17e2bdd4b76f 100644
--- a/virt/kvm/arm/psci.c
+++ b/virt/kvm/arm/psci.c
@@ -15,6 +15,7 @@
 #include <asm/kvm_host.h>
 
 #include <kvm/arm_psci.h>
+#include <kvm/arm_hypercalls.h>
 
 /*
  * This is an implementation of the Power State Coordination Interface
@@ -23,38 +24,6 @@
 
 #define AFFINITY_MASK(level)	~((0x1UL << ((level) * MPIDR_LEVEL_BITS)) - 1)
 
-static u32 smccc_get_function(struct kvm_vcpu *vcpu)
-{
-	return vcpu_get_reg(vcpu, 0);
-}
-
-static unsigned long smccc_get_arg1(struct kvm_vcpu *vcpu)
-{
-	return vcpu_get_reg(vcpu, 1);
-}
-
-static unsigned long smccc_get_arg2(struct kvm_vcpu *vcpu)
-{
-	return vcpu_get_reg(vcpu, 2);
-}
-
-static unsigned long smccc_get_arg3(struct kvm_vcpu *vcpu)
-{
-	return vcpu_get_reg(vcpu, 3);
-}
-
-static void smccc_set_retval(struct kvm_vcpu *vcpu,
-			     unsigned long a0,
-			     unsigned long a1,
-			     unsigned long a2,
-			     unsigned long a3)
-{
-	vcpu_set_reg(vcpu, 0, a0);
-	vcpu_set_reg(vcpu, 1, a1);
-	vcpu_set_reg(vcpu, 2, a2);
-	vcpu_set_reg(vcpu, 3, a3);
-}
-
 static unsigned long psci_affinity_mask(unsigned long affinity_level)
 {
 	if (affinity_level <= 3)
@@ -373,7 +342,7 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
  * Errors:
  * -EINVAL: Unrecognized PSCI function
  */
-static int kvm_psci_call(struct kvm_vcpu *vcpu)
+int kvm_psci_call(struct kvm_vcpu *vcpu)
 {
 	switch (kvm_psci_version(vcpu, vcpu->kvm)) {
 	case KVM_ARM_PSCI_1_0:
@@ -387,55 +356,6 @@ static int kvm_psci_call(struct kvm_vcpu *vcpu)
 	};
 }
 
-int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
-{
-	u32 func_id = smccc_get_function(vcpu);
-	u32 val = SMCCC_RET_NOT_SUPPORTED;
-	u32 feature;
-
-	switch (func_id) {
-	case ARM_SMCCC_VERSION_FUNC_ID:
-		val = ARM_SMCCC_VERSION_1_1;
-		break;
-	case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
-		feature = smccc_get_arg1(vcpu);
-		switch(feature) {
-		case ARM_SMCCC_ARCH_WORKAROUND_1:
-			switch (kvm_arm_harden_branch_predictor()) {
-			case KVM_BP_HARDEN_UNKNOWN:
-				break;
-			case KVM_BP_HARDEN_WA_NEEDED:
-				val = SMCCC_RET_SUCCESS;
-				break;
-			case KVM_BP_HARDEN_NOT_REQUIRED:
-				val = SMCCC_RET_NOT_REQUIRED;
-				break;
-			}
-			break;
-		case ARM_SMCCC_ARCH_WORKAROUND_2:
-			switch (kvm_arm_have_ssbd()) {
-			case KVM_SSBD_FORCE_DISABLE:
-			case KVM_SSBD_UNKNOWN:
-				break;
-			case KVM_SSBD_KERNEL:
-				val = SMCCC_RET_SUCCESS;
-				break;
-			case KVM_SSBD_FORCE_ENABLE:
-			case KVM_SSBD_MITIGATED:
-				val = SMCCC_RET_NOT_REQUIRED;
-				break;
-			}
-			break;
-		}
-		break;
-	default:
-		return kvm_psci_call(vcpu);
-	}
-
-	smccc_set_retval(vcpu, val, 0, 0, 0);
-	return 1;
-}
-
 int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
 {
 	return 3;		/* PSCI version and two workaround registers */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 03/10] KVM: arm64: Implement PV_FEATURES call
  2019-08-21 15:36 [PATCH v3 00/10] arm64: Stolen time support Steven Price
  2019-08-21 15:36 ` [PATCH v3 01/10] KVM: arm64: Document PV-time interface Steven Price
  2019-08-21 15:36 ` [PATCH v3 02/10] KVM: arm/arm64: Factor out hypercall handling from PSCI code Steven Price
@ 2019-08-21 15:36 ` Steven Price
  2019-08-21 15:36 ` [PATCH v3 04/10] KVM: Implement kvm_put_guest() Steven Price
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2019-08-21 15:36 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: Steven Price, Catalin Marinas, Paolo Bonzini,
	Radim Krčmář,
	Russell King, James Morse, Julien Thierry, Suzuki K Pouloze,
	Mark Rutland, kvm, linux-doc, linux-kernel

This provides a mechanism for querying which paravirtualized features
are available in this hypervisor.

Also add the header file which defines the ABI for the paravirtualized
clock features we're about to add.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm/include/asm/kvm_host.h      |  6 ++++++
 arch/arm64/include/asm/kvm_host.h    |  2 ++
 arch/arm64/include/asm/pvclock-abi.h | 17 +++++++++++++++++
 arch/arm64/kvm/Makefile              |  1 +
 include/linux/arm-smccc.h            | 14 ++++++++++++++
 virt/kvm/arm/hypercalls.c            |  6 ++++++
 virt/kvm/arm/pvtime.c                | 21 +++++++++++++++++++++
 7 files changed, 67 insertions(+)
 create mode 100644 arch/arm64/include/asm/pvclock-abi.h
 create mode 100644 virt/kvm/arm/pvtime.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 8a37c8e89777..369b5d2d54bf 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -7,6 +7,7 @@
 #ifndef __ARM_KVM_HOST_H__
 #define __ARM_KVM_HOST_H__
 
+#include <linux/arm-smccc.h>
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/kvm_types.h>
@@ -323,6 +324,11 @@ static inline int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
+static inline int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
+{
+	return SMCCC_RET_NOT_SUPPORTED;
+}
+
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f656169db8c3..583b3639062a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -478,6 +478,8 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct kvm_run *run,
 int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
+int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu);
+
 void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
diff --git a/arch/arm64/include/asm/pvclock-abi.h b/arch/arm64/include/asm/pvclock-abi.h
new file mode 100644
index 000000000000..c4f1c0a0789c
--- /dev/null
+++ b/arch/arm64/include/asm/pvclock-abi.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2019 Arm Ltd. */
+
+#ifndef __ASM_PVCLOCK_ABI_H
+#define __ASM_PVCLOCK_ABI_H
+
+/* The below structure is defined in ARM DEN0057A */
+
+struct pvclock_vcpu_stolen_time {
+	__le32 revision;
+	__le32 attributes;
+	__le64 stolen_time;
+	/* Structure must be 64 byte aligned, pad to that size */
+	u8 padding[48];
+} __packed;
+
+#endif
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 73dce4d47d47..5ffbdc39e780 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,6 +14,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/e
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hypercalls.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/pvtime.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 080012a6f025..e7f129f26ebd 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -45,6 +45,7 @@
 #define ARM_SMCCC_OWNER_SIP		2
 #define ARM_SMCCC_OWNER_OEM		3
 #define ARM_SMCCC_OWNER_STANDARD	4
+#define ARM_SMCCC_OWNER_STANDARD_HYP	5
 #define ARM_SMCCC_OWNER_TRUSTED_APP	48
 #define ARM_SMCCC_OWNER_TRUSTED_APP_END	49
 #define ARM_SMCCC_OWNER_TRUSTED_OS	50
@@ -302,5 +303,18 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
 #define SMCCC_RET_NOT_SUPPORTED			-1
 #define SMCCC_RET_NOT_REQUIRED			-2
 
+/* Paravirtualised time calls (defined by ARM DEN0057A) */
+#define ARM_SMCCC_HV_PV_FEATURES				\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
+			   ARM_SMCCC_SMC_64,			\
+			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
+			   0x20)
+
+#define ARM_SMCCC_HV_PV_TIME_ST					\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
+			   ARM_SMCCC_SMC_64,			\
+			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
+			   0x22)
+
 #endif /*__ASSEMBLY__*/
 #endif /*__LINUX_ARM_SMCCC_H*/
diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
index f875241bd030..63ae629c466a 100644
--- a/virt/kvm/arm/hypercalls.c
+++ b/virt/kvm/arm/hypercalls.c
@@ -48,8 +48,14 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 				break;
 			}
 			break;
+		case ARM_SMCCC_HV_PV_FEATURES:
+			val = SMCCC_RET_SUCCESS;
+			break;
 		}
 		break;
+	case ARM_SMCCC_HV_PV_FEATURES:
+		val = kvm_hypercall_pv_features(vcpu);
+		break;
 	default:
 		return kvm_psci_call(vcpu);
 	}
diff --git a/virt/kvm/arm/pvtime.c b/virt/kvm/arm/pvtime.c
new file mode 100644
index 000000000000..6201d71cb1f8
--- /dev/null
+++ b/virt/kvm/arm/pvtime.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2019 Arm Ltd.
+
+#include <linux/arm-smccc.h>
+
+#include <kvm/arm_hypercalls.h>
+
+int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
+{
+	u32 feature = smccc_get_arg1(vcpu);
+	u32 val = SMCCC_RET_NOT_SUPPORTED;
+
+	switch (feature) {
+	case ARM_SMCCC_HV_PV_FEATURES:
+		val = SMCCC_RET_SUCCESS;
+		break;
+	}
+
+	return val;
+}
+
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 04/10] KVM: Implement kvm_put_guest()
  2019-08-21 15:36 [PATCH v3 00/10] arm64: Stolen time support Steven Price
                   ` (2 preceding siblings ...)
  2019-08-21 15:36 ` [PATCH v3 03/10] KVM: arm64: Implement PV_FEATURES call Steven Price
@ 2019-08-21 15:36 ` Steven Price
  2019-08-22 10:29   ` Jonathan Cameron
  2019-08-22 15:28   ` Sean Christopherson
  2019-08-21 15:36 ` [PATCH v3 05/10] KVM: arm64: Support stolen time reporting via shared structure Steven Price
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 36+ messages in thread
From: Steven Price @ 2019-08-21 15:36 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: Steven Price, Catalin Marinas, Paolo Bonzini,
	Radim Krčmář,
	Russell King, James Morse, Julien Thierry, Suzuki K Pouloze,
	Mark Rutland, kvm, linux-doc, linux-kernel

kvm_put_guest() is analogous to put_user() - it writes a single value to
the guest physical address. The implementation is built upon put_user()
and so it has the same single copy atomic properties.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 include/linux/kvm_host.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fcb46b3374c6..e154a1897e20 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -746,6 +746,30 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 				  unsigned long len);
 int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 			      gpa_t gpa, unsigned long len);
+
+#define __kvm_put_guest(kvm, gfn, offset, value, type)			\
+({									\
+	unsigned long __addr = gfn_to_hva(kvm, gfn);			\
+	type __user *__uaddr = (type __user *)(__addr + offset);	\
+	int __ret = 0;							\
+									\
+	if (kvm_is_error_hva(__addr))					\
+		__ret = -EFAULT;					\
+	else								\
+		__ret = put_user(value, __uaddr);			\
+	if (!__ret)							\
+		mark_page_dirty(kvm, gfn);				\
+	__ret;								\
+})
+
+#define kvm_put_guest(kvm, gpa, value, type)				\
+({									\
+	gpa_t __gpa = gpa;						\
+	struct kvm *__kvm = kvm;					\
+	__kvm_put_guest(__kvm, __gpa >> PAGE_SHIFT,			\
+			offset_in_page(__gpa), (value), type);		\
+})
+
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
 int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 05/10] KVM: arm64: Support stolen time reporting via shared structure
  2019-08-21 15:36 [PATCH v3 00/10] arm64: Stolen time support Steven Price
                   ` (3 preceding siblings ...)
  2019-08-21 15:36 ` [PATCH v3 04/10] KVM: Implement kvm_put_guest() Steven Price
@ 2019-08-21 15:36 ` Steven Price
  2019-08-22 10:39   ` Jonathan Cameron
  2019-08-23 12:07   ` Zenghui Yu
  2019-08-21 15:36 ` [PATCH v3 06/10] KVM: Allow kvm_device_ops to be const Steven Price
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 36+ messages in thread
From: Steven Price @ 2019-08-21 15:36 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: Steven Price, Catalin Marinas, Paolo Bonzini,
	Radim Krčmář,
	Russell King, James Morse, Julien Thierry, Suzuki K Pouloze,
	Mark Rutland, kvm, linux-doc, linux-kernel

Implement the service call for configuring a shared structure between a
VCPU and the hypervisor in which the hypervisor can write the time
stolen from the VCPU's execution time by other tasks on the host.

The hypervisor allocates memory which is placed at an IPA chosen by user
space. The hypervisor then updates the shared structure using
kvm_put_guest() to ensure single copy atomicity of the 64-bit value
reporting the stolen time in nanoseconds.

Whenever stolen time is enabled by the guest, the stolen time counter is
reset.

The stolen time itself is retrieved from the sched_info structure
maintained by the Linux scheduler code. We enable SCHEDSTATS when
selecting KVM Kconfig to ensure this value is meaningful.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm/include/asm/kvm_host.h   | 20 +++++++++
 arch/arm64/include/asm/kvm_host.h | 25 +++++++++++-
 arch/arm64/kvm/Kconfig            |  1 +
 include/linux/kvm_types.h         |  2 +
 virt/kvm/arm/arm.c                | 10 +++++
 virt/kvm/arm/hypercalls.c         |  3 ++
 virt/kvm/arm/pvtime.c             | 67 +++++++++++++++++++++++++++++++
 7 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 369b5d2d54bf..47d2ced99421 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -39,6 +39,7 @@
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
 #define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
+#define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
@@ -329,6 +330,25 @@ static inline int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
 	return SMCCC_RET_NOT_SUPPORTED;
 }
 
+static inline int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu)
+{
+	return SMCCC_RET_NOT_SUPPORTED;
+}
+
+static inline int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init)
+{
+	return -ENOTSUPP;
+}
+
+static inline void kvm_pvtime_init_vm(struct kvm_arch *kvm_arch)
+{
+}
+
+static inline bool kvm_is_pvtime_enabled(struct kvm_arch *kvm_arch)
+{
+	return false;
+}
+
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 583b3639062a..b6fa7beffd8a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -44,6 +44,7 @@
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
 #define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
+#define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
@@ -83,6 +84,11 @@ struct kvm_arch {
 
 	/* Mandated version of PSCI */
 	u32 psci_version;
+
+	struct kvm_arch_pvtime {
+		gpa_t st_base;
+		u64 st_size;
+	} pvtime;
 };
 
 #define KVM_NR_MEM_OBJS     40
@@ -338,8 +344,13 @@ struct kvm_vcpu_arch {
 	/* True when deferrable sysregs are loaded on the physical CPU,
 	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
 	bool sysregs_loaded_on_cpu;
-};
 
+	/* Guest PV state */
+	struct {
+		u64 steal;
+		u64 last_steal;
+	} steal;
+};
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
 #define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + \
 				      sve_ffr_offset((vcpu)->arch.sve_max_vl)))
@@ -479,6 +490,18 @@ int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
 int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu);
+int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu);
+int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init);
+
+static inline void kvm_pvtime_init_vm(struct kvm_arch *kvm_arch)
+{
+	kvm_arch->pvtime.st_base = GPA_INVALID;
+}
+
+static inline bool kvm_is_pvtime_enabled(struct kvm_arch *kvm_arch)
+{
+	return (kvm_arch->pvtime.st_base != GPA_INVALID);
+}
 
 void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
 
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a67121d419a2..d8b88e40d223 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -39,6 +39,7 @@ config KVM
 	select IRQ_BYPASS_MANAGER
 	select HAVE_KVM_IRQ_BYPASS
 	select HAVE_KVM_VCPU_RUN_PID_CHANGE
+	select SCHEDSTATS
 	---help---
 	  Support hosting virtualized guest machines.
 	  We don't support KVM with 16K page tables yet, due to the multiple
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index bde5374ae021..1c88e69db3d9 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -35,6 +35,8 @@ typedef unsigned long  gva_t;
 typedef u64            gpa_t;
 typedef u64            gfn_t;
 
+#define GPA_INVALID	(~(gpa_t)0)
+
 typedef unsigned long  hva_t;
 typedef u64            hpa_t;
 typedef u64            hfn_t;
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 35a069815baf..5e8343e2dd62 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -40,6 +40,10 @@
 #include <asm/kvm_coproc.h>
 #include <asm/sections.h>
 
+#include <kvm/arm_hypercalls.h>
+#include <kvm/arm_pmu.h>
+#include <kvm/arm_psci.h>
+
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension	virt");
 #endif
@@ -135,6 +139,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.max_vcpus = vgic_present ?
 				kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS;
 
+	kvm_pvtime_init_vm(&kvm->arch);
 	return ret;
 out_free_stage2_pgd:
 	kvm_free_stage2_pgd(kvm);
@@ -379,6 +384,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	kvm_vcpu_load_sysregs(vcpu);
 	kvm_arch_vcpu_load_fp(vcpu);
 	kvm_vcpu_pmu_restore_guest(vcpu);
+	if (kvm_is_pvtime_enabled(&vcpu->kvm->arch))
+		kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
 
 	if (single_task_running())
 		vcpu_clear_wfe_traps(vcpu);
@@ -644,6 +651,9 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 		 * that a VCPU sees new virtual interrupts.
 		 */
 		kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu);
+
+		if (kvm_check_request(KVM_REQ_RECORD_STEAL, vcpu))
+			kvm_update_stolen_time(vcpu, false);
 	}
 }
 
diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
index 63ae629c466a..ac678eabf15f 100644
--- a/virt/kvm/arm/hypercalls.c
+++ b/virt/kvm/arm/hypercalls.c
@@ -56,6 +56,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 	case ARM_SMCCC_HV_PV_FEATURES:
 		val = kvm_hypercall_pv_features(vcpu);
 		break;
+	case ARM_SMCCC_HV_PV_TIME_ST:
+		val = kvm_hypercall_stolen_time(vcpu);
+		break;
 	default:
 		return kvm_psci_call(vcpu);
 	}
diff --git a/virt/kvm/arm/pvtime.c b/virt/kvm/arm/pvtime.c
index 6201d71cb1f8..28603689f6e0 100644
--- a/virt/kvm/arm/pvtime.c
+++ b/virt/kvm/arm/pvtime.c
@@ -3,8 +3,51 @@
 
 #include <linux/arm-smccc.h>
 
+#include <asm/pvclock-abi.h>
+
 #include <kvm/arm_hypercalls.h>
 
+int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_arch_pvtime *pvtime = &kvm->arch.pvtime;
+	u64 steal;
+	u64 steal_le;
+	u64 offset;
+	int idx;
+	const int stride = sizeof(struct pvclock_vcpu_stolen_time);
+
+	if (pvtime->st_base == GPA_INVALID)
+		return -ENOTSUPP;
+
+	/* Let's do the local bookkeeping */
+	steal = vcpu->arch.steal.steal;
+	steal += current->sched_info.run_delay - vcpu->arch.steal.last_steal;
+	vcpu->arch.steal.last_steal = current->sched_info.run_delay;
+	vcpu->arch.steal.steal = steal;
+
+	offset = stride * kvm_vcpu_get_idx(vcpu);
+
+	if (unlikely(offset + stride > pvtime->st_size))
+		return -EINVAL;
+
+	steal_le = cpu_to_le64(steal);
+	idx = srcu_read_lock(&kvm->srcu);
+	if (init) {
+		struct pvclock_vcpu_stolen_time init_values = {
+			.revision = 0,
+			.attributes = 0
+		};
+		kvm_write_guest(kvm, pvtime->st_base + offset, &init_values,
+				sizeof(init_values));
+	}
+	offset += offsetof(struct pvclock_vcpu_stolen_time, stolen_time);
+	kvm_put_guest(kvm, pvtime->st_base + offset, steal_le, u64);
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return 0;
+}
+
 int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
 {
 	u32 feature = smccc_get_arg1(vcpu);
@@ -12,6 +55,7 @@ int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
 
 	switch (feature) {
 	case ARM_SMCCC_HV_PV_FEATURES:
+	case ARM_SMCCC_HV_PV_TIME_ST:
 		val = SMCCC_RET_SUCCESS;
 		break;
 	}
@@ -19,3 +63,26 @@ int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
 	return val;
 }
 
+int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu)
+{
+	u64 ret;
+	int err;
+
+	/*
+	 * Start counting stolen time from the time the guest requests
+	 * the feature enabled.
+	 */
+	vcpu->arch.steal.steal = 0;
+	vcpu->arch.steal.last_steal = current->sched_info.run_delay;
+
+	err = kvm_update_stolen_time(vcpu, true);
+
+	if (err)
+		ret = SMCCC_RET_NOT_SUPPORTED;
+	else
+		ret = vcpu->kvm->arch.pvtime.st_base +
+			(sizeof(struct pvclock_vcpu_stolen_time) *
+			 kvm_vcpu_get_idx(vcpu));
+
+	return ret;
+}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 06/10] KVM: Allow kvm_device_ops to be const
  2019-08-21 15:36 [PATCH v3 00/10] arm64: Stolen time support Steven Price
                   ` (4 preceding siblings ...)
  2019-08-21 15:36 ` [PATCH v3 05/10] KVM: arm64: Support stolen time reporting via shared structure Steven Price
@ 2019-08-21 15:36 ` Steven Price
  2019-08-21 15:36 ` [PATCH v3 07/10] KVM: arm64: Provide a PV_TIME device to user space Steven Price
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2019-08-21 15:36 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: Steven Price, Catalin Marinas, Paolo Bonzini,
	Radim Krčmář,
	Russell King, James Morse, Julien Thierry, Suzuki K Pouloze,
	Mark Rutland, kvm, linux-doc, linux-kernel

Currently a kvm_device_ops structure cannot be const without triggering
compiler warnings. However the structure doesn't need to be written to
and, by marking it const, it can be read-only in memory. Add some more
const keywords to allow this.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 include/linux/kvm_host.h | 4 ++--
 virt/kvm/kvm_main.c      | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e154a1897e20..c5c1a923f21b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1262,7 +1262,7 @@ extern unsigned int halt_poll_ns_grow_start;
 extern unsigned int halt_poll_ns_shrink;
 
 struct kvm_device {
-	struct kvm_device_ops *ops;
+	const struct kvm_device_ops *ops;
 	struct kvm *kvm;
 	void *private;
 	struct list_head vm_node;
@@ -1315,7 +1315,7 @@ struct kvm_device_ops {
 void kvm_device_get(struct kvm_device *dev);
 void kvm_device_put(struct kvm_device *dev);
 struct kvm_device *kvm_device_from_filp(struct file *filp);
-int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type);
+int kvm_register_device_ops(const struct kvm_device_ops *ops, u32 type);
 void kvm_unregister_device_ops(u32 type);
 
 extern struct kvm_device_ops kvm_mpic_ops;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c6a91b044d8d..75488ebb87c9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3046,14 +3046,14 @@ struct kvm_device *kvm_device_from_filp(struct file *filp)
 	return filp->private_data;
 }
 
-static struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = {
+static const struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = {
 #ifdef CONFIG_KVM_MPIC
 	[KVM_DEV_TYPE_FSL_MPIC_20]	= &kvm_mpic_ops,
 	[KVM_DEV_TYPE_FSL_MPIC_42]	= &kvm_mpic_ops,
 #endif
 };
 
-int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type)
+int kvm_register_device_ops(const struct kvm_device_ops *ops, u32 type)
 {
 	if (type >= ARRAY_SIZE(kvm_device_ops_table))
 		return -ENOSPC;
@@ -3074,7 +3074,7 @@ void kvm_unregister_device_ops(u32 type)
 static int kvm_ioctl_create_device(struct kvm *kvm,
 				   struct kvm_create_device *cd)
 {
-	struct kvm_device_ops *ops = NULL;
+	const struct kvm_device_ops *ops = NULL;
 	struct kvm_device *dev;
 	bool test = cd->flags & KVM_CREATE_DEVICE_TEST;
 	int type;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 07/10] KVM: arm64: Provide a PV_TIME device to user space
  2019-08-21 15:36 [PATCH v3 00/10] arm64: Stolen time support Steven Price
                   ` (5 preceding siblings ...)
  2019-08-21 15:36 ` [PATCH v3 06/10] KVM: Allow kvm_device_ops to be const Steven Price
@ 2019-08-21 15:36 ` Steven Price
  2019-08-22 10:57   ` Jonathan Cameron
  2019-08-21 15:36 ` [PATCH v3 08/10] arm/arm64: Provide a wrapper for SMCCC 1.1 calls Steven Price
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Steven Price @ 2019-08-21 15:36 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: Steven Price, Catalin Marinas, Paolo Bonzini,
	Radim Krčmář,
	Russell King, James Morse, Julien Thierry, Suzuki K Pouloze,
	Mark Rutland, kvm, linux-doc, linux-kernel

Allow user space to inform the KVM host where in the physical memory
map the paravirtualized time structures should be located.

A device is created which provides the base address of an array of
Stolen Time (ST) structures, one for each VCPU. There must be (64 *
total number of VCPUs) bytes of memory available at this location.

The address is given in terms of the physical address visible to
the guest and must be page aligned. The guest will discover the address
via a hypercall.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm/include/asm/kvm_host.h   |  4 ++
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/include/uapi/asm/kvm.h |  8 +++
 include/uapi/linux/kvm.h          |  2 +
 virt/kvm/arm/arm.c                |  1 +
 virt/kvm/arm/pvtime.c             | 94 +++++++++++++++++++++++++++++++
 6 files changed, 110 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 47d2ced99421..b6c8dbc0556b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -325,6 +325,10 @@ static inline int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
+static inline void kvm_pvtime_init(void)
+{
+}
+
 static inline int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
 {
 	return SMCCC_RET_NOT_SUPPORTED;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index b6fa7beffd8a..7b2147f62c16 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -489,6 +489,7 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct kvm_run *run,
 int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
+void kvm_pvtime_init(void);
 int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu);
 int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu);
 int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init);
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 9a507716ae2f..209c4de67306 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -367,6 +367,14 @@ struct kvm_vcpu_events {
 #define KVM_PSCI_RET_INVAL		PSCI_RET_INVALID_PARAMS
 #define KVM_PSCI_RET_DENIED		PSCI_RET_DENIED
 
+/* Device Control API: PV_TIME */
+#define KVM_DEV_ARM_PV_TIME_REGION	0
+#define  KVM_DEV_ARM_PV_TIME_ST		0
+struct kvm_dev_arm_st_region {
+	__u64 gpa;
+	__u64 size;
+};
+
 #endif
 
 #endif /* __ARM_KVM_H__ */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5e3f12d5359e..265156a984f2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1222,6 +1222,8 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_ARM_VGIC_ITS	KVM_DEV_TYPE_ARM_VGIC_ITS
 	KVM_DEV_TYPE_XIVE,
 #define KVM_DEV_TYPE_XIVE		KVM_DEV_TYPE_XIVE
+	KVM_DEV_TYPE_ARM_PV_TIME,
+#define KVM_DEV_TYPE_ARM_PV_TIME	KVM_DEV_TYPE_ARM_PV_TIME
 	KVM_DEV_TYPE_MAX,
 };
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 5e8343e2dd62..bfb5a842e6ab 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1494,6 +1494,7 @@ static int init_subsystems(void)
 
 	kvm_perf_init();
 	kvm_coproc_table_init();
+	kvm_pvtime_init();
 
 out:
 	on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);
diff --git a/virt/kvm/arm/pvtime.c b/virt/kvm/arm/pvtime.c
index 28603689f6e0..3e55c1fb6a49 100644
--- a/virt/kvm/arm/pvtime.c
+++ b/virt/kvm/arm/pvtime.c
@@ -2,7 +2,9 @@
 // Copyright (C) 2019 Arm Ltd.
 
 #include <linux/arm-smccc.h>
+#include <linux/kvm_host.h>
 
+#include <asm/kvm_mmu.h>
 #include <asm/pvclock-abi.h>
 
 #include <kvm/arm_hypercalls.h>
@@ -86,3 +88,95 @@ int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu)
 
 	return ret;
 }
+
+static int kvm_arm_pvtime_create(struct kvm_device *dev, u32 type)
+{
+	return 0;
+}
+
+static void kvm_arm_pvtime_destroy(struct kvm_device *dev)
+{
+	struct kvm_arch_pvtime *pvtime = &dev->kvm->arch.pvtime;
+
+	pvtime->st_base = GPA_INVALID;
+	kfree(dev);
+}
+
+static int kvm_arm_pvtime_set_attr(struct kvm_device *dev,
+				   struct kvm_device_attr *attr)
+{
+	struct kvm *kvm = dev->kvm;
+	struct kvm_arch_pvtime *pvtime = &kvm->arch.pvtime;
+	u64 __user *user = (u64 __user *)attr->addr;
+	struct kvm_dev_arm_st_region region;
+
+	switch (attr->group) {
+	case KVM_DEV_ARM_PV_TIME_REGION:
+		if (copy_from_user(&region, user, sizeof(region)))
+			return -EFAULT;
+		if (region.gpa & ~PAGE_MASK)
+			return -EINVAL;
+		if (region.size & ~PAGE_MASK)
+			return -EINVAL;
+		switch (attr->attr) {
+		case KVM_DEV_ARM_PV_TIME_ST:
+			if (pvtime->st_base != GPA_INVALID)
+				return -EEXIST;
+			pvtime->st_base = region.gpa;
+			pvtime->st_size = region.size;
+			return 0;
+		}
+		break;
+	}
+	return -ENXIO;
+}
+
+static int kvm_arm_pvtime_get_attr(struct kvm_device *dev,
+				   struct kvm_device_attr *attr)
+{
+	struct kvm_arch_pvtime *pvtime = &dev->kvm->arch.pvtime;
+	u64 __user *user = (u64 __user *)attr->addr;
+	struct kvm_dev_arm_st_region region;
+
+	switch (attr->group) {
+	case KVM_DEV_ARM_PV_TIME_REGION:
+		switch (attr->attr) {
+		case KVM_DEV_ARM_PV_TIME_ST:
+			region.gpa = pvtime->st_base;
+			region.size = pvtime->st_size;
+			if (copy_to_user(user, &region, sizeof(region)))
+				return -EFAULT;
+			return 0;
+		}
+		break;
+	}
+	return -ENXIO;
+}
+
+static int kvm_arm_pvtime_has_attr(struct kvm_device *dev,
+				   struct kvm_device_attr *attr)
+{
+	switch (attr->group) {
+	case KVM_DEV_ARM_PV_TIME_REGION:
+		switch (attr->attr) {
+		case KVM_DEV_ARM_PV_TIME_ST:
+			return 0;
+		}
+		break;
+	}
+	return -ENXIO;
+}
+
+static const struct kvm_device_ops pvtime_ops = {
+	"Arm PV time",
+	.create = kvm_arm_pvtime_create,
+	.destroy = kvm_arm_pvtime_destroy,
+	.set_attr = kvm_arm_pvtime_set_attr,
+	.get_attr = kvm_arm_pvtime_get_attr,
+	.has_attr = kvm_arm_pvtime_has_attr
+};
+
+void kvm_pvtime_init(void)
+{
+	kvm_register_device_ops(&pvtime_ops, KVM_DEV_TYPE_ARM_PV_TIME);
+}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 08/10] arm/arm64: Provide a wrapper for SMCCC 1.1 calls
  2019-08-21 15:36 [PATCH v3 00/10] arm64: Stolen time support Steven Price
                   ` (6 preceding siblings ...)
  2019-08-21 15:36 ` [PATCH v3 07/10] KVM: arm64: Provide a PV_TIME device to user space Steven Price
@ 2019-08-21 15:36 ` Steven Price
  2019-08-21 15:36 ` [PATCH v3 09/10] arm/arm64: Make use of the SMCCC 1.1 wrapper Steven Price
  2019-08-21 15:36 ` [PATCH v3 10/10] arm64: Retrieve stolen time as paravirtualized guest Steven Price
  9 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2019-08-21 15:36 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: Steven Price, Catalin Marinas, Paolo Bonzini,
	Radim Krčmář,
	Russell King, James Morse, Julien Thierry, Suzuki K Pouloze,
	Mark Rutland, kvm, linux-doc, linux-kernel

SMCCC 1.1 calls may use either HVC or SMC depending on the PSCI
conduit. Rather than coding this in every call site provide a macro
which uses the correct instruction. The macro also handles the case
where no PSCI conduit is configured returning a not supported error
in res, along with returning the conduit used for the call.

This allow us to remove some duplicated code and will be useful later
when adding paravirtualized time hypervisor calls.

Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Will Deacon <will@kernel.org>
---
 include/linux/arm-smccc.h | 44 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index e7f129f26ebd..eee1e832221d 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -303,6 +303,50 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
 #define SMCCC_RET_NOT_SUPPORTED			-1
 #define SMCCC_RET_NOT_REQUIRED			-2
 
+/* Like arm_smccc_1_1* but always returns SMCCC_RET_NOT_SUPPORTED.
+ * Used when the PSCI conduit is not defined. The empty asm statement
+ * avoids compiler warnings about unused variables.
+ */
+#define __fail_smccc_1_1(...)						\
+	do {								\
+		__declare_args(__count_args(__VA_ARGS__), __VA_ARGS__);	\
+		asm ("" __constraints(__count_args(__VA_ARGS__)));	\
+		if (___res)						\
+			___res->a0 = SMCCC_RET_NOT_SUPPORTED;		\
+	} while (0)
+
+/*
+ * arm_smccc_1_1_invoke() - make an SMCCC v1.1 compliant call
+ *
+ * This is a variadic macro taking one to eight source arguments, and
+ * an optional return structure.
+ *
+ * @a0-a7: arguments passed in registers 0 to 7
+ * @res: result values from registers 0 to 3
+ *
+ * This macro will make either an HVC call or an SMC call depending on the
+ * current PSCI conduit. If no valid conduit is available then -1
+ * (SMCCC_RET_NOT_SUPPORTED) is returned in @res.a0 (if supplied).
+ *
+ * The return value also provides the conduit that was used.
+ */
+#define arm_smccc_1_1_invoke(...) ({					\
+		int method = psci_ops.conduit;				\
+		switch (method) {					\
+		case PSCI_CONDUIT_HVC:					\
+			arm_smccc_1_1_hvc(__VA_ARGS__);			\
+			break;						\
+		case PSCI_CONDUIT_SMC:					\
+			arm_smccc_1_1_smc(__VA_ARGS__);			\
+			break;						\
+		default:						\
+			__fail_smccc_1_1(__VA_ARGS__);			\
+			method = PSCI_CONDUIT_NONE;			\
+			break;						\
+		}							\
+		method;							\
+	})
+
 /* Paravirtualised time calls (defined by ARM DEN0057A) */
 #define ARM_SMCCC_HV_PV_FEATURES				\
 	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 09/10] arm/arm64: Make use of the SMCCC 1.1 wrapper
  2019-08-21 15:36 [PATCH v3 00/10] arm64: Stolen time support Steven Price
                   ` (7 preceding siblings ...)
  2019-08-21 15:36 ` [PATCH v3 08/10] arm/arm64: Provide a wrapper for SMCCC 1.1 calls Steven Price
@ 2019-08-21 15:36 ` Steven Price
  2019-08-21 15:36 ` [PATCH v3 10/10] arm64: Retrieve stolen time as paravirtualized guest Steven Price
  9 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2019-08-21 15:36 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: Steven Price, Catalin Marinas, Paolo Bonzini,
	Radim Krčmář,
	Russell King, James Morse, Julien Thierry, Suzuki K Pouloze,
	Mark Rutland, kvm, linux-doc, linux-kernel

Rather than directly choosing which function to use based on
psci_ops.conduit, use the new arm_smccc_1_1 wrapper instead.

In some cases we still need to do some operations based on the
conduit, but the code duplication is removed.

No functional change.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm/mm/proc-v7-bugs.c     | 13 +++---
 arch/arm64/kernel/cpu_errata.c | 80 ++++++++++++----------------------
 2 files changed, 33 insertions(+), 60 deletions(-)

diff --git a/arch/arm/mm/proc-v7-bugs.c b/arch/arm/mm/proc-v7-bugs.c
index 9a07916af8dd..8eb52f3385e7 100644
--- a/arch/arm/mm/proc-v7-bugs.c
+++ b/arch/arm/mm/proc-v7-bugs.c
@@ -78,12 +78,13 @@ static void cpu_v7_spectre_init(void)
 		if (psci_ops.smccc_version == SMCCC_VERSION_1_0)
 			break;
 
+		arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+				     ARM_SMCCC_ARCH_WORKAROUND_1, &res);
+		if ((int)res.a0 != 0)
+			return;
+
 		switch (psci_ops.conduit) {
 		case PSCI_CONDUIT_HVC:
-			arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
-					  ARM_SMCCC_ARCH_WORKAROUND_1, &res);
-			if ((int)res.a0 != 0)
-				break;
 			per_cpu(harden_branch_predictor_fn, cpu) =
 				call_hvc_arch_workaround_1;
 			cpu_do_switch_mm = cpu_v7_hvc_switch_mm;
@@ -91,10 +92,6 @@ static void cpu_v7_spectre_init(void)
 			break;
 
 		case PSCI_CONDUIT_SMC:
-			arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
-					  ARM_SMCCC_ARCH_WORKAROUND_1, &res);
-			if ((int)res.a0 != 0)
-				break;
 			per_cpu(harden_branch_predictor_fn, cpu) =
 				call_smc_arch_workaround_1;
 			cpu_do_switch_mm = cpu_v7_smc_switch_mm;
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 1e43ba5c79b7..400a49aaae85 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -215,40 +215,31 @@ static int detect_harden_bp_fw(void)
 	if (psci_ops.smccc_version == SMCCC_VERSION_1_0)
 		return -1;
 
+	arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+			     ARM_SMCCC_ARCH_WORKAROUND_1, &res);
+
+	switch ((int)res.a0) {
+	case 1:
+		/* Firmware says we're just fine */
+		return 0;
+	case 0:
+		break;
+	default:
+		return -1;
+	}
+
 	switch (psci_ops.conduit) {
 	case PSCI_CONDUIT_HVC:
-		arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
-				  ARM_SMCCC_ARCH_WORKAROUND_1, &res);
-		switch ((int)res.a0) {
-		case 1:
-			/* Firmware says we're just fine */
-			return 0;
-		case 0:
-			cb = call_hvc_arch_workaround_1;
-			/* This is a guest, no need to patch KVM vectors */
-			smccc_start = NULL;
-			smccc_end = NULL;
-			break;
-		default:
-			return -1;
-		}
+		cb = call_hvc_arch_workaround_1;
+		/* This is a guest, no need to patch KVM vectors */
+		smccc_start = NULL;
+		smccc_end = NULL;
 		break;
 
 	case PSCI_CONDUIT_SMC:
-		arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
-				  ARM_SMCCC_ARCH_WORKAROUND_1, &res);
-		switch ((int)res.a0) {
-		case 1:
-			/* Firmware says we're just fine */
-			return 0;
-		case 0:
-			cb = call_smc_arch_workaround_1;
-			smccc_start = __smccc_workaround_1_smc_start;
-			smccc_end = __smccc_workaround_1_smc_end;
-			break;
-		default:
-			return -1;
-		}
+		cb = call_smc_arch_workaround_1;
+		smccc_start = __smccc_workaround_1_smc_start;
+		smccc_end = __smccc_workaround_1_smc_end;
 		break;
 
 	default:
@@ -338,6 +329,7 @@ void __init arm64_enable_wa2_handling(struct alt_instr *alt,
 
 void arm64_set_ssbd_mitigation(bool state)
 {
+	int conduit;
 	if (!IS_ENABLED(CONFIG_ARM64_SSBD)) {
 		pr_info_once("SSBD disabled by kernel configuration\n");
 		return;
@@ -351,19 +343,10 @@ void arm64_set_ssbd_mitigation(bool state)
 		return;
 	}
 
-	switch (psci_ops.conduit) {
-	case PSCI_CONDUIT_HVC:
-		arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL);
-		break;
+	conduit = arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_WORKAROUND_2, state,
+				       NULL);
 
-	case PSCI_CONDUIT_SMC:
-		arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL);
-		break;
-
-	default:
-		WARN_ON_ONCE(1);
-		break;
-	}
+	WARN_ON_ONCE(conduit == PSCI_CONDUIT_NONE);
 }
 
 static bool has_ssbd_mitigation(const struct arm64_cpu_capabilities *entry,
@@ -373,6 +356,7 @@ static bool has_ssbd_mitigation(const struct arm64_cpu_capabilities *entry,
 	bool required = true;
 	s32 val;
 	bool this_cpu_safe = false;
+	int conduit;
 
 	WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
 
@@ -397,18 +381,10 @@ static bool has_ssbd_mitigation(const struct arm64_cpu_capabilities *entry,
 		return false;
 	}
 
-	switch (psci_ops.conduit) {
-	case PSCI_CONDUIT_HVC:
-		arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
-				  ARM_SMCCC_ARCH_WORKAROUND_2, &res);
-		break;
-
-	case PSCI_CONDUIT_SMC:
-		arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
-				  ARM_SMCCC_ARCH_WORKAROUND_2, &res);
-		break;
+	conduit = arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+				       ARM_SMCCC_ARCH_WORKAROUND_2, &res);
 
-	default:
+	if (conduit == PSCI_CONDUIT_NONE) {
 		ssbd_state = ARM64_SSBD_UNKNOWN;
 		if (!this_cpu_safe)
 			__ssb_safe = false;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 10/10] arm64: Retrieve stolen time as paravirtualized guest
  2019-08-21 15:36 [PATCH v3 00/10] arm64: Stolen time support Steven Price
                   ` (8 preceding siblings ...)
  2019-08-21 15:36 ` [PATCH v3 09/10] arm/arm64: Make use of the SMCCC 1.1 wrapper Steven Price
@ 2019-08-21 15:36 ` Steven Price
  2019-08-23 11:45   ` Zenghui Yu
  9 siblings, 1 reply; 36+ messages in thread
From: Steven Price @ 2019-08-21 15:36 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: Steven Price, Catalin Marinas, Paolo Bonzini,
	Radim Krčmář,
	Russell King, James Morse, Julien Thierry, Suzuki K Pouloze,
	Mark Rutland, kvm, linux-doc, linux-kernel

Enable paravirtualization features when running under a hypervisor
supporting the PV_TIME_ST hypercall.

For each (v)CPU, we ask the hypervisor for the location of a shared
page which the hypervisor will use to report stolen time to us. We set
pv_time_ops to the stolen time function which simply reads the stolen
value from the shared page for a VCPU. We guarantee single-copy
atomicity using READ_ONCE which means we can also read the stolen
time for another VCPU than the currently running one while it is
potentially being updated by the hypervisor.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/paravirt.h |   9 +-
 arch/arm64/kernel/paravirt.c      | 148 ++++++++++++++++++++++++++++++
 arch/arm64/kernel/time.c          |   3 +
 include/linux/cpuhotplug.h        |   1 +
 4 files changed, 160 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
index 799d9dd6f7cc..125c26c42902 100644
--- a/arch/arm64/include/asm/paravirt.h
+++ b/arch/arm64/include/asm/paravirt.h
@@ -21,6 +21,13 @@ static inline u64 paravirt_steal_clock(int cpu)
 {
 	return pv_ops.time.steal_clock(cpu);
 }
-#endif
+
+int __init kvm_guest_init(void);
+
+#else
+
+#define kvm_guest_init()
+
+#endif // CONFIG_PARAVIRT
 
 #endif
diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
index 4cfed91fe256..ea8dbbbd3293 100644
--- a/arch/arm64/kernel/paravirt.c
+++ b/arch/arm64/kernel/paravirt.c
@@ -6,13 +6,161 @@
  * Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  */
 
+#define pr_fmt(fmt) "kvmarm-pv: " fmt
+
+#include <linux/arm-smccc.h>
+#include <linux/cpuhotplug.h>
 #include <linux/export.h>
+#include <linux/io.h>
 #include <linux/jump_label.h>
+#include <linux/printk.h>
+#include <linux/psci.h>
+#include <linux/reboot.h>
+#include <linux/slab.h>
 #include <linux/types.h>
+
 #include <asm/paravirt.h>
+#include <asm/pvclock-abi.h>
+#include <asm/smp_plat.h>
 
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
 
 struct paravirt_patch_template pv_ops;
 EXPORT_SYMBOL_GPL(pv_ops);
+
+struct kvmarm_stolen_time_region {
+	struct pvclock_vcpu_stolen_time *kaddr;
+};
+
+static DEFINE_PER_CPU(struct kvmarm_stolen_time_region, stolen_time_region);
+
+static bool steal_acc = true;
+static int __init parse_no_stealacc(char *arg)
+{
+	steal_acc = false;
+	return 0;
+}
+
+early_param("no-steal-acc", parse_no_stealacc);
+
+/* return stolen time in ns by asking the hypervisor */
+static u64 kvm_steal_clock(int cpu)
+{
+	struct kvmarm_stolen_time_region *reg;
+
+	reg = per_cpu_ptr(&stolen_time_region, cpu);
+	if (!reg->kaddr) {
+		pr_warn_once("stolen time enabled but not configured for cpu %d\n",
+			     cpu);
+		return 0;
+	}
+
+	return le64_to_cpu(READ_ONCE(reg->kaddr->stolen_time));
+}
+
+static int disable_stolen_time_current_cpu(void)
+{
+	struct kvmarm_stolen_time_region *reg;
+
+	reg = this_cpu_ptr(&stolen_time_region);
+	if (!reg->kaddr)
+		return 0;
+
+	memunmap(reg->kaddr);
+	memset(reg, 0, sizeof(*reg));
+
+	return 0;
+}
+
+static int stolen_time_dying_cpu(unsigned int cpu)
+{
+	return disable_stolen_time_current_cpu();
+}
+
+static int init_stolen_time_cpu(unsigned int cpu)
+{
+	struct kvmarm_stolen_time_region *reg;
+	struct arm_smccc_res res;
+
+	reg = this_cpu_ptr(&stolen_time_region);
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_TIME_ST, &res);
+
+	if ((long)res.a0 < 0)
+		return -EINVAL;
+
+	reg->kaddr = memremap(res.a0,
+			      sizeof(struct pvclock_vcpu_stolen_time),
+			      MEMREMAP_WB);
+
+	if (!reg->kaddr) {
+		pr_warn("Failed to map stolen time data structure\n");
+		return -ENOMEM;
+	}
+
+	if (le32_to_cpu(reg->kaddr->revision) != 0 ||
+	    le32_to_cpu(reg->kaddr->attributes) != 0) {
+		pr_warn("Unexpected revision or attributes in stolen time data\n");
+		return -ENXIO;
+	}
+
+	return 0;
+}
+
+static int kvm_arm_init_stolen_time(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ARM_KVMPV_STARTING,
+				"hypervisor/kvmarm/pv:starting",
+				init_stolen_time_cpu, stolen_time_dying_cpu);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+static bool has_kvm_steal_clock(void)
+{
+	struct arm_smccc_res res;
+
+	/* To detect the presence of PV time support we require SMCCC 1.1+ */
+	if (psci_ops.smccc_version < SMCCC_VERSION_1_1)
+		return false;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+			     ARM_SMCCC_HV_PV_FEATURES, &res);
+
+	if (res.a0 != SMCCC_RET_SUCCESS)
+		return false;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_FEATURES,
+			     ARM_SMCCC_HV_PV_TIME_ST, &res);
+
+	if (res.a0 != SMCCC_RET_SUCCESS)
+		return false;
+
+	return true;
+}
+
+int __init kvm_guest_init(void)
+{
+	int ret = 0;
+
+	if (!has_kvm_steal_clock())
+		return 0;
+
+	ret = kvm_arm_init_stolen_time();
+	if (ret)
+		return ret;
+
+	pv_ops.time.steal_clock = kvm_steal_clock;
+
+	static_key_slow_inc(&paravirt_steal_enabled);
+	if (steal_acc)
+		static_key_slow_inc(&paravirt_steal_rq_enabled);
+
+	pr_info("using stolen time PV\n");
+
+	return 0;
+}
diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index 0b2946414dc9..a52aea14c6ec 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -30,6 +30,7 @@
 
 #include <asm/thread_info.h>
 #include <asm/stacktrace.h>
+#include <asm/paravirt.h>
 
 unsigned long profile_pc(struct pt_regs *regs)
 {
@@ -65,4 +66,6 @@ void __init time_init(void)
 
 	/* Calibrate the delay loop directly */
 	lpj_fine = arch_timer_rate / HZ;
+
+	kvm_guest_init();
 }
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 068793a619ca..89d75edb5750 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -136,6 +136,7 @@ enum cpuhp_state {
 	/* Must be the last timer callback */
 	CPUHP_AP_DUMMY_TIMER_STARTING,
 	CPUHP_AP_ARM_XEN_STARTING,
+	CPUHP_AP_ARM_KVMPV_STARTING,
 	CPUHP_AP_ARM_CORESIGHT_STARTING,
 	CPUHP_AP_ARM64_ISNDEP_STARTING,
 	CPUHP_AP_SMPCFD_DYING,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 04/10] KVM: Implement kvm_put_guest()
  2019-08-21 15:36 ` [PATCH v3 04/10] KVM: Implement kvm_put_guest() Steven Price
@ 2019-08-22 10:29   ` Jonathan Cameron
  2019-08-22 10:37     ` Steven Price
  2019-08-22 15:28   ` Sean Christopherson
  1 sibling, 1 reply; 36+ messages in thread
From: Jonathan Cameron @ 2019-08-22 10:29 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm,
	Mark Rutland, linux-kernel, kvm, Radim Krčmář,
	Catalin Marinas, Suzuki K Pouloze, linux-doc, Russell King,
	James Morse, Paolo Bonzini, Julien Thierry

On Wed, 21 Aug 2019 16:36:50 +0100
Steven Price <steven.price@arm.com> wrote:

> kvm_put_guest() is analogous to put_user() - it writes a single value to
> the guest physical address. The implementation is built upon put_user()
> and so it has the same single copy atomic properties.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  include/linux/kvm_host.h | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index fcb46b3374c6..e154a1897e20 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -746,6 +746,30 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
>  				  unsigned long len);
>  int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
>  			      gpa_t gpa, unsigned long len);
> +
> +#define __kvm_put_guest(kvm, gfn, offset, value, type)			\
> +({									\
> +	unsigned long __addr = gfn_to_hva(kvm, gfn);			\
> +	type __user *__uaddr = (type __user *)(__addr + offset);	\
> +	int __ret = 0;							\

Why initialize __ret?

> +									\
> +	if (kvm_is_error_hva(__addr))					\
> +		__ret = -EFAULT;					\
> +	else								\
> +		__ret = put_user(value, __uaddr);			\
> +	if (!__ret)							\
> +		mark_page_dirty(kvm, gfn);				\
> +	__ret;								\
> +})
> +
> +#define kvm_put_guest(kvm, gpa, value, type)				\
> +({									\
> +	gpa_t __gpa = gpa;						\
> +	struct kvm *__kvm = kvm;					\
> +	__kvm_put_guest(__kvm, __gpa >> PAGE_SHIFT,			\
> +			offset_in_page(__gpa), (value), type);		\
> +})
> +
>  int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
>  int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
>  struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 04/10] KVM: Implement kvm_put_guest()
  2019-08-22 10:29   ` Jonathan Cameron
@ 2019-08-22 10:37     ` Steven Price
  0 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2019-08-22 10:37 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mark Rutland, kvm, Radim Krčmář,
	Marc Zyngier, Suzuki K Pouloze, linux-doc, linux-kernel,
	Russell King, James Morse, Julien Thierry, Catalin Marinas,
	Paolo Bonzini, Will Deacon, kvmarm, linux-arm-kernel

On 22/08/2019 11:29, Jonathan Cameron wrote:
> On Wed, 21 Aug 2019 16:36:50 +0100
> Steven Price <steven.price@arm.com> wrote:
> 
>> kvm_put_guest() is analogous to put_user() - it writes a single value to
>> the guest physical address. The implementation is built upon put_user()
>> and so it has the same single copy atomic properties.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>  include/linux/kvm_host.h | 24 ++++++++++++++++++++++++
>>  1 file changed, 24 insertions(+)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index fcb46b3374c6..e154a1897e20 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -746,6 +746,30 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
>>  				  unsigned long len);
>>  int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
>>  			      gpa_t gpa, unsigned long len);
>> +
>> +#define __kvm_put_guest(kvm, gfn, offset, value, type)			\
>> +({									\
>> +	unsigned long __addr = gfn_to_hva(kvm, gfn);			\
>> +	type __user *__uaddr = (type __user *)(__addr + offset);	\
>> +	int __ret = 0;							\
> 
> Why initialize __ret?

Good question. Actually looking at this again if I reorder this to be
pessimistic I can make it shorter:

	int __ret = -EFAULT;

	if (!kvm_is_error_hva(__addr))
		__ret = put_user(value, __uaddr);
	if (!__ret)
		mark_page_dirty(kvm, gfn);				
	__ret;

Thanks for taking a look.

Steve

>> +									\
>> +	if (kvm_is_error_hva(__addr))					\
>> +		__ret = -EFAULT;					\
>> +	else								\
>> +		__ret = put_user(value, __uaddr);			\
>> +	if (!__ret)							\
>> +		mark_page_dirty(kvm, gfn);				\
>> +	__ret;								\
>> +})
>> +
>> +#define kvm_put_guest(kvm, gpa, value, type)				\
>> +({									\
>> +	gpa_t __gpa = gpa;						\
>> +	struct kvm *__kvm = kvm;					\
>> +	__kvm_put_guest(__kvm, __gpa >> PAGE_SHIFT,			\
>> +			offset_in_page(__gpa), (value), type);		\
>> +})
>> +
>>  int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
>>  int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
>>  struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 05/10] KVM: arm64: Support stolen time reporting via shared structure
  2019-08-21 15:36 ` [PATCH v3 05/10] KVM: arm64: Support stolen time reporting via shared structure Steven Price
@ 2019-08-22 10:39   ` Jonathan Cameron
  2019-08-22 11:00     ` Steven Price
  2019-08-23 12:07   ` Zenghui Yu
  1 sibling, 1 reply; 36+ messages in thread
From: Jonathan Cameron @ 2019-08-22 10:39 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm,
	Mark Rutland, linux-kernel, kvm, Radim Krčmář,
	Catalin Marinas, Suzuki K Pouloze, linux-doc, Russell King,
	James Morse, Paolo Bonzini, Julien Thierry

On Wed, 21 Aug 2019 16:36:51 +0100
Steven Price <steven.price@arm.com> wrote:

> Implement the service call for configuring a shared structure between a
> VCPU and the hypervisor in which the hypervisor can write the time
> stolen from the VCPU's execution time by other tasks on the host.
> 
> The hypervisor allocates memory which is placed at an IPA chosen by user
> space. The hypervisor then updates the shared structure using
> kvm_put_guest() to ensure single copy atomicity of the 64-bit value
> reporting the stolen time in nanoseconds.
> 
> Whenever stolen time is enabled by the guest, the stolen time counter is
> reset.
> 
> The stolen time itself is retrieved from the sched_info structure
> maintained by the Linux scheduler code. We enable SCHEDSTATS when
> selecting KVM Kconfig to ensure this value is meaningful.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>

One totally trivial comment inline... Feel free to ignore :)

> ---
>  arch/arm/include/asm/kvm_host.h   | 20 +++++++++
>  arch/arm64/include/asm/kvm_host.h | 25 +++++++++++-
>  arch/arm64/kvm/Kconfig            |  1 +
>  include/linux/kvm_types.h         |  2 +
>  virt/kvm/arm/arm.c                | 10 +++++
>  virt/kvm/arm/hypercalls.c         |  3 ++
>  virt/kvm/arm/pvtime.c             | 67 +++++++++++++++++++++++++++++++
>  7 files changed, 127 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 369b5d2d54bf..47d2ced99421 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -39,6 +39,7 @@
>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
>  #define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
> +#define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
>  
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>  
> @@ -329,6 +330,25 @@ static inline int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>  	return SMCCC_RET_NOT_SUPPORTED;
>  }
>  
> +static inline int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu)
> +{
> +	return SMCCC_RET_NOT_SUPPORTED;
> +}
> +
> +static inline int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init)
> +{
> +	return -ENOTSUPP;
> +}
> +
> +static inline void kvm_pvtime_init_vm(struct kvm_arch *kvm_arch)
> +{
> +}
> +
> +static inline bool kvm_is_pvtime_enabled(struct kvm_arch *kvm_arch)
> +{
> +	return false;
> +}
> +
>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 583b3639062a..b6fa7beffd8a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -44,6 +44,7 @@
>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
>  #define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
> +#define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
>  
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>  
> @@ -83,6 +84,11 @@ struct kvm_arch {
>  
>  	/* Mandated version of PSCI */
>  	u32 psci_version;
> +
> +	struct kvm_arch_pvtime {
> +		gpa_t st_base;
> +		u64 st_size;
> +	} pvtime;
>  };
>  
>  #define KVM_NR_MEM_OBJS     40
> @@ -338,8 +344,13 @@ struct kvm_vcpu_arch {
>  	/* True when deferrable sysregs are loaded on the physical CPU,
>  	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
>  	bool sysregs_loaded_on_cpu;
> -};
>  
> +	/* Guest PV state */
> +	struct {
> +		u64 steal;
> +		u64 last_steal;
> +	} steal;
> +};
>  /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
>  #define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + \
>  				      sve_ffr_offset((vcpu)->arch.sve_max_vl)))
> @@ -479,6 +490,18 @@ int kvm_perf_init(void);
>  int kvm_perf_teardown(void);
>  
>  int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu);
> +int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu);
> +int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init);
> +
> +static inline void kvm_pvtime_init_vm(struct kvm_arch *kvm_arch)
> +{
> +	kvm_arch->pvtime.st_base = GPA_INVALID;
> +}
> +
> +static inline bool kvm_is_pvtime_enabled(struct kvm_arch *kvm_arch)
> +{
> +	return (kvm_arch->pvtime.st_base != GPA_INVALID);
> +}
>  
>  void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>  
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index a67121d419a2..d8b88e40d223 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -39,6 +39,7 @@ config KVM
>  	select IRQ_BYPASS_MANAGER
>  	select HAVE_KVM_IRQ_BYPASS
>  	select HAVE_KVM_VCPU_RUN_PID_CHANGE
> +	select SCHEDSTATS
>  	---help---
>  	  Support hosting virtualized guest machines.
>  	  We don't support KVM with 16K page tables yet, due to the multiple
> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> index bde5374ae021..1c88e69db3d9 100644
> --- a/include/linux/kvm_types.h
> +++ b/include/linux/kvm_types.h
> @@ -35,6 +35,8 @@ typedef unsigned long  gva_t;
>  typedef u64            gpa_t;
>  typedef u64            gfn_t;
>  
> +#define GPA_INVALID	(~(gpa_t)0)
> +
>  typedef unsigned long  hva_t;
>  typedef u64            hpa_t;
>  typedef u64            hfn_t;
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 35a069815baf..5e8343e2dd62 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -40,6 +40,10 @@
>  #include <asm/kvm_coproc.h>
>  #include <asm/sections.h>
>  
> +#include <kvm/arm_hypercalls.h>
> +#include <kvm/arm_pmu.h>
> +#include <kvm/arm_psci.h>
> +
>  #ifdef REQUIRES_VIRT
>  __asm__(".arch_extension	virt");
>  #endif
> @@ -135,6 +139,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	kvm->arch.max_vcpus = vgic_present ?
>  				kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS;
>  
> +	kvm_pvtime_init_vm(&kvm->arch);
>  	return ret;
>  out_free_stage2_pgd:
>  	kvm_free_stage2_pgd(kvm);
> @@ -379,6 +384,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	kvm_vcpu_load_sysregs(vcpu);
>  	kvm_arch_vcpu_load_fp(vcpu);
>  	kvm_vcpu_pmu_restore_guest(vcpu);
> +	if (kvm_is_pvtime_enabled(&vcpu->kvm->arch))
> +		kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
>  
>  	if (single_task_running())
>  		vcpu_clear_wfe_traps(vcpu);
> @@ -644,6 +651,9 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
>  		 * that a VCPU sees new virtual interrupts.
>  		 */
>  		kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu);
> +
> +		if (kvm_check_request(KVM_REQ_RECORD_STEAL, vcpu))
> +			kvm_update_stolen_time(vcpu, false);
>  	}
>  }
>  
> diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
> index 63ae629c466a..ac678eabf15f 100644
> --- a/virt/kvm/arm/hypercalls.c
> +++ b/virt/kvm/arm/hypercalls.c
> @@ -56,6 +56,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>  	case ARM_SMCCC_HV_PV_FEATURES:
>  		val = kvm_hypercall_pv_features(vcpu);
>  		break;
> +	case ARM_SMCCC_HV_PV_TIME_ST:
> +		val = kvm_hypercall_stolen_time(vcpu);
> +		break;
>  	default:
>  		return kvm_psci_call(vcpu);
>  	}
> diff --git a/virt/kvm/arm/pvtime.c b/virt/kvm/arm/pvtime.c
> index 6201d71cb1f8..28603689f6e0 100644
> --- a/virt/kvm/arm/pvtime.c
> +++ b/virt/kvm/arm/pvtime.c
> @@ -3,8 +3,51 @@
>  
>  #include <linux/arm-smccc.h>
>  
> +#include <asm/pvclock-abi.h>
> +
>  #include <kvm/arm_hypercalls.h>
>  
> +int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_arch_pvtime *pvtime = &kvm->arch.pvtime;
> +	u64 steal;
> +	u64 steal_le;
> +	u64 offset;
> +	int idx;
> +	const int stride = sizeof(struct pvclock_vcpu_stolen_time);
> +
> +	if (pvtime->st_base == GPA_INVALID)
> +		return -ENOTSUPP;
> +
> +	/* Let's do the local bookkeeping */
> +	steal = vcpu->arch.steal.steal;
> +	steal += current->sched_info.run_delay - vcpu->arch.steal.last_steal;
> +	vcpu->arch.steal.last_steal = current->sched_info.run_delay;
> +	vcpu->arch.steal.steal = steal;
> +
> +	offset = stride * kvm_vcpu_get_idx(vcpu);
> +
> +	if (unlikely(offset + stride > pvtime->st_size))
> +		return -EINVAL;
> +
> +	steal_le = cpu_to_le64(steal);
> +	idx = srcu_read_lock(&kvm->srcu);
> +	if (init) {
> +		struct pvclock_vcpu_stolen_time init_values = {
> +			.revision = 0,
> +			.attributes = 0
> +		};
> +		kvm_write_guest(kvm, pvtime->st_base + offset, &init_values,
> +				sizeof(init_values));
> +	}
> +	offset += offsetof(struct pvclock_vcpu_stolen_time, stolen_time);
> +	kvm_put_guest(kvm, pvtime->st_base + offset, steal_le, u64);
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
> +	return 0;
> +}
> +
>  int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>  {
>  	u32 feature = smccc_get_arg1(vcpu);
> @@ -12,6 +55,7 @@ int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>  
>  	switch (feature) {
>  	case ARM_SMCCC_HV_PV_FEATURES:
> +	case ARM_SMCCC_HV_PV_TIME_ST:
>  		val = SMCCC_RET_SUCCESS;
>  		break;
>  	}
> @@ -19,3 +63,26 @@ int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>  	return val;
>  }
>  
> +int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu)
> +{
> +	u64 ret;
> +	int err;
> +
> +	/*
> +	 * Start counting stolen time from the time the guest requests
> +	 * the feature enabled.
> +	 */
> +	vcpu->arch.steal.steal = 0;
> +	vcpu->arch.steal.last_steal = current->sched_info.run_delay;
> +
> +	err = kvm_update_stolen_time(vcpu, true);
> +
> +	if (err)
> +		ret = SMCCC_RET_NOT_SUPPORTED;

Trivial by why not
		return SMCCC_RET_NOT_SUPPORTED;

	return vcpu->kvm->arch.pvtime.st_base +
...
Drops the indentation a bit and puts the error handling out of
line which is slightly nicer to read (to my eyes).

> +	else
> +		ret = vcpu->kvm->arch.pvtime.st_base +
> +			(sizeof(struct pvclock_vcpu_stolen_time) *
> +			 kvm_vcpu_get_idx(vcpu));
> +
> +	return ret;
> +}



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 07/10] KVM: arm64: Provide a PV_TIME device to user space
  2019-08-21 15:36 ` [PATCH v3 07/10] KVM: arm64: Provide a PV_TIME device to user space Steven Price
@ 2019-08-22 10:57   ` Jonathan Cameron
  2019-08-22 11:11     ` Steven Price
  0 siblings, 1 reply; 36+ messages in thread
From: Jonathan Cameron @ 2019-08-22 10:57 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm,
	Mark Rutland, linux-kernel, kvm, Radim Krčmář,
	Catalin Marinas, Suzuki K Pouloze, linux-doc, Russell King,
	James Morse, Paolo Bonzini, Julien Thierry

On Wed, 21 Aug 2019 16:36:53 +0100
Steven Price <steven.price@arm.com> wrote:

> Allow user space to inform the KVM host where in the physical memory
> map the paravirtualized time structures should be located.
> 
> A device is created which provides the base address of an array of
> Stolen Time (ST) structures, one for each VCPU. There must be (64 *
> total number of VCPUs) bytes of memory available at this location.
> 
> The address is given in terms of the physical address visible to
> the guest and must be page aligned. The guest will discover the address
> via a hypercall.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>

Hi Steven,

One general question inline.  I'm not particularly familiar with this area
of the kernel, so maybe I'm missing something obvious, but having
.destroy free the kvm_device which wasn't created in .create seems
'unusual'. 

Otherwise, FWIW looks good to me.

Jonathan

> ---
>  arch/arm/include/asm/kvm_host.h   |  4 ++
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  arch/arm64/include/uapi/asm/kvm.h |  8 +++
>  include/uapi/linux/kvm.h          |  2 +
>  virt/kvm/arm/arm.c                |  1 +
>  virt/kvm/arm/pvtime.c             | 94 +++++++++++++++++++++++++++++++
>  6 files changed, 110 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 47d2ced99421..b6c8dbc0556b 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -325,6 +325,10 @@ static inline int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  int kvm_perf_init(void);
>  int kvm_perf_teardown(void);
>  
> +static inline void kvm_pvtime_init(void)
> +{
> +}
> +
>  static inline int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>  {
>  	return SMCCC_RET_NOT_SUPPORTED;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index b6fa7beffd8a..7b2147f62c16 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -489,6 +489,7 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  int kvm_perf_init(void);
>  int kvm_perf_teardown(void);
>  
> +void kvm_pvtime_init(void);
>  int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu);
>  int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu);
>  int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init);
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 9a507716ae2f..209c4de67306 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -367,6 +367,14 @@ struct kvm_vcpu_events {
>  #define KVM_PSCI_RET_INVAL		PSCI_RET_INVALID_PARAMS
>  #define KVM_PSCI_RET_DENIED		PSCI_RET_DENIED
>  
> +/* Device Control API: PV_TIME */
> +#define KVM_DEV_ARM_PV_TIME_REGION	0
> +#define  KVM_DEV_ARM_PV_TIME_ST		0
> +struct kvm_dev_arm_st_region {
> +	__u64 gpa;
> +	__u64 size;
> +};
> +
>  #endif
>  
>  #endif /* __ARM_KVM_H__ */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 5e3f12d5359e..265156a984f2 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1222,6 +1222,8 @@ enum kvm_device_type {
>  #define KVM_DEV_TYPE_ARM_VGIC_ITS	KVM_DEV_TYPE_ARM_VGIC_ITS
>  	KVM_DEV_TYPE_XIVE,
>  #define KVM_DEV_TYPE_XIVE		KVM_DEV_TYPE_XIVE
> +	KVM_DEV_TYPE_ARM_PV_TIME,
> +#define KVM_DEV_TYPE_ARM_PV_TIME	KVM_DEV_TYPE_ARM_PV_TIME
>  	KVM_DEV_TYPE_MAX,
>  };
>  
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 5e8343e2dd62..bfb5a842e6ab 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -1494,6 +1494,7 @@ static int init_subsystems(void)
>  
>  	kvm_perf_init();
>  	kvm_coproc_table_init();
> +	kvm_pvtime_init();
>  
>  out:
>  	on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);
> diff --git a/virt/kvm/arm/pvtime.c b/virt/kvm/arm/pvtime.c
> index 28603689f6e0..3e55c1fb6a49 100644
> --- a/virt/kvm/arm/pvtime.c
> +++ b/virt/kvm/arm/pvtime.c
> @@ -2,7 +2,9 @@
>  // Copyright (C) 2019 Arm Ltd.
>  
>  #include <linux/arm-smccc.h>
> +#include <linux/kvm_host.h>
>  
> +#include <asm/kvm_mmu.h>
>  #include <asm/pvclock-abi.h>
>  
>  #include <kvm/arm_hypercalls.h>
> @@ -86,3 +88,95 @@ int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu)
>  
>  	return ret;
>  }
> +
> +static int kvm_arm_pvtime_create(struct kvm_device *dev, u32 type)
> +{
> +	return 0;
> +}
> +
> +static void kvm_arm_pvtime_destroy(struct kvm_device *dev)
> +{
> +	struct kvm_arch_pvtime *pvtime = &dev->kvm->arch.pvtime;
> +
> +	pvtime->st_base = GPA_INVALID;
> +	kfree(dev);

Nothing to do with your patch as such... All users do the same.

This seems miss balanced. Why do we need to free the device by hand
when we didn't create it in the create function?  I appreciate
the comments say this is needed, but as far as I can see every
single callback does kfree(dev) at the end which seems an
odd thing to do.

> +}
> +
> +static int kvm_arm_pvtime_set_attr(struct kvm_device *dev,
> +				   struct kvm_device_attr *attr)
> +{
> +	struct kvm *kvm = dev->kvm;
> +	struct kvm_arch_pvtime *pvtime = &kvm->arch.pvtime;
> +	u64 __user *user = (u64 __user *)attr->addr;
> +	struct kvm_dev_arm_st_region region;
> +
> +	switch (attr->group) {
> +	case KVM_DEV_ARM_PV_TIME_REGION:
> +		if (copy_from_user(&region, user, sizeof(region)))
> +			return -EFAULT;
> +		if (region.gpa & ~PAGE_MASK)
> +			return -EINVAL;
> +		if (region.size & ~PAGE_MASK)
> +			return -EINVAL;
> +		switch (attr->attr) {
> +		case KVM_DEV_ARM_PV_TIME_ST:
> +			if (pvtime->st_base != GPA_INVALID)
> +				return -EEXIST;
> +			pvtime->st_base = region.gpa;
> +			pvtime->st_size = region.size;
> +			return 0;
> +		}
> +		break;
> +	}
> +	return -ENXIO;
> +}
> +
> +static int kvm_arm_pvtime_get_attr(struct kvm_device *dev,
> +				   struct kvm_device_attr *attr)
> +{
> +	struct kvm_arch_pvtime *pvtime = &dev->kvm->arch.pvtime;
> +	u64 __user *user = (u64 __user *)attr->addr;
> +	struct kvm_dev_arm_st_region region;
> +
> +	switch (attr->group) {
> +	case KVM_DEV_ARM_PV_TIME_REGION:
> +		switch (attr->attr) {
> +		case KVM_DEV_ARM_PV_TIME_ST:
> +			region.gpa = pvtime->st_base;
> +			region.size = pvtime->st_size;
> +			if (copy_to_user(user, &region, sizeof(region)))
> +				return -EFAULT;
> +			return 0;
> +		}
> +		break;
> +	}
> +	return -ENXIO;
> +}
> +
> +static int kvm_arm_pvtime_has_attr(struct kvm_device *dev,
> +				   struct kvm_device_attr *attr)
> +{
> +	switch (attr->group) {
> +	case KVM_DEV_ARM_PV_TIME_REGION:
> +		switch (attr->attr) {
> +		case KVM_DEV_ARM_PV_TIME_ST:
> +			return 0;
> +		}
> +		break;
> +	}
> +	return -ENXIO;
> +}
> +
> +static const struct kvm_device_ops pvtime_ops = {
> +	"Arm PV time",
> +	.create = kvm_arm_pvtime_create,
> +	.destroy = kvm_arm_pvtime_destroy,
> +	.set_attr = kvm_arm_pvtime_set_attr,
> +	.get_attr = kvm_arm_pvtime_get_attr,
> +	.has_attr = kvm_arm_pvtime_has_attr
> +};
> +
> +void kvm_pvtime_init(void)
> +{
> +	kvm_register_device_ops(&pvtime_ops, KVM_DEV_TYPE_ARM_PV_TIME);
> +}



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 05/10] KVM: arm64: Support stolen time reporting via shared structure
  2019-08-22 10:39   ` Jonathan Cameron
@ 2019-08-22 11:00     ` Steven Price
  0 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2019-08-22 11:00 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mark Rutland, kvm, Radim Krčmář,
	Marc Zyngier, Suzuki K Pouloze, linux-doc, linux-kernel,
	Russell King, James Morse, Julien Thierry, Catalin Marinas,
	Paolo Bonzini, Will Deacon, kvmarm, linux-arm-kernel

On 22/08/2019 11:39, Jonathan Cameron wrote:
> On Wed, 21 Aug 2019 16:36:51 +0100
> Steven Price <steven.price@arm.com> wrote:
> 
>> Implement the service call for configuring a shared structure between a
>> VCPU and the hypervisor in which the hypervisor can write the time
>> stolen from the VCPU's execution time by other tasks on the host.
>>
>> The hypervisor allocates memory which is placed at an IPA chosen by user
>> space. The hypervisor then updates the shared structure using
>> kvm_put_guest() to ensure single copy atomicity of the 64-bit value
>> reporting the stolen time in nanoseconds.
>>
>> Whenever stolen time is enabled by the guest, the stolen time counter is
>> reset.
>>
>> The stolen time itself is retrieved from the sched_info structure
>> maintained by the Linux scheduler code. We enable SCHEDSTATS when
>> selecting KVM Kconfig to ensure this value is meaningful.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
> 
> One totally trivial comment inline... Feel free to ignore :)
> 
[...]
>> +int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu)
>> +{
>> +	u64 ret;
>> +	int err;
>> +
>> +	/*
>> +	 * Start counting stolen time from the time the guest requests
>> +	 * the feature enabled.
>> +	 */
>> +	vcpu->arch.steal.steal = 0;
>> +	vcpu->arch.steal.last_steal = current->sched_info.run_delay;
>> +
>> +	err = kvm_update_stolen_time(vcpu, true);
>> +
>> +	if (err)
>> +		ret = SMCCC_RET_NOT_SUPPORTED;
> 
> Trivial by why not
> 		return SMCCC_RET_NOT_SUPPORTED;
> 
> 	return vcpu->kvm->arch.pvtime.st_base +
> ...
> Drops the indentation a bit and puts the error handling out of
> line which is slightly nicer to read (to my eyes).

Yes that's a nice change - drops the extra "ret" variable too.

Thanks,

Steve

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 07/10] KVM: arm64: Provide a PV_TIME device to user space
  2019-08-22 10:57   ` Jonathan Cameron
@ 2019-08-22 11:11     ` Steven Price
  2019-08-22 11:48       ` Jonathan Cameron
  0 siblings, 1 reply; 36+ messages in thread
From: Steven Price @ 2019-08-22 11:11 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mark Rutland, kvm, Radim Krčmář,
	Marc Zyngier, Suzuki K Pouloze, linux-doc, linux-kernel,
	Russell King, James Morse, Julien Thierry, Catalin Marinas,
	Paolo Bonzini, Will Deacon, kvmarm, linux-arm-kernel

On 22/08/2019 11:57, Jonathan Cameron wrote:
> On Wed, 21 Aug 2019 16:36:53 +0100
> Steven Price <steven.price@arm.com> wrote:
> 
>> Allow user space to inform the KVM host where in the physical memory
>> map the paravirtualized time structures should be located.
>>
>> A device is created which provides the base address of an array of
>> Stolen Time (ST) structures, one for each VCPU. There must be (64 *
>> total number of VCPUs) bytes of memory available at this location.
>>
>> The address is given in terms of the physical address visible to
>> the guest and must be page aligned. The guest will discover the address
>> via a hypercall.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
> 
> Hi Steven,
> 
> One general question inline.  I'm not particularly familiar with this area
> of the kernel, so maybe I'm missing something obvious, but having
> .destroy free the kvm_device which wasn't created in .create seems
> 'unusual'. 
> 
> Otherwise, FWIW looks good to me.
> 
> Jonathan
> 
[...]
>> +static void kvm_arm_pvtime_destroy(struct kvm_device *dev)
>> +{
>> +	struct kvm_arch_pvtime *pvtime = &dev->kvm->arch.pvtime;
>> +
>> +	pvtime->st_base = GPA_INVALID;
>> +	kfree(dev);
> 
> Nothing to do with your patch as such... All users do the same.
> 
> This seems miss balanced. Why do we need to free the device by hand
> when we didn't create it in the create function?  I appreciate
> the comments say this is needed, but as far as I can see every
> single callback does kfree(dev) at the end which seems an
> odd thing to do.

Yes I think this is odd too - indeed when I initially wrote this I
missed off the kfree() call and had to track down the memory leak.

When I looked into potentially tiding this up I found some other
oddities, e.g. "kvm-xive" (arch/powerpc/kvm/book3s_xive.c) doesn't have
a destroy callback. But I can't see anything in the common code which
deals with that case. So I decided to just "go with the flow" at the
moment, since I don't understand how some of these existing devices work
(perhaps they are already broken?).

Steve

>> +}
>> +
>> +static int kvm_arm_pvtime_set_attr(struct kvm_device *dev,
>> +				   struct kvm_device_attr *attr)
>> +{
>> +	struct kvm *kvm = dev->kvm;
>> +	struct kvm_arch_pvtime *pvtime = &kvm->arch.pvtime;
>> +	u64 __user *user = (u64 __user *)attr->addr;
>> +	struct kvm_dev_arm_st_region region;
>> +
>> +	switch (attr->group) {
>> +	case KVM_DEV_ARM_PV_TIME_REGION:
>> +		if (copy_from_user(&region, user, sizeof(region)))
>> +			return -EFAULT;
>> +		if (region.gpa & ~PAGE_MASK)
>> +			return -EINVAL;
>> +		if (region.size & ~PAGE_MASK)
>> +			return -EINVAL;
>> +		switch (attr->attr) {
>> +		case KVM_DEV_ARM_PV_TIME_ST:
>> +			if (pvtime->st_base != GPA_INVALID)
>> +				return -EEXIST;
>> +			pvtime->st_base = region.gpa;
>> +			pvtime->st_size = region.size;
>> +			return 0;
>> +		}
>> +		break;
>> +	}
>> +	return -ENXIO;
>> +}
>> +
>> +static int kvm_arm_pvtime_get_attr(struct kvm_device *dev,
>> +				   struct kvm_device_attr *attr)
>> +{
>> +	struct kvm_arch_pvtime *pvtime = &dev->kvm->arch.pvtime;
>> +	u64 __user *user = (u64 __user *)attr->addr;
>> +	struct kvm_dev_arm_st_region region;
>> +
>> +	switch (attr->group) {
>> +	case KVM_DEV_ARM_PV_TIME_REGION:
>> +		switch (attr->attr) {
>> +		case KVM_DEV_ARM_PV_TIME_ST:
>> +			region.gpa = pvtime->st_base;
>> +			region.size = pvtime->st_size;
>> +			if (copy_to_user(user, &region, sizeof(region)))
>> +				return -EFAULT;
>> +			return 0;
>> +		}
>> +		break;
>> +	}
>> +	return -ENXIO;
>> +}
>> +
>> +static int kvm_arm_pvtime_has_attr(struct kvm_device *dev,
>> +				   struct kvm_device_attr *attr)
>> +{
>> +	switch (attr->group) {
>> +	case KVM_DEV_ARM_PV_TIME_REGION:
>> +		switch (attr->attr) {
>> +		case KVM_DEV_ARM_PV_TIME_ST:
>> +			return 0;
>> +		}
>> +		break;
>> +	}
>> +	return -ENXIO;
>> +}
>> +
>> +static const struct kvm_device_ops pvtime_ops = {
>> +	"Arm PV time",
>> +	.create = kvm_arm_pvtime_create,
>> +	.destroy = kvm_arm_pvtime_destroy,
>> +	.set_attr = kvm_arm_pvtime_set_attr,
>> +	.get_attr = kvm_arm_pvtime_get_attr,
>> +	.has_attr = kvm_arm_pvtime_has_attr
>> +};
>> +
>> +void kvm_pvtime_init(void)
>> +{
>> +	kvm_register_device_ops(&pvtime_ops, KVM_DEV_TYPE_ARM_PV_TIME);
>> +}
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 07/10] KVM: arm64: Provide a PV_TIME device to user space
  2019-08-22 11:11     ` Steven Price
@ 2019-08-22 11:48       ` Jonathan Cameron
  0 siblings, 0 replies; 36+ messages in thread
From: Jonathan Cameron @ 2019-08-22 11:48 UTC (permalink / raw)
  To: Steven Price
  Cc: Mark Rutland, kvm, Radim Krčmář,
	Marc Zyngier, Suzuki K Pouloze, linux-doc, linux-kernel,
	Russell King, James Morse, Julien Thierry, Catalin Marinas,
	Paolo Bonzini, Will Deacon, kvmarm, linux-arm-kernel

On Thu, 22 Aug 2019 12:11:55 +0100
Steven Price <steven.price@arm.com> wrote:

> On 22/08/2019 11:57, Jonathan Cameron wrote:
> > On Wed, 21 Aug 2019 16:36:53 +0100
> > Steven Price <steven.price@arm.com> wrote:
> >   
> >> Allow user space to inform the KVM host where in the physical memory
> >> map the paravirtualized time structures should be located.
> >>
> >> A device is created which provides the base address of an array of
> >> Stolen Time (ST) structures, one for each VCPU. There must be (64 *
> >> total number of VCPUs) bytes of memory available at this location.
> >>
> >> The address is given in terms of the physical address visible to
> >> the guest and must be page aligned. The guest will discover the address
> >> via a hypercall.
> >>
> >> Signed-off-by: Steven Price <steven.price@arm.com>  
> > 
> > Hi Steven,
> > 
> > One general question inline.  I'm not particularly familiar with this area
> > of the kernel, so maybe I'm missing something obvious, but having
> > .destroy free the kvm_device which wasn't created in .create seems
> > 'unusual'. 
> > 
> > Otherwise, FWIW looks good to me.
> > 
> > Jonathan
> >   
> [...]
> >> +static void kvm_arm_pvtime_destroy(struct kvm_device *dev)
> >> +{
> >> +	struct kvm_arch_pvtime *pvtime = &dev->kvm->arch.pvtime;
> >> +
> >> +	pvtime->st_base = GPA_INVALID;
> >> +	kfree(dev);  
> > 
> > Nothing to do with your patch as such... All users do the same.
> > 
> > This seems miss balanced. Why do we need to free the device by hand
> > when we didn't create it in the create function?  I appreciate
> > the comments say this is needed, but as far as I can see every
> > single callback does kfree(dev) at the end which seems an
> > odd thing to do.  
> 
> Yes I think this is odd too - indeed when I initially wrote this I
> missed off the kfree() call and had to track down the memory leak.
> 
> When I looked into potentially tiding this up I found some other
> oddities, e.g. "kvm-xive" (arch/powerpc/kvm/book3s_xive.c) doesn't have
> a destroy callback. But I can't see anything in the common code which
> deals with that case. So I decided to just "go with the flow" at the
> moment, since I don't understand how some of these existing devices work
> (perhaps they are already broken?).

It has a release however and kvm_device_release also removes the
device from the list that would then be cleared by kvm_destroy_devices.

kvm_device_release is a release callback for the file operations so it
'might' be called in all paths.

Fun though, in kvm_ioctl_create_device the error handling for
the anon_inode_getfd calls ops->destroy without checking it exists.
Boom.

Possibly never happens in reality but looks like a bug to me.

Jonathan


> 
> Steve
> 
> >> +}
> >> +
> >> +static int kvm_arm_pvtime_set_attr(struct kvm_device *dev,
> >> +				   struct kvm_device_attr *attr)
> >> +{
> >> +	struct kvm *kvm = dev->kvm;
> >> +	struct kvm_arch_pvtime *pvtime = &kvm->arch.pvtime;
> >> +	u64 __user *user = (u64 __user *)attr->addr;
> >> +	struct kvm_dev_arm_st_region region;
> >> +
> >> +	switch (attr->group) {
> >> +	case KVM_DEV_ARM_PV_TIME_REGION:
> >> +		if (copy_from_user(&region, user, sizeof(region)))
> >> +			return -EFAULT;
> >> +		if (region.gpa & ~PAGE_MASK)
> >> +			return -EINVAL;
> >> +		if (region.size & ~PAGE_MASK)
> >> +			return -EINVAL;
> >> +		switch (attr->attr) {
> >> +		case KVM_DEV_ARM_PV_TIME_ST:
> >> +			if (pvtime->st_base != GPA_INVALID)
> >> +				return -EEXIST;
> >> +			pvtime->st_base = region.gpa;
> >> +			pvtime->st_size = region.size;
> >> +			return 0;
> >> +		}
> >> +		break;
> >> +	}
> >> +	return -ENXIO;
> >> +}
> >> +
> >> +static int kvm_arm_pvtime_get_attr(struct kvm_device *dev,
> >> +				   struct kvm_device_attr *attr)
> >> +{
> >> +	struct kvm_arch_pvtime *pvtime = &dev->kvm->arch.pvtime;
> >> +	u64 __user *user = (u64 __user *)attr->addr;
> >> +	struct kvm_dev_arm_st_region region;
> >> +
> >> +	switch (attr->group) {
> >> +	case KVM_DEV_ARM_PV_TIME_REGION:
> >> +		switch (attr->attr) {
> >> +		case KVM_DEV_ARM_PV_TIME_ST:
> >> +			region.gpa = pvtime->st_base;
> >> +			region.size = pvtime->st_size;
> >> +			if (copy_to_user(user, &region, sizeof(region)))
> >> +				return -EFAULT;
> >> +			return 0;
> >> +		}
> >> +		break;
> >> +	}
> >> +	return -ENXIO;
> >> +}
> >> +
> >> +static int kvm_arm_pvtime_has_attr(struct kvm_device *dev,
> >> +				   struct kvm_device_attr *attr)
> >> +{
> >> +	switch (attr->group) {
> >> +	case KVM_DEV_ARM_PV_TIME_REGION:
> >> +		switch (attr->attr) {
> >> +		case KVM_DEV_ARM_PV_TIME_ST:
> >> +			return 0;
> >> +		}
> >> +		break;
> >> +	}
> >> +	return -ENXIO;
> >> +}
> >> +
> >> +static const struct kvm_device_ops pvtime_ops = {
> >> +	"Arm PV time",
> >> +	.create = kvm_arm_pvtime_create,
> >> +	.destroy = kvm_arm_pvtime_destroy,
> >> +	.set_attr = kvm_arm_pvtime_set_attr,
> >> +	.get_attr = kvm_arm_pvtime_get_attr,
> >> +	.has_attr = kvm_arm_pvtime_has_attr
> >> +};
> >> +
> >> +void kvm_pvtime_init(void)
> >> +{
> >> +	kvm_register_device_ops(&pvtime_ops, KVM_DEV_TYPE_ARM_PV_TIME);
> >> +}  
> > 
> > 
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> >   
> 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 04/10] KVM: Implement kvm_put_guest()
  2019-08-21 15:36 ` [PATCH v3 04/10] KVM: Implement kvm_put_guest() Steven Price
  2019-08-22 10:29   ` Jonathan Cameron
@ 2019-08-22 15:28   ` Sean Christopherson
  2019-08-22 15:46     ` Steven Price
  1 sibling, 1 reply; 36+ messages in thread
From: Sean Christopherson @ 2019-08-22 15:28 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm,
	Catalin Marinas, Paolo Bonzini, Radim Krčmář,
	Russell King, James Morse, Julien Thierry, Suzuki K Pouloze,
	Mark Rutland, kvm, linux-doc, linux-kernel

On Wed, Aug 21, 2019 at 04:36:50PM +0100, Steven Price wrote:
> kvm_put_guest() is analogous to put_user() - it writes a single value to
> the guest physical address. The implementation is built upon put_user()
> and so it has the same single copy atomic properties.

What you mean by "single copy atomic"?  I.e. what guarantees does
put_user() provide that __copy_to_user() does not?

> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  include/linux/kvm_host.h | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index fcb46b3374c6..e154a1897e20 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -746,6 +746,30 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
>  				  unsigned long len);
>  int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
>  			      gpa_t gpa, unsigned long len);
> +
> +#define __kvm_put_guest(kvm, gfn, offset, value, type)			\
> +({									\
> +	unsigned long __addr = gfn_to_hva(kvm, gfn);			\
> +	type __user *__uaddr = (type __user *)(__addr + offset);	\
> +	int __ret = 0;							\
> +									\
> +	if (kvm_is_error_hva(__addr))					\
> +		__ret = -EFAULT;					\
> +	else								\
> +		__ret = put_user(value, __uaddr);			\
> +	if (!__ret)							\
> +		mark_page_dirty(kvm, gfn);				\
> +	__ret;								\
> +})
> +
> +#define kvm_put_guest(kvm, gpa, value, type)				\
> +({									\
> +	gpa_t __gpa = gpa;						\
> +	struct kvm *__kvm = kvm;					\
> +	__kvm_put_guest(__kvm, __gpa >> PAGE_SHIFT,			\
> +			offset_in_page(__gpa), (value), type);		\
> +})
> +
>  int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
>  int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
>  struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 04/10] KVM: Implement kvm_put_guest()
  2019-08-22 15:28   ` Sean Christopherson
@ 2019-08-22 15:46     ` Steven Price
  2019-08-22 16:24       ` Sean Christopherson
  0 siblings, 1 reply; 36+ messages in thread
From: Steven Price @ 2019-08-22 15:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Mark Rutland, kvm, Radim Krčmář,
	Marc Zyngier, Suzuki K Pouloze, linux-doc, Russell King,
	linux-kernel, James Morse, Julien Thierry, Catalin Marinas,
	Paolo Bonzini, Will Deacon, kvmarm, linux-arm-kernel

On 22/08/2019 16:28, Sean Christopherson wrote:
> On Wed, Aug 21, 2019 at 04:36:50PM +0100, Steven Price wrote:
>> kvm_put_guest() is analogous to put_user() - it writes a single value to
>> the guest physical address. The implementation is built upon put_user()
>> and so it has the same single copy atomic properties.
> 
> What you mean by "single copy atomic"?  I.e. what guarantees does
> put_user() provide that __copy_to_user() does not?

Single-copy atomicity is defined by the Arm architecture[1] and I'm not
going to try to go into the full details here, so this is a summary.

For the sake of this feature what we care about is that the value
written/read cannot be "torn". In other words if there is a read (in
this case from another VCPU) that is racing with the write then the read
will either get the old value or the new value. It cannot return a
mixture. (This is of course assuming that the read is using a
single-copy atomic safe method).

__copy_to_user() is implemented as a memcpy() and as such cannot provide
single-copy atomicity in the general case (the buffer could easily be
bigger than the architecture can guarantee).

put_user() on the other hand is implemented (on arm64) as an explicit
store instruction and therefore is guaranteed by the architecture to be
single-copy atomic (i.e. another CPU cannot see a half-written value).

Steve

[1] https://static.docs.arm.com/ddi0487/ea/DDI0487E_a_armv8_arm.pdf#page=110

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 04/10] KVM: Implement kvm_put_guest()
  2019-08-22 15:46     ` Steven Price
@ 2019-08-22 16:24       ` Sean Christopherson
  2019-08-23 10:33         ` Steven Price
  0 siblings, 1 reply; 36+ messages in thread
From: Sean Christopherson @ 2019-08-22 16:24 UTC (permalink / raw)
  To: Steven Price
  Cc: Mark Rutland, kvm, Radim Krčmář,
	Marc Zyngier, Suzuki K Pouloze, linux-doc, Russell King,
	linux-kernel, James Morse, Julien Thierry, Catalin Marinas,
	Paolo Bonzini, Will Deacon, kvmarm, linux-arm-kernel

On Thu, Aug 22, 2019 at 04:46:10PM +0100, Steven Price wrote:
> On 22/08/2019 16:28, Sean Christopherson wrote:
> > On Wed, Aug 21, 2019 at 04:36:50PM +0100, Steven Price wrote:
> >> kvm_put_guest() is analogous to put_user() - it writes a single value to
> >> the guest physical address. The implementation is built upon put_user()
> >> and so it has the same single copy atomic properties.
> > 
> > What you mean by "single copy atomic"?  I.e. what guarantees does
> > put_user() provide that __copy_to_user() does not?
> 
> Single-copy atomicity is defined by the Arm architecture[1] and I'm not
> going to try to go into the full details here, so this is a summary.
> 
> For the sake of this feature what we care about is that the value
> written/read cannot be "torn". In other words if there is a read (in
> this case from another VCPU) that is racing with the write then the read
> will either get the old value or the new value. It cannot return a
> mixture. (This is of course assuming that the read is using a
> single-copy atomic safe method).

Thanks for the explanation.  I assumed that's what you were referring to,
but wanted to double check.
 
> __copy_to_user() is implemented as a memcpy() and as such cannot provide
> single-copy atomicity in the general case (the buffer could easily be
> bigger than the architecture can guarantee).
> 
> put_user() on the other hand is implemented (on arm64) as an explicit
> store instruction and therefore is guaranteed by the architecture to be
> single-copy atomic (i.e. another CPU cannot see a half-written value).

I don't think kvm_put_guest() belongs in generic code, at least not with
the current changelog explanation about it providing single-copy atomic
semantics.  AFAICT, the single-copy thing is very much an arm64
implementation detail, e.g. the vast majority of 32-bit architectures,
including x86, do not provide any guarantees, and x86-64 generates more
or less the same code for put_user() and __copy_to_user() for 8-byte and
smaller accesses.

As an alternative to kvm_put_guest() entirely, is it an option to change
arm64's raw_copy_to_user() to redirect to __put_user() for sizes that are
constant at compile time and can be handled by __put_user()?  That would
allow using kvm_write_guest() to update stolen time, albeit with
arguably an even bigger dependency on the uaccess implementation details.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 04/10] KVM: Implement kvm_put_guest()
  2019-08-22 16:24       ` Sean Christopherson
@ 2019-08-23 10:33         ` Steven Price
  0 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2019-08-23 10:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Mark Rutland, kvm, Radim Krčmář,
	Marc Zyngier, Suzuki K Pouloze, linux-doc, Russell King,
	linux-kernel, James Morse, Julien Thierry, Catalin Marinas,
	Paolo Bonzini, Will Deacon, kvmarm, linux-arm-kernel

On 22/08/2019 17:24, Sean Christopherson wrote:
> On Thu, Aug 22, 2019 at 04:46:10PM +0100, Steven Price wrote:
>> On 22/08/2019 16:28, Sean Christopherson wrote:
>>> On Wed, Aug 21, 2019 at 04:36:50PM +0100, Steven Price wrote:
>>>> kvm_put_guest() is analogous to put_user() - it writes a single value to
>>>> the guest physical address. The implementation is built upon put_user()
>>>> and so it has the same single copy atomic properties.
>>>
>>> What you mean by "single copy atomic"?  I.e. what guarantees does
>>> put_user() provide that __copy_to_user() does not?
>>
>> Single-copy atomicity is defined by the Arm architecture[1] and I'm not
>> going to try to go into the full details here, so this is a summary.
>>
>> For the sake of this feature what we care about is that the value
>> written/read cannot be "torn". In other words if there is a read (in
>> this case from another VCPU) that is racing with the write then the read
>> will either get the old value or the new value. It cannot return a
>> mixture. (This is of course assuming that the read is using a
>> single-copy atomic safe method).
> 
> Thanks for the explanation.  I assumed that's what you were referring to,
> but wanted to double check.
>  
>> __copy_to_user() is implemented as a memcpy() and as such cannot provide
>> single-copy atomicity in the general case (the buffer could easily be
>> bigger than the architecture can guarantee).
>>
>> put_user() on the other hand is implemented (on arm64) as an explicit
>> store instruction and therefore is guaranteed by the architecture to be
>> single-copy atomic (i.e. another CPU cannot see a half-written value).
> 
> I don't think kvm_put_guest() belongs in generic code, at least not with
> the current changelog explanation about it providing single-copy atomic
> semantics.  AFAICT, the single-copy thing is very much an arm64
> implementation detail, e.g. the vast majority of 32-bit architectures,
> including x86, do not provide any guarantees, and x86-64 generates more
> or less the same code for put_user() and __copy_to_user() for 8-byte and
> smaller accesses.
> 
> As an alternative to kvm_put_guest() entirely, is it an option to change
> arm64's raw_copy_to_user() to redirect to __put_user() for sizes that are
> constant at compile time and can be handled by __put_user()?  That would
> allow using kvm_write_guest() to update stolen time, albeit with
> arguably an even bigger dependency on the uaccess implementation details.

I think it's important to in some way ensure that the desire that this
is a single write is shown. copy_to_user() is effectively
"setup();memcpy();finish();" and while a good memcpy() implementation
would be identical to put_user() there's a lot more room for this being
broken in the future by changes to the memcpy() implementation. (And I
don't want to require that memcpy() has to detect this case).

One suggestion is to call it something like kvm_put_guest_atomic() to
reflect the atomicity requirement. Presumably that would be based on a
new put_user_atomic() which architectures could override as necessary if
put_user() doesn't provide the necessary guarantees.

Steve

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 10/10] arm64: Retrieve stolen time as paravirtualized guest
  2019-08-21 15:36 ` [PATCH v3 10/10] arm64: Retrieve stolen time as paravirtualized guest Steven Price
@ 2019-08-23 11:45   ` Zenghui Yu
  2019-08-23 14:22     ` Steven Price
  0 siblings, 1 reply; 36+ messages in thread
From: Zenghui Yu @ 2019-08-23 11:45 UTC (permalink / raw)
  To: Steven Price, Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: linux-kernel, kvm, Catalin Marinas, linux-doc, Russell King,
	Paolo Bonzini

Hi Steven,

On 2019/8/21 23:36, Steven Price wrote:
> Enable paravirtualization features when running under a hypervisor
> supporting the PV_TIME_ST hypercall.
> 
> For each (v)CPU, we ask the hypervisor for the location of a shared
> page which the hypervisor will use to report stolen time to us. We set
> pv_time_ops to the stolen time function which simply reads the stolen
> value from the shared page for a VCPU. We guarantee single-copy
> atomicity using READ_ONCE which means we can also read the stolen
> time for another VCPU than the currently running one while it is
> potentially being updated by the hypervisor.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/include/asm/paravirt.h |   9 +-
>   arch/arm64/kernel/paravirt.c      | 148 ++++++++++++++++++++++++++++++
>   arch/arm64/kernel/time.c          |   3 +
>   include/linux/cpuhotplug.h        |   1 +
>   4 files changed, 160 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
> index 799d9dd6f7cc..125c26c42902 100644
> --- a/arch/arm64/include/asm/paravirt.h
> +++ b/arch/arm64/include/asm/paravirt.h
> @@ -21,6 +21,13 @@ static inline u64 paravirt_steal_clock(int cpu)
>   {
>   	return pv_ops.time.steal_clock(cpu);
>   }
> -#endif
> +
> +int __init kvm_guest_init(void);
> +
> +#else
> +
> +#define kvm_guest_init()
> +
> +#endif // CONFIG_PARAVIRT
>   
>   #endif
> diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
> index 4cfed91fe256..ea8dbbbd3293 100644
> --- a/arch/arm64/kernel/paravirt.c
> +++ b/arch/arm64/kernel/paravirt.c
> @@ -6,13 +6,161 @@
>    * Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>    */
>   
> +#define pr_fmt(fmt) "kvmarm-pv: " fmt
> +
> +#include <linux/arm-smccc.h>
> +#include <linux/cpuhotplug.h>
>   #include <linux/export.h>
> +#include <linux/io.h>
>   #include <linux/jump_label.h>
> +#include <linux/printk.h>
> +#include <linux/psci.h>
> +#include <linux/reboot.h>
> +#include <linux/slab.h>
>   #include <linux/types.h>
> +
>   #include <asm/paravirt.h>
> +#include <asm/pvclock-abi.h>
> +#include <asm/smp_plat.h>
>   
>   struct static_key paravirt_steal_enabled;
>   struct static_key paravirt_steal_rq_enabled;
>   
>   struct paravirt_patch_template pv_ops;
>   EXPORT_SYMBOL_GPL(pv_ops);
> +
> +struct kvmarm_stolen_time_region {
> +	struct pvclock_vcpu_stolen_time *kaddr;
> +};
> +
> +static DEFINE_PER_CPU(struct kvmarm_stolen_time_region, stolen_time_region);
> +
> +static bool steal_acc = true;
> +static int __init parse_no_stealacc(char *arg)
> +{
> +	steal_acc = false;
> +	return 0;
> +}
> +
> +early_param("no-steal-acc", parse_no_stealacc);
> +
> +/* return stolen time in ns by asking the hypervisor */
> +static u64 kvm_steal_clock(int cpu)
> +{
> +	struct kvmarm_stolen_time_region *reg;
> +
> +	reg = per_cpu_ptr(&stolen_time_region, cpu);
> +	if (!reg->kaddr) {
> +		pr_warn_once("stolen time enabled but not configured for cpu %d\n",
> +			     cpu);
> +		return 0;
> +	}
> +
> +	return le64_to_cpu(READ_ONCE(reg->kaddr->stolen_time));
> +}
> +
> +static int disable_stolen_time_current_cpu(void)
> +{
> +	struct kvmarm_stolen_time_region *reg;
> +
> +	reg = this_cpu_ptr(&stolen_time_region);
> +	if (!reg->kaddr)
> +		return 0;
> +
> +	memunmap(reg->kaddr);
> +	memset(reg, 0, sizeof(*reg));
> +
> +	return 0;
> +}
> +
> +static int stolen_time_dying_cpu(unsigned int cpu)
> +{
> +	return disable_stolen_time_current_cpu();
> +}
> +
> +static int init_stolen_time_cpu(unsigned int cpu)
> +{
> +	struct kvmarm_stolen_time_region *reg;
> +	struct arm_smccc_res res;
> +
> +	reg = this_cpu_ptr(&stolen_time_region);
> +
> +	arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_TIME_ST, &res);
> +
> +	if ((long)res.a0 < 0)
> +		return -EINVAL;
> +
> +	reg->kaddr = memremap(res.a0,
> +			      sizeof(struct pvclock_vcpu_stolen_time),
> +			      MEMREMAP_WB);

cpuhp callbacks can be invoked in atomic context (see:
	secondary_start_kernel ->
	notify_cpu_starting ->
	invoke callbacks),
but memremap might sleep...

Try to run a DEBUG_ATOMIC_SLEEP enabled PV guest, I guess we will be
greeted by the Sleep-in-Atomic-Context BUG.  We need an alternative
here?

> +
> +	if (!reg->kaddr) {
> +		pr_warn("Failed to map stolen time data structure\n");
> +		return -ENOMEM;
> +	}
> +
> +	if (le32_to_cpu(reg->kaddr->revision) != 0 ||
> +	    le32_to_cpu(reg->kaddr->attributes) != 0) {
> +		pr_warn("Unexpected revision or attributes in stolen time data\n");
> +		return -ENXIO;
> +	}
> +
> +	return 0;
> +}
> +
> +static int kvm_arm_init_stolen_time(void)
> +{
> +	int ret;
> +
> +	ret = cpuhp_setup_state(CPUHP_AP_ARM_KVMPV_STARTING,
> +				"hypervisor/kvmarm/pv:starting",
> +				init_stolen_time_cpu, stolen_time_dying_cpu);
> +	if (ret < 0)
> +		return ret;
> +	return 0;
> +}
> +
> +static bool has_kvm_steal_clock(void)
> +{
> +	struct arm_smccc_res res;
> +
> +	/* To detect the presence of PV time support we require SMCCC 1.1+ */
> +	if (psci_ops.smccc_version < SMCCC_VERSION_1_1)
> +		return false;
> +
> +	arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> +			     ARM_SMCCC_HV_PV_FEATURES, &res);
> +
> +	if (res.a0 != SMCCC_RET_SUCCESS)
> +		return false;
> +
> +	arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_FEATURES,
> +			     ARM_SMCCC_HV_PV_TIME_ST, &res);
> +
> +	if (res.a0 != SMCCC_RET_SUCCESS)
> +		return false;
> +
> +	return true;
> +}
> +
> +int __init kvm_guest_init(void)
> +{
> +	int ret = 0;

And this look like a redundant initialization?


Thanks,
zenghui

> +
> +	if (!has_kvm_steal_clock())
> +		return 0;
> +
> +	ret = kvm_arm_init_stolen_time();
> +	if (ret)
> +		return ret;
> +
> +	pv_ops.time.steal_clock = kvm_steal_clock;
> +
> +	static_key_slow_inc(&paravirt_steal_enabled);
> +	if (steal_acc)
> +		static_key_slow_inc(&paravirt_steal_rq_enabled);
> +
> +	pr_info("using stolen time PV\n");
> +
> +	return 0;
> +}
> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
> index 0b2946414dc9..a52aea14c6ec 100644
> --- a/arch/arm64/kernel/time.c
> +++ b/arch/arm64/kernel/time.c
> @@ -30,6 +30,7 @@
>   
>   #include <asm/thread_info.h>
>   #include <asm/stacktrace.h>
> +#include <asm/paravirt.h>
>   
>   unsigned long profile_pc(struct pt_regs *regs)
>   {
> @@ -65,4 +66,6 @@ void __init time_init(void)
>   
>   	/* Calibrate the delay loop directly */
>   	lpj_fine = arch_timer_rate / HZ;
> +
> +	kvm_guest_init();
>   }
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 068793a619ca..89d75edb5750 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -136,6 +136,7 @@ enum cpuhp_state {
>   	/* Must be the last timer callback */
>   	CPUHP_AP_DUMMY_TIMER_STARTING,
>   	CPUHP_AP_ARM_XEN_STARTING,
> +	CPUHP_AP_ARM_KVMPV_STARTING,
>   	CPUHP_AP_ARM_CORESIGHT_STARTING,
>   	CPUHP_AP_ARM64_ISNDEP_STARTING,
>   	CPUHP_AP_SMPCFD_DYING,
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 05/10] KVM: arm64: Support stolen time reporting via shared structure
  2019-08-21 15:36 ` [PATCH v3 05/10] KVM: arm64: Support stolen time reporting via shared structure Steven Price
  2019-08-22 10:39   ` Jonathan Cameron
@ 2019-08-23 12:07   ` Zenghui Yu
  2019-08-23 13:23     ` Steven Price
  1 sibling, 1 reply; 36+ messages in thread
From: Zenghui Yu @ 2019-08-23 12:07 UTC (permalink / raw)
  To: Steven Price, Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: linux-kernel, kvm, Catalin Marinas, linux-doc, Russell King,
	Paolo Bonzini

Hi Steven,

Only one comment, at the bottom.

On 2019/8/21 23:36, Steven Price wrote:
> Implement the service call for configuring a shared structure between a
> VCPU and the hypervisor in which the hypervisor can write the time
> stolen from the VCPU's execution time by other tasks on the host.
> 
> The hypervisor allocates memory which is placed at an IPA chosen by user
> space. The hypervisor then updates the shared structure using
> kvm_put_guest() to ensure single copy atomicity of the 64-bit value
> reporting the stolen time in nanoseconds.
> 
> Whenever stolen time is enabled by the guest, the stolen time counter is
> reset.
> 
> The stolen time itself is retrieved from the sched_info structure
> maintained by the Linux scheduler code. We enable SCHEDSTATS when
> selecting KVM Kconfig to ensure this value is meaningful.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   arch/arm/include/asm/kvm_host.h   | 20 +++++++++
>   arch/arm64/include/asm/kvm_host.h | 25 +++++++++++-
>   arch/arm64/kvm/Kconfig            |  1 +
>   include/linux/kvm_types.h         |  2 +
>   virt/kvm/arm/arm.c                | 10 +++++
>   virt/kvm/arm/hypercalls.c         |  3 ++
>   virt/kvm/arm/pvtime.c             | 67 +++++++++++++++++++++++++++++++
>   7 files changed, 127 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 369b5d2d54bf..47d2ced99421 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -39,6 +39,7 @@
>   	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>   #define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
>   #define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
> +#define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
>   
>   DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>   
> @@ -329,6 +330,25 @@ static inline int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>   	return SMCCC_RET_NOT_SUPPORTED;
>   }
>   
> +static inline int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu)
> +{
> +	return SMCCC_RET_NOT_SUPPORTED;
> +}
> +
> +static inline int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init)
> +{
> +	return -ENOTSUPP;
> +}
> +
> +static inline void kvm_pvtime_init_vm(struct kvm_arch *kvm_arch)
> +{
> +}
> +
> +static inline bool kvm_is_pvtime_enabled(struct kvm_arch *kvm_arch)
> +{
> +	return false;
> +}
> +
>   void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>   
>   struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 583b3639062a..b6fa7beffd8a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -44,6 +44,7 @@
>   	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>   #define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
>   #define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
> +#define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
>   
>   DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>   
> @@ -83,6 +84,11 @@ struct kvm_arch {
>   
>   	/* Mandated version of PSCI */
>   	u32 psci_version;
> +
> +	struct kvm_arch_pvtime {
> +		gpa_t st_base;
> +		u64 st_size;
> +	} pvtime;
>   };
>   
>   #define KVM_NR_MEM_OBJS     40
> @@ -338,8 +344,13 @@ struct kvm_vcpu_arch {
>   	/* True when deferrable sysregs are loaded on the physical CPU,
>   	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
>   	bool sysregs_loaded_on_cpu;
> -};
>   
> +	/* Guest PV state */
> +	struct {
> +		u64 steal;
> +		u64 last_steal;
> +	} steal;
> +};
>   /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
>   #define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + \
>   				      sve_ffr_offset((vcpu)->arch.sve_max_vl)))
> @@ -479,6 +490,18 @@ int kvm_perf_init(void);
>   int kvm_perf_teardown(void);
>   
>   int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu);
> +int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu);
> +int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init);
> +
> +static inline void kvm_pvtime_init_vm(struct kvm_arch *kvm_arch)
> +{
> +	kvm_arch->pvtime.st_base = GPA_INVALID;
> +}
> +
> +static inline bool kvm_is_pvtime_enabled(struct kvm_arch *kvm_arch)
> +{
> +	return (kvm_arch->pvtime.st_base != GPA_INVALID);
> +}
>   
>   void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>   
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index a67121d419a2..d8b88e40d223 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -39,6 +39,7 @@ config KVM
>   	select IRQ_BYPASS_MANAGER
>   	select HAVE_KVM_IRQ_BYPASS
>   	select HAVE_KVM_VCPU_RUN_PID_CHANGE
> +	select SCHEDSTATS
>   	---help---
>   	  Support hosting virtualized guest machines.
>   	  We don't support KVM with 16K page tables yet, due to the multiple
> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> index bde5374ae021..1c88e69db3d9 100644
> --- a/include/linux/kvm_types.h
> +++ b/include/linux/kvm_types.h
> @@ -35,6 +35,8 @@ typedef unsigned long  gva_t;
>   typedef u64            gpa_t;
>   typedef u64            gfn_t;
>   
> +#define GPA_INVALID	(~(gpa_t)0)
> +
>   typedef unsigned long  hva_t;
>   typedef u64            hpa_t;
>   typedef u64            hfn_t;
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 35a069815baf..5e8343e2dd62 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -40,6 +40,10 @@
>   #include <asm/kvm_coproc.h>
>   #include <asm/sections.h>
>   
> +#include <kvm/arm_hypercalls.h>
> +#include <kvm/arm_pmu.h>
> +#include <kvm/arm_psci.h>
> +
>   #ifdef REQUIRES_VIRT
>   __asm__(".arch_extension	virt");
>   #endif
> @@ -135,6 +139,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>   	kvm->arch.max_vcpus = vgic_present ?
>   				kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS;
>   
> +	kvm_pvtime_init_vm(&kvm->arch);
>   	return ret;
>   out_free_stage2_pgd:
>   	kvm_free_stage2_pgd(kvm);
> @@ -379,6 +384,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>   	kvm_vcpu_load_sysregs(vcpu);
>   	kvm_arch_vcpu_load_fp(vcpu);
>   	kvm_vcpu_pmu_restore_guest(vcpu);
> +	if (kvm_is_pvtime_enabled(&vcpu->kvm->arch))
> +		kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
>   
>   	if (single_task_running())
>   		vcpu_clear_wfe_traps(vcpu);
> @@ -644,6 +651,9 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
>   		 * that a VCPU sees new virtual interrupts.
>   		 */
>   		kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu);
> +
> +		if (kvm_check_request(KVM_REQ_RECORD_STEAL, vcpu))
> +			kvm_update_stolen_time(vcpu, false);
>   	}
>   }
>   
> diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
> index 63ae629c466a..ac678eabf15f 100644
> --- a/virt/kvm/arm/hypercalls.c
> +++ b/virt/kvm/arm/hypercalls.c
> @@ -56,6 +56,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>   	case ARM_SMCCC_HV_PV_FEATURES:
>   		val = kvm_hypercall_pv_features(vcpu);
>   		break;
> +	case ARM_SMCCC_HV_PV_TIME_ST:
> +		val = kvm_hypercall_stolen_time(vcpu);
> +		break;
>   	default:
>   		return kvm_psci_call(vcpu);
>   	}
> diff --git a/virt/kvm/arm/pvtime.c b/virt/kvm/arm/pvtime.c
> index 6201d71cb1f8..28603689f6e0 100644
> --- a/virt/kvm/arm/pvtime.c
> +++ b/virt/kvm/arm/pvtime.c
> @@ -3,8 +3,51 @@
>   
>   #include <linux/arm-smccc.h>
>   
> +#include <asm/pvclock-abi.h>
> +
>   #include <kvm/arm_hypercalls.h>
>   
> +int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_arch_pvtime *pvtime = &kvm->arch.pvtime;
> +	u64 steal;
> +	u64 steal_le;
> +	u64 offset;
> +	int idx;
> +	const int stride = sizeof(struct pvclock_vcpu_stolen_time);
> +
> +	if (pvtime->st_base == GPA_INVALID)
> +		return -ENOTSUPP;
> +
> +	/* Let's do the local bookkeeping */
> +	steal = vcpu->arch.steal.steal;
> +	steal += current->sched_info.run_delay - vcpu->arch.steal.last_steal;
> +	vcpu->arch.steal.last_steal = current->sched_info.run_delay;
> +	vcpu->arch.steal.steal = steal;
> +
> +	offset = stride * kvm_vcpu_get_idx(vcpu);
> +
> +	if (unlikely(offset + stride > pvtime->st_size))
> +		return -EINVAL;
> +
> +	steal_le = cpu_to_le64(steal);
> +	idx = srcu_read_lock(&kvm->srcu);
> +	if (init) {
> +		struct pvclock_vcpu_stolen_time init_values = {
> +			.revision = 0,
> +			.attributes = 0
> +		};
> +		kvm_write_guest(kvm, pvtime->st_base + offset, &init_values,
> +				sizeof(init_values));
> +	}
> +	offset += offsetof(struct pvclock_vcpu_stolen_time, stolen_time);
> +	kvm_put_guest(kvm, pvtime->st_base + offset, steal_le, u64);
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
> +	return 0;
> +}
> +
>   int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>   {
>   	u32 feature = smccc_get_arg1(vcpu);
> @@ -12,6 +55,7 @@ int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>   
>   	switch (feature) {
>   	case ARM_SMCCC_HV_PV_FEATURES:
> +	case ARM_SMCCC_HV_PV_TIME_ST:
>   		val = SMCCC_RET_SUCCESS;
>   		break;
>   	}
> @@ -19,3 +63,26 @@ int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>   	return val;
>   }
>   
> +int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu)
> +{
> +	u64 ret;
> +	int err;
> +
> +	/*
> +	 * Start counting stolen time from the time the guest requests
> +	 * the feature enabled.
> +	 */
> +	vcpu->arch.steal.steal = 0;
> +	vcpu->arch.steal.last_steal = current->sched_info.run_delay;
> +
> +	err = kvm_update_stolen_time(vcpu, true);
> +
> +	if (err)
> +		ret = SMCCC_RET_NOT_SUPPORTED;
> +	else
> +		ret = vcpu->kvm->arch.pvtime.st_base +
> +			(sizeof(struct pvclock_vcpu_stolen_time) *
> +			 kvm_vcpu_get_idx(vcpu));
> +
> +	return ret;

The *type* of the 'ret' here looks a bit messy to me:
(1)u64 -> (2)int -> (3)u32 -> (4)unsigned long

(1)->(2): just inside kvm_hypercall_stolen_time()
(2)->(3): inside kvm_hvc_call_handler(), assign 'ret' to 'val'
(3)->(4): through smccc_set_retval()

I really have seen an issue caused by (2)->(3).

When the PV guest running without PV_TIME device supporting, the result
of the ARM_SMCCC_HV_PV_TIME_ST hypercall is expected to be -1 (which
means "not supported"), but the actual result I got is 4294967295.
Guest continues to run blindly, bad things would happen then...

I think this needs a fix?


Thanks,
zenghui


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 05/10] KVM: arm64: Support stolen time reporting via shared structure
  2019-08-23 12:07   ` Zenghui Yu
@ 2019-08-23 13:23     ` Steven Price
  0 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2019-08-23 13:23 UTC (permalink / raw)
  To: Zenghui Yu, Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: linux-kernel, kvm, Catalin Marinas, linux-doc, Russell King,
	Paolo Bonzini

On 23/08/2019 13:07, Zenghui Yu wrote:
> Hi Steven,
> 
> Only one comment, at the bottom.
> 
> On 2019/8/21 23:36, Steven Price wrote:
>> Implement the service call for configuring a shared structure between a
>> VCPU and the hypervisor in which the hypervisor can write the time
>> stolen from the VCPU's execution time by other tasks on the host.
>>
>> The hypervisor allocates memory which is placed at an IPA chosen by user
>> space. The hypervisor then updates the shared structure using
>> kvm_put_guest() to ensure single copy atomicity of the 64-bit value
>> reporting the stolen time in nanoseconds.
>>
>> Whenever stolen time is enabled by the guest, the stolen time counter is
>> reset.
>>
>> The stolen time itself is retrieved from the sched_info structure
>> maintained by the Linux scheduler code. We enable SCHEDSTATS when
>> selecting KVM Kconfig to ensure this value is meaningful.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>   arch/arm/include/asm/kvm_host.h   | 20 +++++++++
>>   arch/arm64/include/asm/kvm_host.h | 25 +++++++++++-
>>   arch/arm64/kvm/Kconfig            |  1 +
>>   include/linux/kvm_types.h         |  2 +
>>   virt/kvm/arm/arm.c                | 10 +++++
>>   virt/kvm/arm/hypercalls.c         |  3 ++
>>   virt/kvm/arm/pvtime.c             | 67 +++++++++++++++++++++++++++++++
>>   7 files changed, 127 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h
>> b/arch/arm/include/asm/kvm_host.h
>> index 369b5d2d54bf..47d2ced99421 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -39,6 +39,7 @@
>>       KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>   #define KVM_REQ_IRQ_PENDING    KVM_ARCH_REQ(1)
>>   #define KVM_REQ_VCPU_RESET    KVM_ARCH_REQ(2)
>> +#define KVM_REQ_RECORD_STEAL    KVM_ARCH_REQ(3)
>>     DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>>   @@ -329,6 +330,25 @@ static inline int
>> kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>>       return SMCCC_RET_NOT_SUPPORTED;
>>   }
>>   +static inline int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu)
>> +{
>> +    return SMCCC_RET_NOT_SUPPORTED;
>> +}
>> +
>> +static inline int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool
>> init)
>> +{
>> +    return -ENOTSUPP;
>> +}
>> +
>> +static inline void kvm_pvtime_init_vm(struct kvm_arch *kvm_arch)
>> +{
>> +}
>> +
>> +static inline bool kvm_is_pvtime_enabled(struct kvm_arch *kvm_arch)
>> +{
>> +    return false;
>> +}
>> +
>>   void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>     struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long
>> mpidr);
>> diff --git a/arch/arm64/include/asm/kvm_host.h
>> b/arch/arm64/include/asm/kvm_host.h
>> index 583b3639062a..b6fa7beffd8a 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -44,6 +44,7 @@
>>       KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>   #define KVM_REQ_IRQ_PENDING    KVM_ARCH_REQ(1)
>>   #define KVM_REQ_VCPU_RESET    KVM_ARCH_REQ(2)
>> +#define KVM_REQ_RECORD_STEAL    KVM_ARCH_REQ(3)
>>     DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>>   @@ -83,6 +84,11 @@ struct kvm_arch {
>>         /* Mandated version of PSCI */
>>       u32 psci_version;
>> +
>> +    struct kvm_arch_pvtime {
>> +        gpa_t st_base;
>> +        u64 st_size;
>> +    } pvtime;
>>   };
>>     #define KVM_NR_MEM_OBJS     40
>> @@ -338,8 +344,13 @@ struct kvm_vcpu_arch {
>>       /* True when deferrable sysregs are loaded on the physical CPU,
>>        * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
>>       bool sysregs_loaded_on_cpu;
>> -};
>>   +    /* Guest PV state */
>> +    struct {
>> +        u64 steal;
>> +        u64 last_steal;
>> +    } steal;
>> +};
>>   /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
>>   #define vcpu_sve_pffr(vcpu) ((void *)((char
>> *)((vcpu)->arch.sve_state) + \
>>                         sve_ffr_offset((vcpu)->arch.sve_max_vl)))
>> @@ -479,6 +490,18 @@ int kvm_perf_init(void);
>>   int kvm_perf_teardown(void);
>>     int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu);
>> +int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu);
>> +int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init);
>> +
>> +static inline void kvm_pvtime_init_vm(struct kvm_arch *kvm_arch)
>> +{
>> +    kvm_arch->pvtime.st_base = GPA_INVALID;
>> +}
>> +
>> +static inline bool kvm_is_pvtime_enabled(struct kvm_arch *kvm_arch)
>> +{
>> +    return (kvm_arch->pvtime.st_base != GPA_INVALID);
>> +}
>>     void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>>   diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
>> index a67121d419a2..d8b88e40d223 100644
>> --- a/arch/arm64/kvm/Kconfig
>> +++ b/arch/arm64/kvm/Kconfig
>> @@ -39,6 +39,7 @@ config KVM
>>       select IRQ_BYPASS_MANAGER
>>       select HAVE_KVM_IRQ_BYPASS
>>       select HAVE_KVM_VCPU_RUN_PID_CHANGE
>> +    select SCHEDSTATS
>>       ---help---
>>         Support hosting virtualized guest machines.
>>         We don't support KVM with 16K page tables yet, due to the
>> multiple
>> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
>> index bde5374ae021..1c88e69db3d9 100644
>> --- a/include/linux/kvm_types.h
>> +++ b/include/linux/kvm_types.h
>> @@ -35,6 +35,8 @@ typedef unsigned long  gva_t;
>>   typedef u64            gpa_t;
>>   typedef u64            gfn_t;
>>   +#define GPA_INVALID    (~(gpa_t)0)
>> +
>>   typedef unsigned long  hva_t;
>>   typedef u64            hpa_t;
>>   typedef u64            hfn_t;
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 35a069815baf..5e8343e2dd62 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -40,6 +40,10 @@
>>   #include <asm/kvm_coproc.h>
>>   #include <asm/sections.h>
>>   +#include <kvm/arm_hypercalls.h>
>> +#include <kvm/arm_pmu.h>
>> +#include <kvm/arm_psci.h>
>> +
>>   #ifdef REQUIRES_VIRT
>>   __asm__(".arch_extension    virt");
>>   #endif
>> @@ -135,6 +139,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned
>> long type)
>>       kvm->arch.max_vcpus = vgic_present ?
>>                   kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS;
>>   +    kvm_pvtime_init_vm(&kvm->arch);
>>       return ret;
>>   out_free_stage2_pgd:
>>       kvm_free_stage2_pgd(kvm);
>> @@ -379,6 +384,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int
>> cpu)
>>       kvm_vcpu_load_sysregs(vcpu);
>>       kvm_arch_vcpu_load_fp(vcpu);
>>       kvm_vcpu_pmu_restore_guest(vcpu);
>> +    if (kvm_is_pvtime_enabled(&vcpu->kvm->arch))
>> +        kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
>>         if (single_task_running())
>>           vcpu_clear_wfe_traps(vcpu);
>> @@ -644,6 +651,9 @@ static void check_vcpu_requests(struct kvm_vcpu
>> *vcpu)
>>            * that a VCPU sees new virtual interrupts.
>>            */
>>           kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu);
>> +
>> +        if (kvm_check_request(KVM_REQ_RECORD_STEAL, vcpu))
>> +            kvm_update_stolen_time(vcpu, false);
>>       }
>>   }
>>   diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
>> index 63ae629c466a..ac678eabf15f 100644
>> --- a/virt/kvm/arm/hypercalls.c
>> +++ b/virt/kvm/arm/hypercalls.c
>> @@ -56,6 +56,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>>       case ARM_SMCCC_HV_PV_FEATURES:
>>           val = kvm_hypercall_pv_features(vcpu);
>>           break;
>> +    case ARM_SMCCC_HV_PV_TIME_ST:
>> +        val = kvm_hypercall_stolen_time(vcpu);
>> +        break;
>>       default:
>>           return kvm_psci_call(vcpu);
>>       }
>> diff --git a/virt/kvm/arm/pvtime.c b/virt/kvm/arm/pvtime.c
>> index 6201d71cb1f8..28603689f6e0 100644
>> --- a/virt/kvm/arm/pvtime.c
>> +++ b/virt/kvm/arm/pvtime.c
>> @@ -3,8 +3,51 @@
>>     #include <linux/arm-smccc.h>
>>   +#include <asm/pvclock-abi.h>
>> +
>>   #include <kvm/arm_hypercalls.h>
>>   +int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init)
>> +{
>> +    struct kvm *kvm = vcpu->kvm;
>> +    struct kvm_arch_pvtime *pvtime = &kvm->arch.pvtime;
>> +    u64 steal;
>> +    u64 steal_le;
>> +    u64 offset;
>> +    int idx;
>> +    const int stride = sizeof(struct pvclock_vcpu_stolen_time);
>> +
>> +    if (pvtime->st_base == GPA_INVALID)
>> +        return -ENOTSUPP;
>> +
>> +    /* Let's do the local bookkeeping */
>> +    steal = vcpu->arch.steal.steal;
>> +    steal += current->sched_info.run_delay -
>> vcpu->arch.steal.last_steal;
>> +    vcpu->arch.steal.last_steal = current->sched_info.run_delay;
>> +    vcpu->arch.steal.steal = steal;
>> +
>> +    offset = stride * kvm_vcpu_get_idx(vcpu);
>> +
>> +    if (unlikely(offset + stride > pvtime->st_size))
>> +        return -EINVAL;
>> +
>> +    steal_le = cpu_to_le64(steal);
>> +    idx = srcu_read_lock(&kvm->srcu);
>> +    if (init) {
>> +        struct pvclock_vcpu_stolen_time init_values = {
>> +            .revision = 0,
>> +            .attributes = 0
>> +        };
>> +        kvm_write_guest(kvm, pvtime->st_base + offset, &init_values,
>> +                sizeof(init_values));
>> +    }
>> +    offset += offsetof(struct pvclock_vcpu_stolen_time, stolen_time);
>> +    kvm_put_guest(kvm, pvtime->st_base + offset, steal_le, u64);
>> +    srcu_read_unlock(&kvm->srcu, idx);
>> +
>> +    return 0;
>> +}
>> +
>>   int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>>   {
>>       u32 feature = smccc_get_arg1(vcpu);
>> @@ -12,6 +55,7 @@ int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>>         switch (feature) {
>>       case ARM_SMCCC_HV_PV_FEATURES:
>> +    case ARM_SMCCC_HV_PV_TIME_ST:
>>           val = SMCCC_RET_SUCCESS;
>>           break;
>>       }
>> @@ -19,3 +63,26 @@ int kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
>>       return val;
>>   }
>>   +int kvm_hypercall_stolen_time(struct kvm_vcpu *vcpu)
>> +{
>> +    u64 ret;
>> +    int err;
>> +
>> +    /*
>> +     * Start counting stolen time from the time the guest requests
>> +     * the feature enabled.
>> +     */
>> +    vcpu->arch.steal.steal = 0;
>> +    vcpu->arch.steal.last_steal = current->sched_info.run_delay;
>> +
>> +    err = kvm_update_stolen_time(vcpu, true);
>> +
>> +    if (err)
>> +        ret = SMCCC_RET_NOT_SUPPORTED;
>> +    else
>> +        ret = vcpu->kvm->arch.pvtime.st_base +
>> +            (sizeof(struct pvclock_vcpu_stolen_time) *
>> +             kvm_vcpu_get_idx(vcpu));
>> +
>> +    return ret;
> 
> The *type* of the 'ret' here looks a bit messy to me:
> (1)u64 -> (2)int -> (3)u32 -> (4)unsigned long
> 
> (1)->(2): just inside kvm_hypercall_stolen_time()
> (2)->(3): inside kvm_hvc_call_handler(), assign 'ret' to 'val'
> (3)->(4): through smccc_set_retval()
> 
> I really have seen an issue caused by (2)->(3).
> 
> When the PV guest running without PV_TIME device supporting, the result
> of the ARM_SMCCC_HV_PV_TIME_ST hypercall is expected to be -1 (which
> means "not supported"), but the actual result I got is 4294967295.
> Guest continues to run blindly, bad things would happen then...
> 
> I think this needs a fix?

Yes you are entirely right. I'm afraid this happened because I
refactored the functions and apparently forgot to update the return
type. In a previous version the functions themselves did
smccc_set_retval() themselves and the return value was always "1" (the
same as kvm_hvc_call_handler()).

The function should really return a "long" and "val" in
kvm_hvc_call_handler() should be upgraded to a "long" too - the
SMC64/HVC64 calling convention requires error codes to be 64-bit signed
integers.

Thanks for spotting this!

Steve

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 10/10] arm64: Retrieve stolen time as paravirtualized guest
  2019-08-23 11:45   ` Zenghui Yu
@ 2019-08-23 14:22     ` Steven Price
  2019-08-27 12:43       ` Zenghui Yu
  0 siblings, 1 reply; 36+ messages in thread
From: Steven Price @ 2019-08-23 14:22 UTC (permalink / raw)
  To: Zenghui Yu, Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: linux-kernel, kvm, Catalin Marinas, linux-doc, Russell King,
	Paolo Bonzini

On 23/08/2019 12:45, Zenghui Yu wrote:
> Hi Steven,
> 
> On 2019/8/21 23:36, Steven Price wrote:
>> Enable paravirtualization features when running under a hypervisor
>> supporting the PV_TIME_ST hypercall.
>>
>> For each (v)CPU, we ask the hypervisor for the location of a shared
>> page which the hypervisor will use to report stolen time to us. We set
>> pv_time_ops to the stolen time function which simply reads the stolen
>> value from the shared page for a VCPU. We guarantee single-copy
>> atomicity using READ_ONCE which means we can also read the stolen
>> time for another VCPU than the currently running one while it is
>> potentially being updated by the hypervisor.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>   arch/arm64/include/asm/paravirt.h |   9 +-
>>   arch/arm64/kernel/paravirt.c      | 148 ++++++++++++++++++++++++++++++
>>   arch/arm64/kernel/time.c          |   3 +
>>   include/linux/cpuhotplug.h        |   1 +
>>   4 files changed, 160 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/paravirt.h
>> b/arch/arm64/include/asm/paravirt.h
>> index 799d9dd6f7cc..125c26c42902 100644
>> --- a/arch/arm64/include/asm/paravirt.h
>> +++ b/arch/arm64/include/asm/paravirt.h
>> @@ -21,6 +21,13 @@ static inline u64 paravirt_steal_clock(int cpu)
>>   {
>>       return pv_ops.time.steal_clock(cpu);
>>   }
>> -#endif
>> +
>> +int __init kvm_guest_init(void);
>> +
>> +#else
>> +
>> +#define kvm_guest_init()
>> +
>> +#endif // CONFIG_PARAVIRT
>>     #endif
>> diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
>> index 4cfed91fe256..ea8dbbbd3293 100644
>> --- a/arch/arm64/kernel/paravirt.c
>> +++ b/arch/arm64/kernel/paravirt.c
>> @@ -6,13 +6,161 @@
>>    * Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>    */
>>   +#define pr_fmt(fmt) "kvmarm-pv: " fmt
>> +
>> +#include <linux/arm-smccc.h>
>> +#include <linux/cpuhotplug.h>
>>   #include <linux/export.h>
>> +#include <linux/io.h>
>>   #include <linux/jump_label.h>
>> +#include <linux/printk.h>
>> +#include <linux/psci.h>
>> +#include <linux/reboot.h>
>> +#include <linux/slab.h>
>>   #include <linux/types.h>
>> +
>>   #include <asm/paravirt.h>
>> +#include <asm/pvclock-abi.h>
>> +#include <asm/smp_plat.h>
>>     struct static_key paravirt_steal_enabled;
>>   struct static_key paravirt_steal_rq_enabled;
>>     struct paravirt_patch_template pv_ops;
>>   EXPORT_SYMBOL_GPL(pv_ops);
>> +
>> +struct kvmarm_stolen_time_region {
>> +    struct pvclock_vcpu_stolen_time *kaddr;
>> +};
>> +
>> +static DEFINE_PER_CPU(struct kvmarm_stolen_time_region,
>> stolen_time_region);
>> +
>> +static bool steal_acc = true;
>> +static int __init parse_no_stealacc(char *arg)
>> +{
>> +    steal_acc = false;
>> +    return 0;
>> +}
>> +
>> +early_param("no-steal-acc", parse_no_stealacc);
>> +
>> +/* return stolen time in ns by asking the hypervisor */
>> +static u64 kvm_steal_clock(int cpu)
>> +{
>> +    struct kvmarm_stolen_time_region *reg;
>> +
>> +    reg = per_cpu_ptr(&stolen_time_region, cpu);
>> +    if (!reg->kaddr) {
>> +        pr_warn_once("stolen time enabled but not configured for cpu
>> %d\n",
>> +                 cpu);
>> +        return 0;
>> +    }
>> +
>> +    return le64_to_cpu(READ_ONCE(reg->kaddr->stolen_time));
>> +}
>> +
>> +static int disable_stolen_time_current_cpu(void)
>> +{
>> +    struct kvmarm_stolen_time_region *reg;
>> +
>> +    reg = this_cpu_ptr(&stolen_time_region);
>> +    if (!reg->kaddr)
>> +        return 0;
>> +
>> +    memunmap(reg->kaddr);
>> +    memset(reg, 0, sizeof(*reg));
>> +
>> +    return 0;
>> +}
>> +
>> +static int stolen_time_dying_cpu(unsigned int cpu)
>> +{
>> +    return disable_stolen_time_current_cpu();
>> +}
>> +
>> +static int init_stolen_time_cpu(unsigned int cpu)
>> +{
>> +    struct kvmarm_stolen_time_region *reg;
>> +    struct arm_smccc_res res;
>> +
>> +    reg = this_cpu_ptr(&stolen_time_region);
>> +
>> +    arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_TIME_ST, &res);
>> +
>> +    if ((long)res.a0 < 0)
>> +        return -EINVAL;
>> +
>> +    reg->kaddr = memremap(res.a0,
>> +                  sizeof(struct pvclock_vcpu_stolen_time),
>> +                  MEMREMAP_WB);
> 
> cpuhp callbacks can be invoked in atomic context (see:
>     secondary_start_kernel ->
>     notify_cpu_starting ->
>     invoke callbacks),
> but memremap might sleep...
> 
> Try to run a DEBUG_ATOMIC_SLEEP enabled PV guest, I guess we will be
> greeted by the Sleep-in-Atomic-Context BUG.  We need an alternative
> here?

Actually I had run DEBUG_ATOMIC_SLEEP and not seen any issue. But I
think that's because of the way I've configured the region in my kvmtool
changes. I'm hitting the path where the memory region is in the linear
map of the kernel and so no actual remapping is needed and hence
memremap doesn't sleep (the shared structure is in a reserved region of
RAM).

But even changing the memory layout of the guest so the call goes into
ioremap_page_range() (which contains a might_sleep()) I'm not seeing any
problems.

Am I missing something? I have to admit I don't entirely follow the
early start up - perhaps it's a simple as DEBUG_ATOMIC_SLEEP doesn't
work this early in boot?

>> +
>> +    if (!reg->kaddr) {
>> +        pr_warn("Failed to map stolen time data structure\n");
>> +        return -ENOMEM;
>> +    }
>> +
>> +    if (le32_to_cpu(reg->kaddr->revision) != 0 ||
>> +        le32_to_cpu(reg->kaddr->attributes) != 0) {
>> +        pr_warn("Unexpected revision or attributes in stolen time
>> data\n");
>> +        return -ENXIO;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int kvm_arm_init_stolen_time(void)
>> +{
>> +    int ret;
>> +
>> +    ret = cpuhp_setup_state(CPUHP_AP_ARM_KVMPV_STARTING,
>> +                "hypervisor/kvmarm/pv:starting",
>> +                init_stolen_time_cpu, stolen_time_dying_cpu);
>> +    if (ret < 0)
>> +        return ret;
>> +    return 0;
>> +}
>> +
>> +static bool has_kvm_steal_clock(void)
>> +{
>> +    struct arm_smccc_res res;
>> +
>> +    /* To detect the presence of PV time support we require SMCCC
>> 1.1+ */
>> +    if (psci_ops.smccc_version < SMCCC_VERSION_1_1)
>> +        return false;
>> +
>> +    arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
>> +                 ARM_SMCCC_HV_PV_FEATURES, &res);
>> +
>> +    if (res.a0 != SMCCC_RET_SUCCESS)
>> +        return false;
>> +
>> +    arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_FEATURES,
>> +                 ARM_SMCCC_HV_PV_TIME_ST, &res);
>> +
>> +    if (res.a0 != SMCCC_RET_SUCCESS)
>> +        return false;
>> +
>> +    return true;
>> +}
>> +
>> +int __init kvm_guest_init(void)
>> +{
>> +    int ret = 0;
> 
> And this look like a redundant initialization?

Yes - that should go, thanks for spotting it.

Steve

> 
> 
> Thanks,
> zenghui
> 
>> +
>> +    if (!has_kvm_steal_clock())
>> +        return 0;
>> +
>> +    ret = kvm_arm_init_stolen_time();
>> +    if (ret)
>> +        return ret;
>> +
>> +    pv_ops.time.steal_clock = kvm_steal_clock;
>> +
>> +    static_key_slow_inc(&paravirt_steal_enabled);
>> +    if (steal_acc)
>> +        static_key_slow_inc(&paravirt_steal_rq_enabled);
>> +
>> +    pr_info("using stolen time PV\n");
>> +
>> +    return 0;
>> +}
>> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
>> index 0b2946414dc9..a52aea14c6ec 100644
>> --- a/arch/arm64/kernel/time.c
>> +++ b/arch/arm64/kernel/time.c
>> @@ -30,6 +30,7 @@
>>     #include <asm/thread_info.h>
>>   #include <asm/stacktrace.h>
>> +#include <asm/paravirt.h>
>>     unsigned long profile_pc(struct pt_regs *regs)
>>   {
>> @@ -65,4 +66,6 @@ void __init time_init(void)
>>         /* Calibrate the delay loop directly */
>>       lpj_fine = arch_timer_rate / HZ;
>> +
>> +    kvm_guest_init();
>>   }
>> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
>> index 068793a619ca..89d75edb5750 100644
>> --- a/include/linux/cpuhotplug.h
>> +++ b/include/linux/cpuhotplug.h
>> @@ -136,6 +136,7 @@ enum cpuhp_state {
>>       /* Must be the last timer callback */
>>       CPUHP_AP_DUMMY_TIMER_STARTING,
>>       CPUHP_AP_ARM_XEN_STARTING,
>> +    CPUHP_AP_ARM_KVMPV_STARTING,
>>       CPUHP_AP_ARM_CORESIGHT_STARTING,
>>       CPUHP_AP_ARM64_ISNDEP_STARTING,
>>       CPUHP_AP_SMPCFD_DYING,
>>
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 01/10] KVM: arm64: Document PV-time interface
  2019-08-21 15:36 ` [PATCH v3 01/10] KVM: arm64: Document PV-time interface Steven Price
@ 2019-08-27  8:44   ` Christoffer Dall
  2019-08-28 11:23     ` Steven Price
  2019-08-27  8:57   ` Christoffer Dall
  2019-08-29 17:15   ` Andrew Jones
  2 siblings, 1 reply; 36+ messages in thread
From: Christoffer Dall @ 2019-08-27  8:44 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kvm, Catalin Marinas, linux-doc, Russell King,
	Paolo Bonzini

On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
> Introduce a paravirtualization interface for KVM/arm64 based on the
> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
> 
> This only adds the details about "Stolen Time" as the details of "Live
> Physical Time" have not been fully agreed.
> 
> User space can specify a reserved area of memory for the guest and
> inform KVM to populate the memory with information on time that the host
> kernel has stolen from the guest.
> 
> A hypercall interface is provided for the guest to interrogate the
> hypervisor's support for this interface and the location of the shared
> memory structures.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
>  1 file changed, 100 insertions(+)
>  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
> 
> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
> new file mode 100644
> index 000000000000..1ceb118694e7
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/pvtime.txt
> @@ -0,0 +1,100 @@
> +Paravirtualized time support for arm64
> +======================================
> +
> +Arm specification DEN0057/A defined a standard for paravirtualised time
> +support for AArch64 guests:
> +
> +https://developer.arm.com/docs/den0057/a
> +
> +KVM/arm64 implements the stolen time part of this specification by providing
> +some hypervisor service calls to support a paravirtualized guest obtaining a
> +view of the amount of time stolen from its execution.
> +
> +Two new SMCCC compatible hypercalls are defined:
> +
> +PV_FEATURES 0xC5000020
> +PV_TIME_ST  0xC5000022
> +
> +These are only available in the SMC64/HVC64 calling convention as
> +paravirtualized time is not available to 32 bit Arm guests. The existence of
> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
> +mechanism before calling it.
> +
> +PV_FEATURES
> +    Function ID:  (uint32)  : 0xC5000020
> +    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
> +    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
> +                              PV-time feature is supported by the hypervisor.
> +
> +PV_TIME_ST
> +    Function ID:  (uint32)  : 0xC5000022
> +    Return value: (int64)   : IPA of the stolen time data structure for this
> +                              (V)CPU. On failure:
> +                              NOT_SUPPORTED (-1)
> +
> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
> +with inner and outer write back caching attributes, in the inner shareable
> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
> +meaningfully filled by the hypervisor (see structure below).
> +
> +PV_TIME_ST returns the structure for the calling VCPU.
> +
> +Stolen Time
> +-----------
> +
> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
> +
> +  Field       | Byte Length | Byte Offset | Description
> +  ----------- | ----------- | ----------- | --------------------------
> +  Revision    |      4      |      0      | Must be 0 for version 0.1
> +  Attributes  |      4      |      4      | Must be 0
> +  Stolen time |      8      |      8      | Stolen time in unsigned
> +              |             |             | nanoseconds indicating how
> +              |             |             | much time this VCPU thread
> +              |             |             | was involuntarily not
> +              |             |             | running on a physical CPU.
> +
> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
> +will be present within a reserved region of the normal memory given to the
> +guest. The guest should not attempt to write into this memory. There is a
> +structure per VCPU of the guest.
> +
> +User space interface
> +====================
> +
> +User space can request that KVM provide the paravirtualized time interface to
> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
> +

I feel it would be more consistent to have the details of this in
Documentation/virt/kvm/devices/arm-pv-time.txt and refer to this
document from here.

> +    struct kvm_create_device pvtime_device = {
> +            .type = KVM_DEV_TYPE_ARM_PV_TIME,
> +            .attr = 0,
> +            .flags = 0,
> +    };
> +
> +    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
> +
> +Creation of the device should be done after creating the vCPUs of the virtual
> +machine.
> +
> +The IPA of the structures must be given to KVM. This is the base address
> +of an array of stolen time structures (one for each VCPU). The base address
> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
> +multiple of PAGE_SIZE.
> +
> +The memory for these structures should be added to the guest in the usual
> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
> +
> +For example:
> +
> +    struct kvm_dev_arm_st_region region = {
> +            .gpa = <IPA of guest base address>,
> +            .size = <size in bytes>
> +    };
> +
> +    struct kvm_device_attr st_base = {
> +            .group = KVM_DEV_ARM_PV_TIME_PADDR,
> +            .attr = KVM_DEV_ARM_PV_TIME_ST,
> +            .addr = (u64)&region
> +    };
> +
> +    ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, &st_base);
> -- 
> 2.20.1
> 

Thanks,

    Christoffer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 01/10] KVM: arm64: Document PV-time interface
  2019-08-21 15:36 ` [PATCH v3 01/10] KVM: arm64: Document PV-time interface Steven Price
  2019-08-27  8:44   ` Christoffer Dall
@ 2019-08-27  8:57   ` Christoffer Dall
  2019-08-28 12:09     ` Steven Price
  2019-08-28 13:49     ` Christoffer Dall
  2019-08-29 17:15   ` Andrew Jones
  2 siblings, 2 replies; 36+ messages in thread
From: Christoffer Dall @ 2019-08-27  8:57 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kvm, Catalin Marinas, linux-doc, Russell King,
	Paolo Bonzini

On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
> Introduce a paravirtualization interface for KVM/arm64 based on the
> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
> 
> This only adds the details about "Stolen Time" as the details of "Live
> Physical Time" have not been fully agreed.
> 
> User space can specify a reserved area of memory for the guest and
> inform KVM to populate the memory with information on time that the host
> kernel has stolen from the guest.
> 
> A hypercall interface is provided for the guest to interrogate the
> hypervisor's support for this interface and the location of the shared
> memory structures.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
>  1 file changed, 100 insertions(+)
>  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
> 
> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
> new file mode 100644
> index 000000000000..1ceb118694e7
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/pvtime.txt
> @@ -0,0 +1,100 @@
> +Paravirtualized time support for arm64
> +======================================
> +
> +Arm specification DEN0057/A defined a standard for paravirtualised time
> +support for AArch64 guests:
> +
> +https://developer.arm.com/docs/den0057/a
> +
> +KVM/arm64 implements the stolen time part of this specification by providing
> +some hypervisor service calls to support a paravirtualized guest obtaining a
> +view of the amount of time stolen from its execution.
> +
> +Two new SMCCC compatible hypercalls are defined:
> +
> +PV_FEATURES 0xC5000020
> +PV_TIME_ST  0xC5000022
> +
> +These are only available in the SMC64/HVC64 calling convention as
> +paravirtualized time is not available to 32 bit Arm guests. The existence of
> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
> +mechanism before calling it.
> +
> +PV_FEATURES
> +    Function ID:  (uint32)  : 0xC5000020
> +    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
> +    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
> +                              PV-time feature is supported by the hypervisor.
> +
> +PV_TIME_ST
> +    Function ID:  (uint32)  : 0xC5000022
> +    Return value: (int64)   : IPA of the stolen time data structure for this
> +                              (V)CPU. On failure:
> +                              NOT_SUPPORTED (-1)
> +
> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
> +with inner and outer write back caching attributes, in the inner shareable
> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
> +meaningfully filled by the hypervisor (see structure below).
> +
> +PV_TIME_ST returns the structure for the calling VCPU.
> +
> +Stolen Time
> +-----------
> +
> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
> +
> +  Field       | Byte Length | Byte Offset | Description
> +  ----------- | ----------- | ----------- | --------------------------
> +  Revision    |      4      |      0      | Must be 0 for version 0.1
> +  Attributes  |      4      |      4      | Must be 0
> +  Stolen time |      8      |      8      | Stolen time in unsigned
> +              |             |             | nanoseconds indicating how
> +              |             |             | much time this VCPU thread
> +              |             |             | was involuntarily not
> +              |             |             | running on a physical CPU.
> +
> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
> +will be present within a reserved region of the normal memory given to the
> +guest. The guest should not attempt to write into this memory. There is a
> +structure per VCPU of the guest.
> +
> +User space interface
> +====================
> +
> +User space can request that KVM provide the paravirtualized time interface to
> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
> +
> +    struct kvm_create_device pvtime_device = {
> +            .type = KVM_DEV_TYPE_ARM_PV_TIME,
> +            .attr = 0,
> +            .flags = 0,
> +    };
> +
> +    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
> +
> +Creation of the device should be done after creating the vCPUs of the virtual
> +machine.
> +
> +The IPA of the structures must be given to KVM. This is the base address
> +of an array of stolen time structures (one for each VCPU). The base address
> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
> +multiple of PAGE_SIZE.
> +
> +The memory for these structures should be added to the guest in the usual
> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
> +
> +For example:
> +
> +    struct kvm_dev_arm_st_region region = {
> +            .gpa = <IPA of guest base address>,
> +            .size = <size in bytes>
> +    };

This feel fragile; how are you handling userspace creating VCPUs after
setting this up, the GPA overlapping guest memory, etc.  Is the
philosophy here that the VMM can mess up the VM if it wants, but that
this should never lead attacks on the host (we better hope not) and so
we don't care?

It seems to me setting the IPA per vcpu throught the VCPU device would
avoid a lot of these issues.  See
Documentation/virt/kvm/devices/vcpu.txt.


Thanks,

    Christoffer

> +
> +    struct kvm_device_attr st_base = {
> +            .group = KVM_DEV_ARM_PV_TIME_PADDR,
> +            .attr = KVM_DEV_ARM_PV_TIME_ST,
> +            .addr = (u64)&region
> +    };
> +
> +    ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, &st_base);
> -- 
> 2.20.1
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 10/10] arm64: Retrieve stolen time as paravirtualized guest
  2019-08-23 14:22     ` Steven Price
@ 2019-08-27 12:43       ` Zenghui Yu
  0 siblings, 0 replies; 36+ messages in thread
From: Zenghui Yu @ 2019-08-27 12:43 UTC (permalink / raw)
  To: Steven Price, Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm
  Cc: kvm, linux-doc, Catalin Marinas, Russell King, linux-kernel,
	Paolo Bonzini

On 2019/8/23 22:22, Steven Price wrote:
> On 23/08/2019 12:45, Zenghui Yu wrote:
>> Hi Steven,
>>
>> On 2019/8/21 23:36, Steven Price wrote:
>>> Enable paravirtualization features when running under a hypervisor
>>> supporting the PV_TIME_ST hypercall.
>>>
>>> For each (v)CPU, we ask the hypervisor for the location of a shared
>>> page which the hypervisor will use to report stolen time to us. We set
>>> pv_time_ops to the stolen time function which simply reads the stolen
>>> value from the shared page for a VCPU. We guarantee single-copy
>>> atomicity using READ_ONCE which means we can also read the stolen
>>> time for another VCPU than the currently running one while it is
>>> potentially being updated by the hypervisor.
>>>
>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>> ---
>>>    arch/arm64/include/asm/paravirt.h |   9 +-
>>>    arch/arm64/kernel/paravirt.c      | 148 ++++++++++++++++++++++++++++++
>>>    arch/arm64/kernel/time.c          |   3 +
>>>    include/linux/cpuhotplug.h        |   1 +
>>>    4 files changed, 160 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/include/asm/paravirt.h
>>> b/arch/arm64/include/asm/paravirt.h
>>> index 799d9dd6f7cc..125c26c42902 100644
>>> --- a/arch/arm64/include/asm/paravirt.h
>>> +++ b/arch/arm64/include/asm/paravirt.h
>>> @@ -21,6 +21,13 @@ static inline u64 paravirt_steal_clock(int cpu)
>>>    {
>>>        return pv_ops.time.steal_clock(cpu);
>>>    }
>>> -#endif
>>> +
>>> +int __init kvm_guest_init(void);
>>> +
>>> +#else
>>> +
>>> +#define kvm_guest_init()
>>> +
>>> +#endif // CONFIG_PARAVIRT
>>>      #endif
>>> diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
>>> index 4cfed91fe256..ea8dbbbd3293 100644
>>> --- a/arch/arm64/kernel/paravirt.c
>>> +++ b/arch/arm64/kernel/paravirt.c
>>> @@ -6,13 +6,161 @@
>>>     * Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>     */
>>>    +#define pr_fmt(fmt) "kvmarm-pv: " fmt
>>> +
>>> +#include <linux/arm-smccc.h>
>>> +#include <linux/cpuhotplug.h>
>>>    #include <linux/export.h>
>>> +#include <linux/io.h>
>>>    #include <linux/jump_label.h>
>>> +#include <linux/printk.h>
>>> +#include <linux/psci.h>
>>> +#include <linux/reboot.h>
>>> +#include <linux/slab.h>
>>>    #include <linux/types.h>
>>> +
>>>    #include <asm/paravirt.h>
>>> +#include <asm/pvclock-abi.h>
>>> +#include <asm/smp_plat.h>
>>>      struct static_key paravirt_steal_enabled;
>>>    struct static_key paravirt_steal_rq_enabled;
>>>      struct paravirt_patch_template pv_ops;
>>>    EXPORT_SYMBOL_GPL(pv_ops);
>>> +
>>> +struct kvmarm_stolen_time_region {
>>> +    struct pvclock_vcpu_stolen_time *kaddr;
>>> +};
>>> +
>>> +static DEFINE_PER_CPU(struct kvmarm_stolen_time_region,
>>> stolen_time_region);
>>> +
>>> +static bool steal_acc = true;
>>> +static int __init parse_no_stealacc(char *arg)
>>> +{
>>> +    steal_acc = false;
>>> +    return 0;
>>> +}
>>> +
>>> +early_param("no-steal-acc", parse_no_stealacc);
>>> +
>>> +/* return stolen time in ns by asking the hypervisor */
>>> +static u64 kvm_steal_clock(int cpu)
>>> +{
>>> +    struct kvmarm_stolen_time_region *reg;
>>> +
>>> +    reg = per_cpu_ptr(&stolen_time_region, cpu);
>>> +    if (!reg->kaddr) {
>>> +        pr_warn_once("stolen time enabled but not configured for cpu
>>> %d\n",
>>> +                 cpu);
>>> +        return 0;
>>> +    }
>>> +
>>> +    return le64_to_cpu(READ_ONCE(reg->kaddr->stolen_time));
>>> +}
>>> +
>>> +static int disable_stolen_time_current_cpu(void)
>>> +{
>>> +    struct kvmarm_stolen_time_region *reg;
>>> +
>>> +    reg = this_cpu_ptr(&stolen_time_region);
>>> +    if (!reg->kaddr)
>>> +        return 0;
>>> +
>>> +    memunmap(reg->kaddr);
>>> +    memset(reg, 0, sizeof(*reg));
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int stolen_time_dying_cpu(unsigned int cpu)
>>> +{
>>> +    return disable_stolen_time_current_cpu();
>>> +}
>>> +
>>> +static int init_stolen_time_cpu(unsigned int cpu)
>>> +{
>>> +    struct kvmarm_stolen_time_region *reg;
>>> +    struct arm_smccc_res res;
>>> +
>>> +    reg = this_cpu_ptr(&stolen_time_region);
>>> +
>>> +    arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_TIME_ST, &res);
>>> +
>>> +    if ((long)res.a0 < 0)
>>> +        return -EINVAL;
>>> +
>>> +    reg->kaddr = memremap(res.a0,
>>> +                  sizeof(struct pvclock_vcpu_stolen_time),
>>> +                  MEMREMAP_WB);
>>
>> cpuhp callbacks can be invoked in atomic context (see:
>>      secondary_start_kernel ->
>>      notify_cpu_starting ->
>>      invoke callbacks),
>> but memremap might sleep...
>>
>> Try to run a DEBUG_ATOMIC_SLEEP enabled PV guest, I guess we will be
>> greeted by the Sleep-in-Atomic-Context BUG.  We need an alternative
>> here?
> 
> Actually I had run DEBUG_ATOMIC_SLEEP and not seen any issue. But I
> think that's because of the way I've configured the region in my kvmtool
> changes. I'm hitting the path where the memory region is in the linear
> map of the kernel and so no actual remapping is needed and hence
> memremap doesn't sleep (the shared structure is in a reserved region of
> RAM).
> 
> But even changing the memory layout of the guest so the call goes into
> ioremap_page_range() (which contains a might_sleep()) I'm not seeing any
> problems.

Emm, I hit this SAC BUG when testing your V1 with the kvmtool changes
you've provided, but forgot to report it at that time.  I go back to V1
and get the following call trace:

[    0.120113] BUG: sleeping function called from invalid context at 
mm/slab.h:501
[    0.120118] in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/1
[    0.120122] no locks held by swapper/1/0.
[    0.120126] irq event stamp: 0
[    0.120135] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.120145] hardirqs last disabled at (0): [<ffff200010113b40>] 
copy_process+0x870/0x2878
[    0.120152] softirqs last  enabled at (0): [<ffff200010113b40>] 
copy_process+0x870/0x2878
[    0.120157] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.120164] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.3.0-rc6+ #2
[    0.120168] Hardware name: linux,dummy-virt (DT)
[    0.120173] Call trace:
[    0.120179]  dump_backtrace+0x0/0x250
[    0.120184]  show_stack+0x24/0x30
[    0.120192]  dump_stack+0x120/0x174
[    0.120198]  ___might_sleep+0x1b0/0x280
[    0.120203]  __might_sleep+0x80/0xf0
[    0.120209]  kmem_cache_alloc_node_trace+0x30c/0x3c8
[    0.120216]  __get_vm_area_node+0x9c/0x208
[    0.120221]  get_vm_area_caller+0x58/0x68
[    0.120227]  __ioremap_caller+0x78/0x120
[    0.120233]  ioremap_cache+0xf0/0x1a8
[    0.120240]  memremap+0x154/0x3b8
[    0.120245]  init_stolen_time_cpu+0x94/0x150
[    0.120251]  cpuhp_invoke_callback+0x12c/0x12f8
[    0.120257]  notify_cpu_starting+0xa0/0xc0
[    0.120263]  secondary_start_kernel+0x1d4/0x328

But things may have changed because we're talking about V3 now...
I will dig it a bit deeper.

> Am I missing something? I have to admit I don't entirely follow the
> early start up - perhaps it's a simple as DEBUG_ATOMIC_SLEEP doesn't
> work this early in boot?

I think it should work.


Thanks,
zenghui


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 01/10] KVM: arm64: Document PV-time interface
  2019-08-27  8:44   ` Christoffer Dall
@ 2019-08-28 11:23     ` Steven Price
  0 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2019-08-28 11:23 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kvm, Catalin Marinas, linux-doc, Russell King,
	Paolo Bonzini

On 27/08/2019 09:44, Christoffer Dall wrote:
> On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
>> Introduce a paravirtualization interface for KVM/arm64 based on the
>> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
>>
>> This only adds the details about "Stolen Time" as the details of "Live
>> Physical Time" have not been fully agreed.
>>
>> User space can specify a reserved area of memory for the guest and
>> inform KVM to populate the memory with information on time that the host
>> kernel has stolen from the guest.
>>
>> A hypercall interface is provided for the guest to interrogate the
>> hypervisor's support for this interface and the location of the shared
>> memory structures.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>  Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
>>  1 file changed, 100 insertions(+)
>>  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
>>
>> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
>> new file mode 100644
>> index 000000000000..1ceb118694e7
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/arm/pvtime.txt
>> @@ -0,0 +1,100 @@
>> +Paravirtualized time support for arm64
>> +======================================
>> +
>> +Arm specification DEN0057/A defined a standard for paravirtualised time
>> +support for AArch64 guests:
>> +
>> +https://developer.arm.com/docs/den0057/a
>> +
>> +KVM/arm64 implements the stolen time part of this specification by providing
>> +some hypervisor service calls to support a paravirtualized guest obtaining a
>> +view of the amount of time stolen from its execution.
>> +
>> +Two new SMCCC compatible hypercalls are defined:
>> +
>> +PV_FEATURES 0xC5000020
>> +PV_TIME_ST  0xC5000022
>> +
>> +These are only available in the SMC64/HVC64 calling convention as
>> +paravirtualized time is not available to 32 bit Arm guests. The existence of
>> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
>> +mechanism before calling it.
>> +
>> +PV_FEATURES
>> +    Function ID:  (uint32)  : 0xC5000020
>> +    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
>> +    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
>> +                              PV-time feature is supported by the hypervisor.
>> +
>> +PV_TIME_ST
>> +    Function ID:  (uint32)  : 0xC5000022
>> +    Return value: (int64)   : IPA of the stolen time data structure for this
>> +                              (V)CPU. On failure:
>> +                              NOT_SUPPORTED (-1)
>> +
>> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
>> +with inner and outer write back caching attributes, in the inner shareable
>> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
>> +meaningfully filled by the hypervisor (see structure below).
>> +
>> +PV_TIME_ST returns the structure for the calling VCPU.
>> +
>> +Stolen Time
>> +-----------
>> +
>> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
>> +
>> +  Field       | Byte Length | Byte Offset | Description
>> +  ----------- | ----------- | ----------- | --------------------------
>> +  Revision    |      4      |      0      | Must be 0 for version 0.1
>> +  Attributes  |      4      |      4      | Must be 0
>> +  Stolen time |      8      |      8      | Stolen time in unsigned
>> +              |             |             | nanoseconds indicating how
>> +              |             |             | much time this VCPU thread
>> +              |             |             | was involuntarily not
>> +              |             |             | running on a physical CPU.
>> +
>> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
>> +will be present within a reserved region of the normal memory given to the
>> +guest. The guest should not attempt to write into this memory. There is a
>> +structure per VCPU of the guest.
>> +
>> +User space interface
>> +====================
>> +
>> +User space can request that KVM provide the paravirtualized time interface to
>> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
>> +
> 
> I feel it would be more consistent to have the details of this in
> Documentation/virt/kvm/devices/arm-pv-time.txt and refer to this
> document from here.

Fair point - I'll move this lower part of the document and add a reference.

Thanks,

Steve

>> +    struct kvm_create_device pvtime_device = {
>> +            .type = KVM_DEV_TYPE_ARM_PV_TIME,
>> +            .attr = 0,
>> +            .flags = 0,
>> +    };
>> +
>> +    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
>> +
>> +Creation of the device should be done after creating the vCPUs of the virtual
>> +machine.
>> +
>> +The IPA of the structures must be given to KVM. This is the base address
>> +of an array of stolen time structures (one for each VCPU). The base address
>> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
>> +multiple of PAGE_SIZE.
>> +
>> +The memory for these structures should be added to the guest in the usual
>> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
>> +
>> +For example:
>> +
>> +    struct kvm_dev_arm_st_region region = {
>> +            .gpa = <IPA of guest base address>,
>> +            .size = <size in bytes>
>> +    };
>> +
>> +    struct kvm_device_attr st_base = {
>> +            .group = KVM_DEV_ARM_PV_TIME_PADDR,
>> +            .attr = KVM_DEV_ARM_PV_TIME_ST,
>> +            .addr = (u64)&region
>> +    };
>> +
>> +    ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, &st_base);
>> -- 
>> 2.20.1
>>
> 
> Thanks,
> 
>     Christoffer
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 01/10] KVM: arm64: Document PV-time interface
  2019-08-27  8:57   ` Christoffer Dall
@ 2019-08-28 12:09     ` Steven Price
  2019-08-30  9:22       ` Christoffer Dall
  2019-08-28 13:49     ` Christoffer Dall
  1 sibling, 1 reply; 36+ messages in thread
From: Steven Price @ 2019-08-28 12:09 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kvm, Catalin Marinas, linux-doc, Russell King,
	Paolo Bonzini

On 27/08/2019 09:57, Christoffer Dall wrote:
> On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
>> Introduce a paravirtualization interface for KVM/arm64 based on the
>> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
>>
>> This only adds the details about "Stolen Time" as the details of "Live
>> Physical Time" have not been fully agreed.
>>
>> User space can specify a reserved area of memory for the guest and
>> inform KVM to populate the memory with information on time that the host
>> kernel has stolen from the guest.
>>
>> A hypercall interface is provided for the guest to interrogate the
>> hypervisor's support for this interface and the location of the shared
>> memory structures.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>  Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
>>  1 file changed, 100 insertions(+)
>>  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
>>
>> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
>> new file mode 100644
>> index 000000000000..1ceb118694e7
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/arm/pvtime.txt
>> @@ -0,0 +1,100 @@
>> +Paravirtualized time support for arm64
>> +======================================
>> +
>> +Arm specification DEN0057/A defined a standard for paravirtualised time
>> +support for AArch64 guests:
>> +
>> +https://developer.arm.com/docs/den0057/a
>> +
>> +KVM/arm64 implements the stolen time part of this specification by providing
>> +some hypervisor service calls to support a paravirtualized guest obtaining a
>> +view of the amount of time stolen from its execution.
>> +
>> +Two new SMCCC compatible hypercalls are defined:
>> +
>> +PV_FEATURES 0xC5000020
>> +PV_TIME_ST  0xC5000022
>> +
>> +These are only available in the SMC64/HVC64 calling convention as
>> +paravirtualized time is not available to 32 bit Arm guests. The existence of
>> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
>> +mechanism before calling it.
>> +
>> +PV_FEATURES
>> +    Function ID:  (uint32)  : 0xC5000020
>> +    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
>> +    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
>> +                              PV-time feature is supported by the hypervisor.
>> +
>> +PV_TIME_ST
>> +    Function ID:  (uint32)  : 0xC5000022
>> +    Return value: (int64)   : IPA of the stolen time data structure for this
>> +                              (V)CPU. On failure:
>> +                              NOT_SUPPORTED (-1)
>> +
>> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
>> +with inner and outer write back caching attributes, in the inner shareable
>> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
>> +meaningfully filled by the hypervisor (see structure below).
>> +
>> +PV_TIME_ST returns the structure for the calling VCPU.
>> +
>> +Stolen Time
>> +-----------
>> +
>> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
>> +
>> +  Field       | Byte Length | Byte Offset | Description
>> +  ----------- | ----------- | ----------- | --------------------------
>> +  Revision    |      4      |      0      | Must be 0 for version 0.1
>> +  Attributes  |      4      |      4      | Must be 0
>> +  Stolen time |      8      |      8      | Stolen time in unsigned
>> +              |             |             | nanoseconds indicating how
>> +              |             |             | much time this VCPU thread
>> +              |             |             | was involuntarily not
>> +              |             |             | running on a physical CPU.
>> +
>> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
>> +will be present within a reserved region of the normal memory given to the
>> +guest. The guest should not attempt to write into this memory. There is a
>> +structure per VCPU of the guest.
>> +
>> +User space interface
>> +====================
>> +
>> +User space can request that KVM provide the paravirtualized time interface to
>> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
>> +
>> +    struct kvm_create_device pvtime_device = {
>> +            .type = KVM_DEV_TYPE_ARM_PV_TIME,
>> +            .attr = 0,
>> +            .flags = 0,
>> +    };
>> +
>> +    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
>> +
>> +Creation of the device should be done after creating the vCPUs of the virtual
>> +machine.
>> +
>> +The IPA of the structures must be given to KVM. This is the base address
>> +of an array of stolen time structures (one for each VCPU). The base address
>> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
>> +multiple of PAGE_SIZE.
>> +
>> +The memory for these structures should be added to the guest in the usual
>> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
>> +
>> +For example:
>> +
>> +    struct kvm_dev_arm_st_region region = {
>> +            .gpa = <IPA of guest base address>,
>> +            .size = <size in bytes>
>> +    };
> 
> This feel fragile; how are you handling userspace creating VCPUs after
> setting this up,

In this case as long as the structures all fit within the region created
VCPUs can be created/destroyed at will. If the VCPU index is too high
then the kernel will bail out in kvm_update_stolen_time() so the
structure will not be written. I consider this case as user space
messing up, so beyond protecting the host from the mess, user space gets
to keep the pieces.

> the GPA overlapping guest memory, etc.

Again, the (host) kernel is protected against this, but clearly this
will end badly for the guest.

> Is the
> philosophy here that the VMM can mess up the VM if it wants, but that
> this should never lead attacks on the host (we better hope not) and so
> we don't care?

Yes. For things like GPA overlapping guest memory it's not really the
host's position to work out what is "guest memory". It's quite possible
that user space could decide to place the stolen time structures right
in the middle of guest memory - it's just up to user space to inform the
guest what memory is usable. Obviously the expectation is that the
shared structures would be positioned "out of the way" in GPA space in
any normal arrangement.

> It seems to me setting the IPA per vcpu throught the VCPU device would
> avoid a lot of these issues.  See
> Documentation/virt/kvm/devices/vcpu.txt.

That is certainly a possibility, I'm not really sure what the benefit is
though? It would still lead to corner cases:

 * What if only some VCPUs had stolen time setup on them?
 * What if multiple VCPUs pointed to the same location?
 * The structures can still overlap with guest memory

It's also more work to setup in user space with the only "benefit" being
that user space could choose to organise the structures however it sees
fit (e.g. no need for them to be contiguous in memory). But I'm not sure
I see a use case for that flexibility.

Perhaps there's some benefit I'm not seeing?

Steve

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 01/10] KVM: arm64: Document PV-time interface
  2019-08-27  8:57   ` Christoffer Dall
  2019-08-28 12:09     ` Steven Price
@ 2019-08-28 13:49     ` Christoffer Dall
  2019-08-29 15:21       ` Steven Price
  1 sibling, 1 reply; 36+ messages in thread
From: Christoffer Dall @ 2019-08-28 13:49 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, linux-doc, Marc Zyngier, linux-kernel, Russell King,
	Catalin Marinas, Paolo Bonzini, Will Deacon, kvmarm,
	linux-arm-kernel

On Tue, Aug 27, 2019 at 10:57:06AM +0200, Christoffer Dall wrote:
> On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
> > Introduce a paravirtualization interface for KVM/arm64 based on the
> > "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
> > 
> > This only adds the details about "Stolen Time" as the details of "Live
> > Physical Time" have not been fully agreed.
> > 
> > User space can specify a reserved area of memory for the guest and
> > inform KVM to populate the memory with information on time that the host
> > kernel has stolen from the guest.
> > 
> > A hypercall interface is provided for the guest to interrogate the
> > hypervisor's support for this interface and the location of the shared
> > memory structures.
> > 
> > Signed-off-by: Steven Price <steven.price@arm.com>
> > ---
> >  Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
> >  1 file changed, 100 insertions(+)
> >  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
> > 
> > diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
> > new file mode 100644
> > index 000000000000..1ceb118694e7
> > --- /dev/null
> > +++ b/Documentation/virt/kvm/arm/pvtime.txt
> > @@ -0,0 +1,100 @@
> > +Paravirtualized time support for arm64
> > +======================================
> > +
> > +Arm specification DEN0057/A defined a standard for paravirtualised time
> > +support for AArch64 guests:
> > +
> > +https://developer.arm.com/docs/den0057/a
> > +
> > +KVM/arm64 implements the stolen time part of this specification by providing
> > +some hypervisor service calls to support a paravirtualized guest obtaining a
> > +view of the amount of time stolen from its execution.
> > +
> > +Two new SMCCC compatible hypercalls are defined:
> > +
> > +PV_FEATURES 0xC5000020
> > +PV_TIME_ST  0xC5000022
> > +
> > +These are only available in the SMC64/HVC64 calling convention as
> > +paravirtualized time is not available to 32 bit Arm guests. The existence of
> > +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
> > +mechanism before calling it.
> > +
> > +PV_FEATURES
> > +    Function ID:  (uint32)  : 0xC5000020
> > +    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
> > +    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
> > +                              PV-time feature is supported by the hypervisor.
> > +
> > +PV_TIME_ST
> > +    Function ID:  (uint32)  : 0xC5000022
> > +    Return value: (int64)   : IPA of the stolen time data structure for this
> > +                              (V)CPU. On failure:
> > +                              NOT_SUPPORTED (-1)
> > +
> > +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
> > +with inner and outer write back caching attributes, in the inner shareable
> > +domain. A total of 16 bytes from the IPA returned are guaranteed to be
> > +meaningfully filled by the hypervisor (see structure below).
> > +
> > +PV_TIME_ST returns the structure for the calling VCPU.
> > +
> > +Stolen Time
> > +-----------
> > +
> > +The structure pointed to by the PV_TIME_ST hypercall is as follows:
> > +
> > +  Field       | Byte Length | Byte Offset | Description
> > +  ----------- | ----------- | ----------- | --------------------------
> > +  Revision    |      4      |      0      | Must be 0 for version 0.1
> > +  Attributes  |      4      |      4      | Must be 0
> > +  Stolen time |      8      |      8      | Stolen time in unsigned
> > +              |             |             | nanoseconds indicating how
> > +              |             |             | much time this VCPU thread
> > +              |             |             | was involuntarily not
> > +              |             |             | running on a physical CPU.
> > +
> > +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
> > +will be present within a reserved region of the normal memory given to the
> > +guest. The guest should not attempt to write into this memory. There is a
> > +structure per VCPU of the guest.
> > +
> > +User space interface
> > +====================
> > +
> > +User space can request that KVM provide the paravirtualized time interface to
> > +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
> > +
> > +    struct kvm_create_device pvtime_device = {
> > +            .type = KVM_DEV_TYPE_ARM_PV_TIME,
> > +            .attr = 0,
> > +            .flags = 0,
> > +    };
> > +
> > +    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
> > +
> > +Creation of the device should be done after creating the vCPUs of the virtual
> > +machine.
> > +
> > +The IPA of the structures must be given to KVM. This is the base address
> > +of an array of stolen time structures (one for each VCPU). The base address
> > +must be page aligned. The size must be at least 64 * number of VCPUs and be a
> > +multiple of PAGE_SIZE.
> > +
> > +The memory for these structures should be added to the guest in the usual
> > +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
> > +
> > +For example:
> > +
> > +    struct kvm_dev_arm_st_region region = {
> > +            .gpa = <IPA of guest base address>,
> > +            .size = <size in bytes>
> > +    };
> 
> This feel fragile; how are you handling userspace creating VCPUs after
> setting this up, the GPA overlapping guest memory, etc.  Is the
> philosophy here that the VMM can mess up the VM if it wants, but that
> this should never lead attacks on the host (we better hope not) and so
> we don't care?
> 
> It seems to me setting the IPA per vcpu throught the VCPU device would
> avoid a lot of these issues.  See
> Documentation/virt/kvm/devices/vcpu.txt.
> 
> 
I discussed this with Marc the other day, and we realized that if we
make the configuration of the IPA per-PE, then a VMM can construct a VM
where these data structures are distributed within the IPA space of a
VM, which could lead to a lower TLB pressure for some
configurations/workloads.

Thanks,

    Christoffer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 01/10] KVM: arm64: Document PV-time interface
  2019-08-28 13:49     ` Christoffer Dall
@ 2019-08-29 15:21       ` Steven Price
  0 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2019-08-29 15:21 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-doc, Marc Zyngier, linux-kernel, Russell King,
	Catalin Marinas, Paolo Bonzini, Will Deacon, kvmarm,
	linux-arm-kernel

On 28/08/2019 14:49, Christoffer Dall wrote:
> On Tue, Aug 27, 2019 at 10:57:06AM +0200, Christoffer Dall wrote:
>> On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
>>> Introduce a paravirtualization interface for KVM/arm64 based on the
>>> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
>>>
>>> This only adds the details about "Stolen Time" as the details of "Live
>>> Physical Time" have not been fully agreed.
>>>
>>> User space can specify a reserved area of memory for the guest and
>>> inform KVM to populate the memory with information on time that the host
>>> kernel has stolen from the guest.
>>>
>>> A hypercall interface is provided for the guest to interrogate the
>>> hypervisor's support for this interface and the location of the shared
>>> memory structures.
>>>
>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>> ---
>>>  Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
>>>  1 file changed, 100 insertions(+)
>>>  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
>>>
>>> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
>>> new file mode 100644
>>> index 000000000000..1ceb118694e7
>>> --- /dev/null
>>> +++ b/Documentation/virt/kvm/arm/pvtime.txt
>>> @@ -0,0 +1,100 @@
>>> +Paravirtualized time support for arm64
>>> +======================================
>>> +
>>> +Arm specification DEN0057/A defined a standard for paravirtualised time
>>> +support for AArch64 guests:
>>> +
>>> +https://developer.arm.com/docs/den0057/a
>>> +
>>> +KVM/arm64 implements the stolen time part of this specification by providing
>>> +some hypervisor service calls to support a paravirtualized guest obtaining a
>>> +view of the amount of time stolen from its execution.
>>> +
>>> +Two new SMCCC compatible hypercalls are defined:
>>> +
>>> +PV_FEATURES 0xC5000020
>>> +PV_TIME_ST  0xC5000022
>>> +
>>> +These are only available in the SMC64/HVC64 calling convention as
>>> +paravirtualized time is not available to 32 bit Arm guests. The existence of
>>> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
>>> +mechanism before calling it.
>>> +
>>> +PV_FEATURES
>>> +    Function ID:  (uint32)  : 0xC5000020
>>> +    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
>>> +    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
>>> +                              PV-time feature is supported by the hypervisor.
>>> +
>>> +PV_TIME_ST
>>> +    Function ID:  (uint32)  : 0xC5000022
>>> +    Return value: (int64)   : IPA of the stolen time data structure for this
>>> +                              (V)CPU. On failure:
>>> +                              NOT_SUPPORTED (-1)
>>> +
>>> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
>>> +with inner and outer write back caching attributes, in the inner shareable
>>> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
>>> +meaningfully filled by the hypervisor (see structure below).
>>> +
>>> +PV_TIME_ST returns the structure for the calling VCPU.
>>> +
>>> +Stolen Time
>>> +-----------
>>> +
>>> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
>>> +
>>> +  Field       | Byte Length | Byte Offset | Description
>>> +  ----------- | ----------- | ----------- | --------------------------
>>> +  Revision    |      4      |      0      | Must be 0 for version 0.1
>>> +  Attributes  |      4      |      4      | Must be 0
>>> +  Stolen time |      8      |      8      | Stolen time in unsigned
>>> +              |             |             | nanoseconds indicating how
>>> +              |             |             | much time this VCPU thread
>>> +              |             |             | was involuntarily not
>>> +              |             |             | running on a physical CPU.
>>> +
>>> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
>>> +will be present within a reserved region of the normal memory given to the
>>> +guest. The guest should not attempt to write into this memory. There is a
>>> +structure per VCPU of the guest.
>>> +
>>> +User space interface
>>> +====================
>>> +
>>> +User space can request that KVM provide the paravirtualized time interface to
>>> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
>>> +
>>> +    struct kvm_create_device pvtime_device = {
>>> +            .type = KVM_DEV_TYPE_ARM_PV_TIME,
>>> +            .attr = 0,
>>> +            .flags = 0,
>>> +    };
>>> +
>>> +    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
>>> +
>>> +Creation of the device should be done after creating the vCPUs of the virtual
>>> +machine.
>>> +
>>> +The IPA of the structures must be given to KVM. This is the base address
>>> +of an array of stolen time structures (one for each VCPU). The base address
>>> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
>>> +multiple of PAGE_SIZE.
>>> +
>>> +The memory for these structures should be added to the guest in the usual
>>> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
>>> +
>>> +For example:
>>> +
>>> +    struct kvm_dev_arm_st_region region = {
>>> +            .gpa = <IPA of guest base address>,
>>> +            .size = <size in bytes>
>>> +    };
>>
>> This feel fragile; how are you handling userspace creating VCPUs after
>> setting this up, the GPA overlapping guest memory, etc.  Is the
>> philosophy here that the VMM can mess up the VM if it wants, but that
>> this should never lead attacks on the host (we better hope not) and so
>> we don't care?
>>
>> It seems to me setting the IPA per vcpu throught the VCPU device would
>> avoid a lot of these issues.  See
>> Documentation/virt/kvm/devices/vcpu.txt.
>>
>>
> I discussed this with Marc the other day, and we realized that if we
> make the configuration of the IPA per-PE, then a VMM can construct a VM
> where these data structures are distributed within the IPA space of a
> VM, which could lead to a lower TLB pressure for some
> configurations/workloads.

Ok, I'm dubious it will make much difference in terms of TLB pressure,
but I've done the refactoring and I think it actually simplifies the
code. So I'll post a new version where the base address is set via the
VCPU device.

Thanks for the review,

Steve

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 01/10] KVM: arm64: Document PV-time interface
  2019-08-21 15:36 ` [PATCH v3 01/10] KVM: arm64: Document PV-time interface Steven Price
  2019-08-27  8:44   ` Christoffer Dall
  2019-08-27  8:57   ` Christoffer Dall
@ 2019-08-29 17:15   ` Andrew Jones
  2019-08-30  8:35     ` Steven Price
  2 siblings, 1 reply; 36+ messages in thread
From: Andrew Jones @ 2019-08-29 17:15 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kvm, Catalin Marinas, linux-doc, Russell King,
	Paolo Bonzini

On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
> Introduce a paravirtualization interface for KVM/arm64 based on the
> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
> 
> This only adds the details about "Stolen Time" as the details of "Live
> Physical Time" have not been fully agreed.
> 
> User space can specify a reserved area of memory for the guest and
> inform KVM to populate the memory with information on time that the host
> kernel has stolen from the guest.
> 
> A hypercall interface is provided for the guest to interrogate the
> hypervisor's support for this interface and the location of the shared
> memory structures.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
>  1 file changed, 100 insertions(+)
>  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
> 
> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
> new file mode 100644
> index 000000000000..1ceb118694e7
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/pvtime.txt
> @@ -0,0 +1,100 @@
> +Paravirtualized time support for arm64
> +======================================
> +
> +Arm specification DEN0057/A defined a standard for paravirtualised time
> +support for AArch64 guests:
> +
> +https://developer.arm.com/docs/den0057/a
> +
> +KVM/arm64 implements the stolen time part of this specification by providing
> +some hypervisor service calls to support a paravirtualized guest obtaining a
> +view of the amount of time stolen from its execution.
> +
> +Two new SMCCC compatible hypercalls are defined:
> +
> +PV_FEATURES 0xC5000020
> +PV_TIME_ST  0xC5000022
> +
> +These are only available in the SMC64/HVC64 calling convention as
> +paravirtualized time is not available to 32 bit Arm guests. The existence of
> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
> +mechanism before calling it.
> +
> +PV_FEATURES
> +    Function ID:  (uint32)  : 0xC5000020
> +    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
> +    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
> +                              PV-time feature is supported by the hypervisor.
> +
> +PV_TIME_ST
> +    Function ID:  (uint32)  : 0xC5000022
> +    Return value: (int64)   : IPA of the stolen time data structure for this
> +                              (V)CPU. On failure:

Why the () around the V in VCPU?

> +                              NOT_SUPPORTED (-1)
> +
> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
> +with inner and outer write back caching attributes, in the inner shareable
> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
> +meaningfully filled by the hypervisor (see structure below).
> +
> +PV_TIME_ST returns the structure for the calling VCPU.

The above sentence seems redundant here.

> +
> +Stolen Time
> +-----------
> +
> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
> +
> +  Field       | Byte Length | Byte Offset | Description
> +  ----------- | ----------- | ----------- | --------------------------
> +  Revision    |      4      |      0      | Must be 0 for version 0.1
> +  Attributes  |      4      |      4      | Must be 0
> +  Stolen time |      8      |      8      | Stolen time in unsigned
> +              |             |             | nanoseconds indicating how
> +              |             |             | much time this VCPU thread
> +              |             |             | was involuntarily not
> +              |             |             | running on a physical CPU.
> +
> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
> +will be present within a reserved region of the normal memory given to the
> +guest. The guest should not attempt to write into this memory. There is a
> +structure per VCPU of the guest.
> +
> +User space interface
> +====================
> +
> +User space can request that KVM provide the paravirtualized time interface to
> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
> +
> +    struct kvm_create_device pvtime_device = {
> +            .type = KVM_DEV_TYPE_ARM_PV_TIME,
> +            .attr = 0,
> +            .flags = 0,
> +    };
> +
> +    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);

The ioctl doesn't return the fd. If the ioctl returns zero the fd will be
in pvtime_device.fd.

> +
> +Creation of the device should be done after creating the vCPUs of the virtual
> +machine.

Or else what? Will an error be reported in that case?

> +
> +The IPA of the structures must be given to KVM. This is the base address
> +of an array of stolen time structures (one for each VCPU). The base address
> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
> +multiple of PAGE_SIZE.
> +
> +The memory for these structures should be added to the guest in the usual
> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).

Above it says the guest shouldn't attempt to write the memory. Should
KVM_MEM_READONLY be used with KVM_SET_USER_MEMORY_REGION for it?

> +
> +For example:
> +
> +    struct kvm_dev_arm_st_region region = {
> +            .gpa = <IPA of guest base address>,
> +            .size = <size in bytes>
> +    };
> +
> +    struct kvm_device_attr st_base = {
> +            .group = KVM_DEV_ARM_PV_TIME_PADDR,

This is KVM_DEV_ARM_PV_TIME_REGION in the code.

> +            .attr = KVM_DEV_ARM_PV_TIME_ST,
> +            .addr = (u64)&region
> +    };
> +
> +    ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, &st_base);
> -- 
> 2.20.1
>

Thanks,
drew 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 01/10] KVM: arm64: Document PV-time interface
  2019-08-29 17:15   ` Andrew Jones
@ 2019-08-30  8:35     ` Steven Price
  0 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2019-08-30  8:35 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kvm, Catalin Marinas, linux-doc, Russell King,
	Paolo Bonzini

On 29/08/2019 18:15, Andrew Jones wrote:
> On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
>> Introduce a paravirtualization interface for KVM/arm64 based on the
>> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
>>
>> This only adds the details about "Stolen Time" as the details of "Live
>> Physical Time" have not been fully agreed.
>>
>> User space can specify a reserved area of memory for the guest and
>> inform KVM to populate the memory with information on time that the host
>> kernel has stolen from the guest.
>>
>> A hypercall interface is provided for the guest to interrogate the
>> hypervisor's support for this interface and the location of the shared
>> memory structures.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>  Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
>>  1 file changed, 100 insertions(+)
>>  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
>>
>> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
>> new file mode 100644
>> index 000000000000..1ceb118694e7
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/arm/pvtime.txt
>> @@ -0,0 +1,100 @@
>> +Paravirtualized time support for arm64
>> +======================================
>> +
>> +Arm specification DEN0057/A defined a standard for paravirtualised time
>> +support for AArch64 guests:
>> +
>> +https://developer.arm.com/docs/den0057/a
>> +
>> +KVM/arm64 implements the stolen time part of this specification by providing
>> +some hypervisor service calls to support a paravirtualized guest obtaining a
>> +view of the amount of time stolen from its execution.
>> +
>> +Two new SMCCC compatible hypercalls are defined:
>> +
>> +PV_FEATURES 0xC5000020
>> +PV_TIME_ST  0xC5000022
>> +
>> +These are only available in the SMC64/HVC64 calling convention as
>> +paravirtualized time is not available to 32 bit Arm guests. The existence of
>> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
>> +mechanism before calling it.
>> +
>> +PV_FEATURES
>> +    Function ID:  (uint32)  : 0xC5000020
>> +    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
>> +    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
>> +                              PV-time feature is supported by the hypervisor.
>> +
>> +PV_TIME_ST
>> +    Function ID:  (uint32)  : 0xC5000022
>> +    Return value: (int64)   : IPA of the stolen time data structure for this
>> +                              (V)CPU. On failure:
> 
> Why the () around the V in VCPU?

There's nothing preventing the same mechanism being used without the
virtualisation of CPUs. For example a hypervisor like Jailhouse could
implement this interface even though there the CPU isn't being
virtualised but is being handed over to the guest. Equally it is
possible for firmware to provide the same mechanism (using the SMC64
calling convention).

Admittedly that's a little confusing here because the rest of this
document is talking about KVM's implementation and normal hypervisors.
So I'll drop the brackets.

>> +                              NOT_SUPPORTED (-1)
>> +
>> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
>> +with inner and outer write back caching attributes, in the inner shareable
>> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
>> +meaningfully filled by the hypervisor (see structure below).
>> +
>> +PV_TIME_ST returns the structure for the calling VCPU.
> 
> The above sentence seems redundant here.

It is an important detail that each VCPU must use PV_TIME_ST to fetch
the address of the structure for that VCPU. E.g. It could have been
implemented so that the hypercall took a VCPU number. So while redundant
I do feel it's worth pointing this out explicitly.

>> +
>> +Stolen Time
>> +-----------
>> +
>> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
>> +
>> +  Field       | Byte Length | Byte Offset | Description
>> +  ----------- | ----------- | ----------- | --------------------------
>> +  Revision    |      4      |      0      | Must be 0 for version 0.1
>> +  Attributes  |      4      |      4      | Must be 0
>> +  Stolen time |      8      |      8      | Stolen time in unsigned
>> +              |             |             | nanoseconds indicating how
>> +              |             |             | much time this VCPU thread
>> +              |             |             | was involuntarily not
>> +              |             |             | running on a physical CPU.
>> +
>> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
>> +will be present within a reserved region of the normal memory given to the
>> +guest. The guest should not attempt to write into this memory. There is a
>> +structure per VCPU of the guest.
>> +
>> +User space interface
>> +====================
>> +
>> +User space can request that KVM provide the paravirtualized time interface to
>> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
>> +
>> +    struct kvm_create_device pvtime_device = {
>> +            .type = KVM_DEV_TYPE_ARM_PV_TIME,
>> +            .attr = 0,
>> +            .flags = 0,
>> +    };
>> +
>> +    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
> 
> The ioctl doesn't return the fd. If the ioctl returns zero the fd will be
> in pvtime_device.fd.

Good catch - I'm not sure quite why I wrote that. Anyway I've agreed to
change the interface to operate on the VCPU device instead so this text
is rewritten completely.

>> +
>> +Creation of the device should be done after creating the vCPUs of the virtual
>> +machine.
> 
> Or else what? Will an error be reported in that case?

This is now enforced by calling the ioctl on the VCPU device, so it's
impossible to do in the wrong order.

>> +
>> +The IPA of the structures must be given to KVM. This is the base address
>> +of an array of stolen time structures (one for each VCPU). The base address
>> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
>> +multiple of PAGE_SIZE.
>> +
>> +The memory for these structures should be added to the guest in the usual
>> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
> 
> Above it says the guest shouldn't attempt to write the memory. Should
> KVM_MEM_READONLY be used with KVM_SET_USER_MEMORY_REGION for it?

That is optional - the specification states the guest must not attempt
to write to it - so marking it read-only for the guest should work fine
with a conforming guest. But it's not required.

>> +
>> +For example:
>> +
>> +    struct kvm_dev_arm_st_region region = {
>> +            .gpa = <IPA of guest base address>,
>> +            .size = <size in bytes>
>> +    };
>> +
>> +    struct kvm_device_attr st_base = {
>> +            .group = KVM_DEV_ARM_PV_TIME_PADDR,
> 
> This is KVM_DEV_ARM_PV_TIME_REGION in the code.

Gah - I obviously missed that when I refactored to define the region
rather than just the base address. Anyway this has all changed (again)
because each VCPU has its own base address so the size is no longer
necessary.

Thanks for the review,

Steve

>> +            .attr = KVM_DEV_ARM_PV_TIME_ST,
>> +            .addr = (u64)&region
>> +    };
>> +
>> +    ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, &st_base);
>> -- 
>> 2.20.1
>>
> 
> Thanks,
> drew 
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 01/10] KVM: arm64: Document PV-time interface
  2019-08-28 12:09     ` Steven Price
@ 2019-08-30  9:22       ` Christoffer Dall
  0 siblings, 0 replies; 36+ messages in thread
From: Christoffer Dall @ 2019-08-30  9:22 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kvm, Catalin Marinas, linux-doc, Russell King,
	Paolo Bonzini

On Wed, Aug 28, 2019 at 01:09:15PM +0100, Steven Price wrote:
> On 27/08/2019 09:57, Christoffer Dall wrote:
> > On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
> >> Introduce a paravirtualization interface for KVM/arm64 based on the
> >> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
> >>
> >> This only adds the details about "Stolen Time" as the details of "Live
> >> Physical Time" have not been fully agreed.
> >>
> >> User space can specify a reserved area of memory for the guest and
> >> inform KVM to populate the memory with information on time that the host
> >> kernel has stolen from the guest.
> >>
> >> A hypercall interface is provided for the guest to interrogate the
> >> hypervisor's support for this interface and the location of the shared
> >> memory structures.
> >>
> >> Signed-off-by: Steven Price <steven.price@arm.com>
> >> ---
> >>  Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
> >>  1 file changed, 100 insertions(+)
> >>  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
> >>
> >> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
> >> new file mode 100644
> >> index 000000000000..1ceb118694e7
> >> --- /dev/null
> >> +++ b/Documentation/virt/kvm/arm/pvtime.txt
> >> @@ -0,0 +1,100 @@
> >> +Paravirtualized time support for arm64
> >> +======================================
> >> +
> >> +Arm specification DEN0057/A defined a standard for paravirtualised time
> >> +support for AArch64 guests:
> >> +
> >> +https://developer.arm.com/docs/den0057/a
> >> +
> >> +KVM/arm64 implements the stolen time part of this specification by providing
> >> +some hypervisor service calls to support a paravirtualized guest obtaining a
> >> +view of the amount of time stolen from its execution.
> >> +
> >> +Two new SMCCC compatible hypercalls are defined:
> >> +
> >> +PV_FEATURES 0xC5000020
> >> +PV_TIME_ST  0xC5000022
> >> +
> >> +These are only available in the SMC64/HVC64 calling convention as
> >> +paravirtualized time is not available to 32 bit Arm guests. The existence of
> >> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
> >> +mechanism before calling it.
> >> +
> >> +PV_FEATURES
> >> +    Function ID:  (uint32)  : 0xC5000020
> >> +    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
> >> +    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
> >> +                              PV-time feature is supported by the hypervisor.
> >> +
> >> +PV_TIME_ST
> >> +    Function ID:  (uint32)  : 0xC5000022
> >> +    Return value: (int64)   : IPA of the stolen time data structure for this
> >> +                              (V)CPU. On failure:
> >> +                              NOT_SUPPORTED (-1)
> >> +
> >> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
> >> +with inner and outer write back caching attributes, in the inner shareable
> >> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
> >> +meaningfully filled by the hypervisor (see structure below).
> >> +
> >> +PV_TIME_ST returns the structure for the calling VCPU.
> >> +
> >> +Stolen Time
> >> +-----------
> >> +
> >> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
> >> +
> >> +  Field       | Byte Length | Byte Offset | Description
> >> +  ----------- | ----------- | ----------- | --------------------------
> >> +  Revision    |      4      |      0      | Must be 0 for version 0.1
> >> +  Attributes  |      4      |      4      | Must be 0
> >> +  Stolen time |      8      |      8      | Stolen time in unsigned
> >> +              |             |             | nanoseconds indicating how
> >> +              |             |             | much time this VCPU thread
> >> +              |             |             | was involuntarily not
> >> +              |             |             | running on a physical CPU.
> >> +
> >> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
> >> +will be present within a reserved region of the normal memory given to the
> >> +guest. The guest should not attempt to write into this memory. There is a
> >> +structure per VCPU of the guest.
> >> +
> >> +User space interface
> >> +====================
> >> +
> >> +User space can request that KVM provide the paravirtualized time interface to
> >> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
> >> +
> >> +    struct kvm_create_device pvtime_device = {
> >> +            .type = KVM_DEV_TYPE_ARM_PV_TIME,
> >> +            .attr = 0,
> >> +            .flags = 0,
> >> +    };
> >> +
> >> +    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
> >> +
> >> +Creation of the device should be done after creating the vCPUs of the virtual
> >> +machine.
> >> +
> >> +The IPA of the structures must be given to KVM. This is the base address
> >> +of an array of stolen time structures (one for each VCPU). The base address
> >> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
> >> +multiple of PAGE_SIZE.
> >> +
> >> +The memory for these structures should be added to the guest in the usual
> >> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
> >> +
> >> +For example:
> >> +
> >> +    struct kvm_dev_arm_st_region region = {
> >> +            .gpa = <IPA of guest base address>,
> >> +            .size = <size in bytes>
> >> +    };
> > 
> > This feel fragile; how are you handling userspace creating VCPUs after
> > setting this up,
> 
> In this case as long as the structures all fit within the region created
> VCPUs can be created/destroyed at will. If the VCPU index is too high
> then the kernel will bail out in kvm_update_stolen_time() so the
> structure will not be written. I consider this case as user space
> messing up, so beyond protecting the host from the mess, user space gets
> to keep the pieces.
> 
> > the GPA overlapping guest memory, etc.
> 
> Again, the (host) kernel is protected against this, but clearly this
> will end badly for the guest.
> 
> > Is the
> > philosophy here that the VMM can mess up the VM if it wants, but that
> > this should never lead attacks on the host (we better hope not) and so
> > we don't care?
> 
> Yes. For things like GPA overlapping guest memory it's not really the
> host's position to work out what is "guest memory". It's quite possible
> that user space could decide to place the stolen time structures right
> in the middle of guest memory - it's just up to user space to inform the
> guest what memory is usable. Obviously the expectation is that the
> shared structures would be positioned "out of the way" in GPA space in
> any normal arrangement.
> 
> > It seems to me setting the IPA per vcpu throught the VCPU device would
> > avoid a lot of these issues.  See
> > Documentation/virt/kvm/devices/vcpu.txt.
> 
> That is certainly a possibility, I'm not really sure what the benefit is
> though? It would still lead to corner cases:
> 
>  * What if only some VCPUs had stolen time setup on them?
>  * What if multiple VCPUs pointed to the same location?
>  * The structures can still overlap with guest memory
> 
> It's also more work to setup in user space with the only "benefit" being
> that user space could choose to organise the structures however it sees
> fit (e.g. no need for them to be contiguous in memory). But I'm not sure
> I see a use case for that flexibility.
> 
> Perhaps there's some benefit I'm not seeing?
> 

So this is now mostly moot as you said you were going to change this,
but overall I think the assumption that it's trivial to maintain an
interface such the one proposed here to never be an attack vector for
the host is not as easy as it may sound.  In the past, we've had
numerous bugs related to things like calculating an offset into some
region based on some index and a size, because suddenly someone changed
how an index is calculated, and it had unintended side-effects etc.

So if an API is used to specify something which is effectively per-CPU,
it's better to use the per-CPU handles (VCPU fds) to directly describe
this relationship.


Thanks,

    Christoffer

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2019-08-30  9:22 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-21 15:36 [PATCH v3 00/10] arm64: Stolen time support Steven Price
2019-08-21 15:36 ` [PATCH v3 01/10] KVM: arm64: Document PV-time interface Steven Price
2019-08-27  8:44   ` Christoffer Dall
2019-08-28 11:23     ` Steven Price
2019-08-27  8:57   ` Christoffer Dall
2019-08-28 12:09     ` Steven Price
2019-08-30  9:22       ` Christoffer Dall
2019-08-28 13:49     ` Christoffer Dall
2019-08-29 15:21       ` Steven Price
2019-08-29 17:15   ` Andrew Jones
2019-08-30  8:35     ` Steven Price
2019-08-21 15:36 ` [PATCH v3 02/10] KVM: arm/arm64: Factor out hypercall handling from PSCI code Steven Price
2019-08-21 15:36 ` [PATCH v3 03/10] KVM: arm64: Implement PV_FEATURES call Steven Price
2019-08-21 15:36 ` [PATCH v3 04/10] KVM: Implement kvm_put_guest() Steven Price
2019-08-22 10:29   ` Jonathan Cameron
2019-08-22 10:37     ` Steven Price
2019-08-22 15:28   ` Sean Christopherson
2019-08-22 15:46     ` Steven Price
2019-08-22 16:24       ` Sean Christopherson
2019-08-23 10:33         ` Steven Price
2019-08-21 15:36 ` [PATCH v3 05/10] KVM: arm64: Support stolen time reporting via shared structure Steven Price
2019-08-22 10:39   ` Jonathan Cameron
2019-08-22 11:00     ` Steven Price
2019-08-23 12:07   ` Zenghui Yu
2019-08-23 13:23     ` Steven Price
2019-08-21 15:36 ` [PATCH v3 06/10] KVM: Allow kvm_device_ops to be const Steven Price
2019-08-21 15:36 ` [PATCH v3 07/10] KVM: arm64: Provide a PV_TIME device to user space Steven Price
2019-08-22 10:57   ` Jonathan Cameron
2019-08-22 11:11     ` Steven Price
2019-08-22 11:48       ` Jonathan Cameron
2019-08-21 15:36 ` [PATCH v3 08/10] arm/arm64: Provide a wrapper for SMCCC 1.1 calls Steven Price
2019-08-21 15:36 ` [PATCH v3 09/10] arm/arm64: Make use of the SMCCC 1.1 wrapper Steven Price
2019-08-21 15:36 ` [PATCH v3 10/10] arm64: Retrieve stolen time as paravirtualized guest Steven Price
2019-08-23 11:45   ` Zenghui Yu
2019-08-23 14:22     ` Steven Price
2019-08-27 12:43       ` Zenghui Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).