linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv2 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
@ 2021-07-09  4:37 Sergey Senozhatsky
  2021-07-09  4:37 ` [PATCHv2 1/4] arm64: smccc: Add SMCCC pv-vcpu-state function call IDs Sergey Senozhatsky
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-07-09  4:37 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Suleiman Souhlal, Joel Fernandes, linux-arm-kernel, kvmarm,
	linux-kernel, virtualization, Sergey Senozhatsky

Hello,

This patch set adds a simple vcpu_is_preempted() implementation so
that scheduler can make better decisions when it search for the idle
(v)CPU.

For sched benchmarks please refer to [0].

[0] https://github.com/sergey-senozhatsky/arm64-vcpu_is_preempted

v2:
- 5.13 rebase (static calls, etc)
- simplified and reworked some bits

Sergey Senozhatsky (4):
  arm64: smccc: Add SMCCC pv-vcpu-state function call IDs
  arm64: add guest pvstate support
  arm64: do not use dummy vcpu_is_preempted()
  arm64: add host pv-vcpu-state support

 arch/arm64/include/asm/kvm_host.h | 23 ++++++++
 arch/arm64/include/asm/paravirt.h | 19 +++++++
 arch/arm64/include/asm/spinlock.h | 18 +++---
 arch/arm64/kernel/paravirt.c      | 94 +++++++++++++++++++++++++++++++
 arch/arm64/kernel/smp.c           |  4 ++
 arch/arm64/kvm/Makefile           |  3 +-
 arch/arm64/kvm/arm.c              |  3 +
 arch/arm64/kvm/hypercalls.c       | 11 ++++
 arch/arm64/kvm/pv-vcpu-state.c    | 64 +++++++++++++++++++++
 include/linux/arm-smccc.h         | 18 ++++++
 10 files changed, 248 insertions(+), 9 deletions(-)
 create mode 100644 arch/arm64/kvm/pv-vcpu-state.c

-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCHv2 1/4] arm64: smccc: Add SMCCC pv-vcpu-state function call IDs
  2021-07-09  4:37 [PATCHv2 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
@ 2021-07-09  4:37 ` Sergey Senozhatsky
  2021-07-12 14:22   ` Marc Zyngier
  2021-07-09  4:37 ` [PATCHv2 2/4] arm64: add guest pvstate support Sergey Senozhatsky
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-07-09  4:37 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Suleiman Souhlal, Joel Fernandes, linux-arm-kernel, kvmarm,
	linux-kernel, virtualization, Sergey Senozhatsky

Add the definitions of the SMCCC functions that will be
used to paravirt VCPU state configuration.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 include/linux/arm-smccc.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 7d1cabe15262..dbf0d658e54a 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -177,6 +177,24 @@
 			   ARM_SMCCC_OWNER_STANDARD,		\
 			   0x53)
 
+#define ARM_SMCCC_HV_PV_VCPU_STATE_FEATURES			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
+			   ARM_SMCCC_SMC_64,			\
+			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
+			   0x60)
+
+#define ARM_SMCCC_HV_PV_VCPU_STATE_INIT			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
+			   ARM_SMCCC_SMC_64,			\
+			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
+			   0x61)
+
+#define ARM_SMCCC_HV_PV_VCPU_STATE_RELEASE			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
+			   ARM_SMCCC_SMC_64,			\
+			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
+			   0x62)
+
 /*
  * Return codes defined in ARM DEN 0070A
  * ARM DEN 0070A is now merged/consolidated into ARM DEN 0028 C
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCHv2 2/4] arm64: add guest pvstate support
  2021-07-09  4:37 [PATCHv2 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
  2021-07-09  4:37 ` [PATCHv2 1/4] arm64: smccc: Add SMCCC pv-vcpu-state function call IDs Sergey Senozhatsky
@ 2021-07-09  4:37 ` Sergey Senozhatsky
  2021-07-09  7:39   ` David Edmondson
                     ` (2 more replies)
  2021-07-09  4:37 ` [PATCHv2 3/4] arm64: do not use dummy vcpu_is_preempted() Sergey Senozhatsky
  2021-07-09  4:37 ` [PATCHv2 4/4] arm64: add host pv-vcpu-state support Sergey Senozhatsky
  3 siblings, 3 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-07-09  4:37 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Suleiman Souhlal, Joel Fernandes, linux-arm-kernel, kvmarm,
	linux-kernel, virtualization, Sergey Senozhatsky

PV-vcpu-state is a per-CPU struct, which, for the time being,
holds boolean `preempted' vCPU state. During the startup,
given that host supports PV-state, each guest vCPU sends
a pointer to its per-CPU variable to the host as a payload
with the SMCCC HV call, so that host can update vCPU state
when it puts or loads vCPU.

This has impact on the guest's scheduler:

[..]
  wake_up_process()
   try_to_wake_up()
    select_task_rq_fair()
     available_idle_cpu()
      vcpu_is_preempted()

Some sched benchmarks data is available on the github page [0].

[0] https://github.com/sergey-senozhatsky/arm64-vcpu_is_preempted

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 arch/arm64/include/asm/paravirt.h | 19 +++++++
 arch/arm64/kernel/paravirt.c      | 94 +++++++++++++++++++++++++++++++
 arch/arm64/kernel/smp.c           |  4 ++
 3 files changed, 117 insertions(+)

diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
index 9aa193e0e8f2..a3f7665dff38 100644
--- a/arch/arm64/include/asm/paravirt.h
+++ b/arch/arm64/include/asm/paravirt.h
@@ -2,6 +2,11 @@
 #ifndef _ASM_ARM64_PARAVIRT_H
 #define _ASM_ARM64_PARAVIRT_H
 
+struct vcpu_state {
+	bool	preempted;
+	u8	reserved[63];
+};
+
 #ifdef CONFIG_PARAVIRT
 #include <linux/static_call_types.h>
 
@@ -20,8 +25,22 @@ static inline u64 paravirt_steal_clock(int cpu)
 
 int __init pv_time_init(void);
 
+bool dummy_vcpu_is_preempted(unsigned int cpu);
+
+extern struct static_key pv_vcpu_is_preempted_enabled;
+DECLARE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
+
+static inline bool paravirt_vcpu_is_preempted(unsigned int cpu)
+{
+	return static_call(pv_vcpu_is_preempted)(cpu);
+}
+
+int __init pv_vcpu_state_init(void);
+
 #else
 
+#define pv_vcpu_state_init() do {} while (0)
+
 #define pv_time_init() do {} while (0)
 
 #endif // CONFIG_PARAVIRT
diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
index 75fed4460407..d8fc46795d94 100644
--- a/arch/arm64/kernel/paravirt.c
+++ b/arch/arm64/kernel/paravirt.c
@@ -40,6 +40,11 @@ struct pv_time_stolen_time_region {
 
 static DEFINE_PER_CPU(struct pv_time_stolen_time_region, stolen_time_region);
 
+static DEFINE_PER_CPU(struct vcpu_state, vcpus_states);
+struct static_key pv_vcpu_is_preempted_enabled;
+
+DEFINE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
+
 static bool steal_acc = true;
 static int __init parse_no_stealacc(char *arg)
 {
@@ -165,3 +170,92 @@ int __init pv_time_init(void)
 
 	return 0;
 }
+
+bool dummy_vcpu_is_preempted(unsigned int cpu)
+{
+	return false;
+}
+
+static bool __vcpu_is_preempted(unsigned int cpu)
+{
+	struct vcpu_state *st;
+
+	st = &per_cpu(vcpus_states, cpu);
+	return READ_ONCE(st->preempted);
+}
+
+static bool has_pv_vcpu_state(void)
+{
+	struct arm_smccc_res res;
+
+	/* To detect the presence of PV time support we require SMCCC 1.1+ */
+	if (arm_smccc_1_1_get_conduit() == SMCCC_CONDUIT_NONE)
+		return false;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+			     ARM_SMCCC_HV_PV_VCPU_STATE_FEATURES,
+			     &res);
+
+	if (res.a0 != SMCCC_RET_SUCCESS)
+		return false;
+	return true;
+}
+
+static int __pv_vcpu_state_hook(unsigned int cpu, int event)
+{
+	struct arm_smccc_res res;
+	struct vcpu_state *st;
+
+	st = &per_cpu(vcpus_states, cpu);
+	arm_smccc_1_1_invoke(event, virt_to_phys(st), &res);
+	if (res.a0 != SMCCC_RET_SUCCESS)
+		return -EINVAL;
+	return 0;
+}
+
+static int vcpu_state_init(unsigned int cpu)
+{
+	int ret = __pv_vcpu_state_hook(cpu, ARM_SMCCC_HV_PV_VCPU_STATE_INIT);
+
+	if (ret)
+		pr_warn("Unable to ARM_SMCCC_HV_PV_STATE_INIT\n");
+	return ret;
+}
+
+static int vcpu_state_release(unsigned int cpu)
+{
+	int ret = __pv_vcpu_state_hook(cpu, ARM_SMCCC_HV_PV_VCPU_STATE_RELEASE);
+
+	if (ret)
+		pr_warn("Unable to ARM_SMCCC_HV_PV_STATE_RELEASE\n");
+	return ret;
+}
+
+static int pv_vcpu_state_register_hooks(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+				"hypervisor/arm/pvstate:starting",
+				vcpu_state_init,
+				vcpu_state_release);
+	if (ret < 0)
+		pr_warn("Failed to register CPU hooks\n");
+	return 0;
+}
+
+int __init pv_vcpu_state_init(void)
+{
+	int ret;
+
+	if (!has_pv_vcpu_state())
+		return 0;
+
+	ret = pv_vcpu_state_register_hooks();
+	if (ret)
+		return ret;
+
+	static_call_update(pv_vcpu_is_preempted, __vcpu_is_preempted);
+	static_key_slow_inc(&pv_vcpu_is_preempted_enabled);
+	return 0;
+}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 6f6ff072acbd..20d42e0f2a99 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -50,6 +50,7 @@
 #include <asm/tlbflush.h>
 #include <asm/ptrace.h>
 #include <asm/virt.h>
+#include <asm/paravirt.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/ipi.h>
@@ -756,6 +757,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	numa_store_cpu_info(this_cpu);
 	numa_add_cpu(this_cpu);
 
+	/* Init paravirt CPU state */
+	pv_vcpu_state_init();
+
 	/*
 	 * If UP is mandated by "nosmp" (which implies "maxcpus=0"), don't set
 	 * secondary CPUs present.
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCHv2 3/4] arm64: do not use dummy vcpu_is_preempted()
  2021-07-09  4:37 [PATCHv2 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
  2021-07-09  4:37 ` [PATCHv2 1/4] arm64: smccc: Add SMCCC pv-vcpu-state function call IDs Sergey Senozhatsky
  2021-07-09  4:37 ` [PATCHv2 2/4] arm64: add guest pvstate support Sergey Senozhatsky
@ 2021-07-09  4:37 ` Sergey Senozhatsky
  2021-07-12 15:47   ` Marc Zyngier
  2021-07-09  4:37 ` [PATCHv2 4/4] arm64: add host pv-vcpu-state support Sergey Senozhatsky
  3 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-07-09  4:37 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Suleiman Souhlal, Joel Fernandes, linux-arm-kernel, kvmarm,
	linux-kernel, virtualization, Sergey Senozhatsky

vcpu_is_preempted() now can represent the actual state of
the VCPU, so the scheduler can make better decisions when
it picks the idle CPU to enqueue a task on.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 arch/arm64/include/asm/spinlock.h | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index 0525c0b089ed..1d579497e1b8 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -7,21 +7,23 @@
 
 #include <asm/qspinlock.h>
 #include <asm/qrwlock.h>
+#include <asm/paravirt.h>
 
 /* See include/linux/spinlock.h */
 #define smp_mb__after_spinlock()	smp_mb()
 
-/*
- * Changing this will break osq_lock() thanks to the call inside
- * smp_cond_load_relaxed().
- *
- * See:
- * https://lore.kernel.org/lkml/20200110100612.GC2827@hirez.programming.kicks-ass.net
- */
 #define vcpu_is_preempted vcpu_is_preempted
-static inline bool vcpu_is_preempted(int cpu)
+
+#ifdef CONFIG_PARAVIRT
+static inline bool vcpu_is_preempted(unsigned int cpu)
+{
+	return paravirt_vcpu_is_preempted(cpu);
+}
+#else
+static inline bool vcpu_is_preempted(unsigned int cpu)
 {
 	return false;
 }
+#endif /* CONFIG_PARAVIRT */
 
 #endif /* __ASM_SPINLOCK_H */
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCHv2 4/4] arm64: add host pv-vcpu-state support
  2021-07-09  4:37 [PATCHv2 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2021-07-09  4:37 ` [PATCHv2 3/4] arm64: do not use dummy vcpu_is_preempted() Sergey Senozhatsky
@ 2021-07-09  4:37 ` Sergey Senozhatsky
  2021-07-12 16:24   ` Marc Zyngier
  3 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-07-09  4:37 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Suleiman Souhlal, Joel Fernandes, linux-arm-kernel, kvmarm,
	linux-kernel, virtualization, Sergey Senozhatsky

Add PV-vcpu-state support bits to the host. Host uses the guest
PV-state per-CPU pointers to update the VCPU state each time
it kvm_arch_vcpu_load() or kvm_arch_vcpu_put() the VCPU, so
that guest scheduler can become aware of the fact that not
all VCPUs are always available. Currently guest scheduler
on amr64 always assumes that all CPUs are available because
vcpu_is_preempted() is not implemented on arm64.

- schbench -t 3 -m 3 -p 4096

  Latency percentiles (usec)
        BASE
================================================
        50.0th: 1 (3556427 samples)
        75.0th: 13 (879210 samples)
        90.0th: 15 (893311 samples)
        95.0th: 18 (159594 samples)
        *99.0th: 118 (224187 samples)
        99.5th: 691 (28555 samples)
        99.9th: 7384 (23076 samples)
        min=1, max=104218
avg worker transfer: 25192.00 ops/sec 98.41MB/s

        PATCHED
================================================
        50.0th: 1 (3507010 samples)
        75.0th: 13 (1635775 samples)
        90.0th: 16 (901271 samples)
        95.0th: 24 (281051 samples)
        *99.0th: 114 (255581 samples)
        99.5th: 382 (33051 samples)
        99.9th: 6392 (26592 samples)
        min=1, max=83877
avg worker transfer: 28613.39 ops/sec 111.77MB/s

- perf bench sched all

  ops/sec
        BASE                 PATCHED
================================================
        33452		     36485
        33541		     39405
        33365		     36858
        33455		     38047
        33449		     37866
        33616		     34922
        33479		     34388
        33594		     37203
        33458		     35363
        33704		     35180

Student's T-test

         N           Min           Max        Median           Avg        Stddev
base     10         33365         33704         33479       33511.3     100.92467
patched  10         34388         39405         36858       36571.7      1607.454
Difference at 95.0% confidence
	3060.4 +/- 1070.09
	9.13244% +/- 3.19321%
	(Student's t, pooled s = 1138.88)

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 arch/arm64/include/asm/kvm_host.h | 23 +++++++++++
 arch/arm64/kvm/Makefile           |  3 +-
 arch/arm64/kvm/arm.c              |  3 ++
 arch/arm64/kvm/hypercalls.c       | 11 ++++++
 arch/arm64/kvm/pv-vcpu-state.c    | 64 +++++++++++++++++++++++++++++++
 5 files changed, 103 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kvm/pv-vcpu-state.c

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 41911585ae0c..e782f4d0c916 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -381,6 +381,12 @@ struct kvm_vcpu_arch {
 		u64 last_steal;
 		gpa_t base;
 	} steal;
+
+	/* PV state of the VCPU */
+	struct {
+		gpa_t base;
+		struct gfn_to_hva_cache ghc;
+	} vcpu_state;
 };
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
@@ -695,6 +701,23 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
 	return (vcpu_arch->steal.base != GPA_INVALID);
 }
 
+int kvm_init_vcpu_state(struct kvm_vcpu *vcpu, gfn_t addr);
+int kvm_release_vcpu_state(struct kvm_vcpu *vcpu);
+
+static inline void kvm_arm_vcpu_state_init(struct kvm_vcpu_arch *vcpu_arch)
+{
+	vcpu_arch->vcpu_state.base = GPA_INVALID;
+	memset(&vcpu_arch->vcpu_state.ghc, 0, sizeof(struct gfn_to_hva_cache));
+}
+
+static inline bool
+kvm_arm_is_vcpu_state_enabled(struct kvm_vcpu_arch *vcpu_arch)
+{
+	return (vcpu_arch->vcpu_state.base != GPA_INVALID);
+}
+
+void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted);
+
 void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 989bb5dad2c8..2a3ee82c6d90 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -12,7 +12,8 @@ obj-$(CONFIG_KVM) += hyp/
 
 kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
 	 $(KVM)/vfio.o $(KVM)/irqchip.o $(KVM)/binary_stats.o \
-	 arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
+	 arm.o mmu.o mmio.o psci.o perf.o hypercalls.o \
+	 pvtime.o pv-vcpu-state.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o \
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e9a2b8f27792..43e995c9fddb 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -332,6 +332,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	kvm_arm_reset_debug_ptr(vcpu);
 
 	kvm_arm_pvtime_vcpu_init(&vcpu->arch);
+	kvm_arm_vcpu_state_init(&vcpu->arch);
 
 	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
 
@@ -429,10 +430,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (vcpu_has_ptrauth(vcpu))
 		vcpu_ptrauth_disable(vcpu);
 	kvm_arch_vcpu_load_debug_state_flags(vcpu);
+	kvm_update_vcpu_preempted(vcpu, false);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	kvm_update_vcpu_preempted(vcpu, true);
 	kvm_arch_vcpu_put_debug_state_flags(vcpu);
 	kvm_arch_vcpu_put_fp(vcpu);
 	if (has_vhe())
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 30da78f72b3b..95bcf86e0b6f 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -110,6 +110,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 		case ARM_SMCCC_HV_PV_TIME_FEATURES:
 			val[0] = SMCCC_RET_SUCCESS;
 			break;
+		case ARM_SMCCC_HV_PV_VCPU_STATE_FEATURES:
+			val[0] = SMCCC_RET_SUCCESS;
+			break;
 		}
 		break;
 	case ARM_SMCCC_HV_PV_TIME_FEATURES:
@@ -139,6 +142,14 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 	case ARM_SMCCC_TRNG_RND32:
 	case ARM_SMCCC_TRNG_RND64:
 		return kvm_trng_call(vcpu);
+	case ARM_SMCCC_HV_PV_VCPU_STATE_INIT:
+		if (kvm_init_vcpu_state(vcpu, smccc_get_arg1(vcpu)) == 0)
+			val[0] = SMCCC_RET_SUCCESS;
+		break;
+	case ARM_SMCCC_HV_PV_VCPU_STATE_RELEASE:
+		if (kvm_release_vcpu_state(vcpu) == 0)
+			val[0] = SMCCC_RET_SUCCESS;
+		break;
 	default:
 		return kvm_psci_call(vcpu);
 	}
diff --git a/arch/arm64/kvm/pv-vcpu-state.c b/arch/arm64/kvm/pv-vcpu-state.c
new file mode 100644
index 000000000000..8496bb2a5966
--- /dev/null
+++ b/arch/arm64/kvm/pv-vcpu-state.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/arm-smccc.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_mmu.h>
+#include <asm/paravirt.h>
+
+#include <kvm/arm_psci.h>
+
+int kvm_init_vcpu_state(struct kvm_vcpu *vcpu, gpa_t addr)
+{
+	struct kvm *kvm = vcpu->kvm;
+	int ret;
+	u64 idx;
+
+	if (kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
+		return 0;
+
+	idx = srcu_read_lock(&kvm->srcu);
+	ret = kvm_gfn_to_hva_cache_init(vcpu->kvm,
+					&vcpu->arch.vcpu_state.ghc,
+					addr,
+					sizeof(struct vcpu_state));
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	if (!ret)
+		vcpu->arch.vcpu_state.base = addr;
+	return ret;
+}
+
+int kvm_release_vcpu_state(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
+		return 0;
+
+	kvm_arm_vcpu_state_init(&vcpu->arch);
+	return 0;
+}
+
+void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted)
+{
+	struct kvm *kvm = vcpu->kvm;
+	u64 idx;
+
+	if (!kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
+		return;
+
+	/*
+	 * This function is called from atomic context, so we need to
+	 * disable page faults. kvm_write_guest_cached() will call
+	 * might_fault().
+	 */
+	pagefault_disable();
+	/*
+	 * Need to take the SRCU lock because kvm_write_guest_offset_cached()
+	 * calls kvm_memslots();
+	 */
+	idx = srcu_read_lock(&kvm->srcu);
+	kvm_write_guest_cached(kvm, &vcpu->arch.vcpu_state.ghc,
+			       &preempted, sizeof(bool));
+	srcu_read_unlock(&kvm->srcu, idx);
+	pagefault_enable();
+}
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/4] arm64: add guest pvstate support
  2021-07-09  4:37 ` [PATCHv2 2/4] arm64: add guest pvstate support Sergey Senozhatsky
@ 2021-07-09  7:39   ` David Edmondson
  2021-07-09  7:52     ` Sergey Senozhatsky
  2021-07-09 18:58   ` Joel Fernandes
  2021-07-12 15:42   ` Marc Zyngier
  2 siblings, 1 reply; 25+ messages in thread
From: David Edmondson @ 2021-07-09  7:39 UTC (permalink / raw)
  To: Sergey Senozhatsky, Marc Zyngier, Will Deacon
  Cc: Suleiman Souhlal, Joel Fernandes, linux-arm-kernel, kvmarm,
	linux-kernel, virtualization, Sergey Senozhatsky

On Friday, 2021-07-09 at 13:37:11 +09, Sergey Senozhatsky wrote:

> PV-vcpu-state is a per-CPU struct, which, for the time being,
> holds boolean `preempted' vCPU state. During the startup,
> given that host supports PV-state, each guest vCPU sends
> a pointer to its per-CPU variable to the host as a payload
> with the SMCCC HV call, so that host can update vCPU state
> when it puts or loads vCPU.
>
> This has impact on the guest's scheduler:
>
> [..]
>   wake_up_process()
>    try_to_wake_up()
>     select_task_rq_fair()
>      available_idle_cpu()
>       vcpu_is_preempted()
>
> Some sched benchmarks data is available on the github page [0].
>
> [0] https://github.com/sergey-senozhatsky/arm64-vcpu_is_preempted
>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  arch/arm64/include/asm/paravirt.h | 19 +++++++
>  arch/arm64/kernel/paravirt.c      | 94 +++++++++++++++++++++++++++++++
>  arch/arm64/kernel/smp.c           |  4 ++
>  3 files changed, 117 insertions(+)
>
> diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
> index 9aa193e0e8f2..a3f7665dff38 100644
> --- a/arch/arm64/include/asm/paravirt.h
> +++ b/arch/arm64/include/asm/paravirt.h
> @@ -2,6 +2,11 @@
>  #ifndef _ASM_ARM64_PARAVIRT_H
>  #define _ASM_ARM64_PARAVIRT_H
>  
> +struct vcpu_state {
> +	bool	preempted;
> +	u8	reserved[63];
> +};
> +
>  #ifdef CONFIG_PARAVIRT
>  #include <linux/static_call_types.h>
>  
> @@ -20,8 +25,22 @@ static inline u64 paravirt_steal_clock(int cpu)
>  
>  int __init pv_time_init(void);
>  
> +bool dummy_vcpu_is_preempted(unsigned int cpu);
> +
> +extern struct static_key pv_vcpu_is_preempted_enabled;
> +DECLARE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
> +
> +static inline bool paravirt_vcpu_is_preempted(unsigned int cpu)
> +{
> +	return static_call(pv_vcpu_is_preempted)(cpu);
> +}
> +
> +int __init pv_vcpu_state_init(void);
> +
>  #else
>  
> +#define pv_vcpu_state_init() do {} while (0)
> +
>  #define pv_time_init() do {} while (0)
>  
>  #endif // CONFIG_PARAVIRT
> diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
> index 75fed4460407..d8fc46795d94 100644
> --- a/arch/arm64/kernel/paravirt.c
> +++ b/arch/arm64/kernel/paravirt.c
> @@ -40,6 +40,11 @@ struct pv_time_stolen_time_region {
>  
>  static DEFINE_PER_CPU(struct pv_time_stolen_time_region, stolen_time_region);
>  
> +static DEFINE_PER_CPU(struct vcpu_state, vcpus_states);
> +struct static_key pv_vcpu_is_preempted_enabled;
> +
> +DEFINE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
> +
>  static bool steal_acc = true;
>  static int __init parse_no_stealacc(char *arg)
>  {
> @@ -165,3 +170,92 @@ int __init pv_time_init(void)
>  
>  	return 0;
>  }
> +
> +bool dummy_vcpu_is_preempted(unsigned int cpu)
> +{
> +	return false;
> +}
> +
> +static bool __vcpu_is_preempted(unsigned int cpu)
> +{
> +	struct vcpu_state *st;
> +
> +	st = &per_cpu(vcpus_states, cpu);
> +	return READ_ONCE(st->preempted);
> +}
> +
> +static bool has_pv_vcpu_state(void)
> +{
> +	struct arm_smccc_res res;
> +
> +	/* To detect the presence of PV time support we require SMCCC 1.1+ */

"PV VCPU state support" rather than "PV time support".

> +	if (arm_smccc_1_1_get_conduit() == SMCCC_CONDUIT_NONE)
> +		return false;
> +
> +	arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> +			     ARM_SMCCC_HV_PV_VCPU_STATE_FEATURES,
> +			     &res);
> +
> +	if (res.a0 != SMCCC_RET_SUCCESS)
> +		return false;
> +	return true;
> +}
> +
> +static int __pv_vcpu_state_hook(unsigned int cpu, int event)
> +{
> +	struct arm_smccc_res res;
> +	struct vcpu_state *st;
> +
> +	st = &per_cpu(vcpus_states, cpu);
> +	arm_smccc_1_1_invoke(event, virt_to_phys(st), &res);
> +	if (res.a0 != SMCCC_RET_SUCCESS)
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +static int vcpu_state_init(unsigned int cpu)
> +{
> +	int ret = __pv_vcpu_state_hook(cpu, ARM_SMCCC_HV_PV_VCPU_STATE_INIT);
> +
> +	if (ret)
> +		pr_warn("Unable to ARM_SMCCC_HV_PV_STATE_INIT\n");
> +	return ret;
> +}
> +
> +static int vcpu_state_release(unsigned int cpu)
> +{
> +	int ret = __pv_vcpu_state_hook(cpu, ARM_SMCCC_HV_PV_VCPU_STATE_RELEASE);
> +
> +	if (ret)
> +		pr_warn("Unable to ARM_SMCCC_HV_PV_STATE_RELEASE\n");
> +	return ret;
> +}
> +
> +static int pv_vcpu_state_register_hooks(void)
> +{
> +	int ret;
> +
> +	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> +				"hypervisor/arm/pvstate:starting",
> +				vcpu_state_init,
> +				vcpu_state_release);
> +	if (ret < 0)
> +		pr_warn("Failed to register CPU hooks\n");

Include that it's PV VCPU state hooks?

> +	return 0;
> +}
> +
> +int __init pv_vcpu_state_init(void)
> +{
> +	int ret;
> +
> +	if (!has_pv_vcpu_state())
> +		return 0;
> +
> +	ret = pv_vcpu_state_register_hooks();
> +	if (ret)
> +		return ret;
> +
> +	static_call_update(pv_vcpu_is_preempted, __vcpu_is_preempted);
> +	static_key_slow_inc(&pv_vcpu_is_preempted_enabled);
> +	return 0;
> +}
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 6f6ff072acbd..20d42e0f2a99 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -50,6 +50,7 @@
>  #include <asm/tlbflush.h>
>  #include <asm/ptrace.h>
>  #include <asm/virt.h>
> +#include <asm/paravirt.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/ipi.h>
> @@ -756,6 +757,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  	numa_store_cpu_info(this_cpu);
>  	numa_add_cpu(this_cpu);
>  
> +	/* Init paravirt CPU state */
> +	pv_vcpu_state_init();
> +
>  	/*
>  	 * If UP is mandated by "nosmp" (which implies "maxcpus=0"), don't set
>  	 * secondary CPUs present.
> -- 
> 2.32.0.93.g670b81a890-goog

dme.
-- 
If I could buy my reasoning, I'd pay to lose.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/4] arm64: add guest pvstate support
  2021-07-09  7:39   ` David Edmondson
@ 2021-07-09  7:52     ` Sergey Senozhatsky
  0 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-07-09  7:52 UTC (permalink / raw)
  To: David Edmondson
  Cc: Sergey Senozhatsky, Marc Zyngier, Will Deacon, Suleiman Souhlal,
	Joel Fernandes, linux-arm-kernel, kvmarm, linux-kernel,
	virtualization

On (21/07/09 08:39), David Edmondson wrote:
[..]
> > +
> > +static bool has_pv_vcpu_state(void)
> > +{
> > +	struct arm_smccc_res res;
> > +
> > +	/* To detect the presence of PV time support we require SMCCC 1.1+ */
> 
> "PV VCPU state support" rather than "PV time support".

Indeed. Thanks.

[..]
> > +static int pv_vcpu_state_register_hooks(void)
> > +{
> > +	int ret;
> > +
> > +	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> > +				"hypervisor/arm/pvstate:starting",
> > +				vcpu_state_init,
> > +				vcpu_state_release);
> > +	if (ret < 0)
> > +		pr_warn("Failed to register CPU hooks\n");
> 
> Include that it's PV VCPU state hooks?

Ack.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/4] arm64: add guest pvstate support
  2021-07-09  4:37 ` [PATCHv2 2/4] arm64: add guest pvstate support Sergey Senozhatsky
  2021-07-09  7:39   ` David Edmondson
@ 2021-07-09 18:58   ` Joel Fernandes
  2021-07-09 21:53     ` Sergey Senozhatsky
  2021-07-12 15:42   ` Marc Zyngier
  2 siblings, 1 reply; 25+ messages in thread
From: Joel Fernandes @ 2021-07-09 18:58 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Marc Zyngier, Will Deacon, Suleiman Souhlal, linux-arm-kernel,
	kvmarm, linux-kernel, virtualization

Hi, Just few nits, patch itself LGTM:

On Fri, Jul 9, 2021 at 12:37 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> PV-vcpu-state is a per-CPU struct, which, for the time being,
> holds boolean `preempted' vCPU state. During the startup,
> given that host supports PV-state, each guest vCPU sends
> a pointer to its per-CPU variable to the host as a payload
> with the SMCCC HV call, so that host can update vCPU state
> when it puts or loads vCPU.
>
> This has impact on the guest's scheduler:
>
> [..]
>   wake_up_process()
>    try_to_wake_up()
>     select_task_rq_fair()
>      available_idle_cpu()
>       vcpu_is_preempted()
>
> Some sched benchmarks data is available on the github page [0].
>
> [0] https://github.com/sergey-senozhatsky/arm64-vcpu_is_preempted
>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  arch/arm64/include/asm/paravirt.h | 19 +++++++
>  arch/arm64/kernel/paravirt.c      | 94 +++++++++++++++++++++++++++++++
>  arch/arm64/kernel/smp.c           |  4 ++
>  3 files changed, 117 insertions(+)
>
> diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
> index 9aa193e0e8f2..a3f7665dff38 100644
> --- a/arch/arm64/include/asm/paravirt.h
> +++ b/arch/arm64/include/asm/paravirt.h
> @@ -2,6 +2,11 @@
>  #ifndef _ASM_ARM64_PARAVIRT_H
>  #define _ASM_ARM64_PARAVIRT_H
>
> +struct vcpu_state {
> +       bool    preempted;
> +       u8      reserved[63];
> +};
> +
>  #ifdef CONFIG_PARAVIRT
>  #include <linux/static_call_types.h>
>
> @@ -20,8 +25,22 @@ static inline u64 paravirt_steal_clock(int cpu)
>
>  int __init pv_time_init(void);
>
> +bool dummy_vcpu_is_preempted(unsigned int cpu);
> +
> +extern struct static_key pv_vcpu_is_preempted_enabled;.

pv_vcpu_is_preempted_enabled static_key is not used in any patch.
Maybe it is stale?

> +DECLARE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
> +
> +static inline bool paravirt_vcpu_is_preempted(unsigned int cpu)
> +{
> +       return static_call(pv_vcpu_is_preempted)(cpu);
> +}
> +
> +int __init pv_vcpu_state_init(void);
> +
>  #else
>
> +#define pv_vcpu_state_init() do {} while (0)
> +
>  #define pv_time_init() do {} while (0)
>
>  #endif // CONFIG_PARAVIRT
> diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
> index 75fed4460407..d8fc46795d94 100644
> --- a/arch/arm64/kernel/paravirt.c
> +++ b/arch/arm64/kernel/paravirt.c
> @@ -40,6 +40,11 @@ struct pv_time_stolen_time_region {
>
>  static DEFINE_PER_CPU(struct pv_time_stolen_time_region, stolen_time_region);
>
> +static DEFINE_PER_CPU(struct vcpu_state, vcpus_states);
> +struct static_key pv_vcpu_is_preempted_enabled;
> +
> +DEFINE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);

Could we use DEFINE_STATIC_CALL_NULL and get rid of the dummy
function? I believe that makes the function trampoline as return
instruction, till it is updated.

> +
>  static bool steal_acc = true;
>  static int __init parse_no_stealacc(char *arg)
>  {
> @@ -165,3 +170,92 @@ int __init pv_time_init(void)
>
>         return 0;
>  }
> +
> +bool dummy_vcpu_is_preempted(unsigned int cpu)
> +{
> +       return false;
> +}
> +
> +static bool __vcpu_is_preempted(unsigned int cpu)
> +{
> +       struct vcpu_state *st;
> +
> +       st = &per_cpu(vcpus_states, cpu);
> +       return READ_ONCE(st->preempted);

I guess you could just do:
{
  return READ_ONCE(per_cpu(vcpus_states, cpu).preempted);
}

> +}
> +
> +static bool has_pv_vcpu_state(void)
> +{
> +       struct arm_smccc_res res;
> +
> +       /* To detect the presence of PV time support we require SMCCC 1.1+ */
> +       if (arm_smccc_1_1_get_conduit() == SMCCC_CONDUIT_NONE)
> +               return false;
> +
> +       arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> +                            ARM_SMCCC_HV_PV_VCPU_STATE_FEATURES,
> +                            &res);
> +
> +       if (res.a0 != SMCCC_RET_SUCCESS)
> +               return false;
> +       return true;
> +}
> +
> +static int __pv_vcpu_state_hook(unsigned int cpu, int event)
> +{
> +       struct arm_smccc_res res;
> +       struct vcpu_state *st;
> +
> +       st = &per_cpu(vcpus_states, cpu);
> +       arm_smccc_1_1_invoke(event, virt_to_phys(st), &res);
> +       if (res.a0 != SMCCC_RET_SUCCESS)
> +               return -EINVAL;
> +       return 0;
> +}
> +
> +static int vcpu_state_init(unsigned int cpu)
> +{
> +       int ret = __pv_vcpu_state_hook(cpu, ARM_SMCCC_HV_PV_VCPU_STATE_INIT);
> +
> +       if (ret)
> +               pr_warn("Unable to ARM_SMCCC_HV_PV_STATE_INIT\n");
> +       return ret;
> +}
> +
> +static int vcpu_state_release(unsigned int cpu)
> +{
> +       int ret = __pv_vcpu_state_hook(cpu, ARM_SMCCC_HV_PV_VCPU_STATE_RELEASE);
> +
> +       if (ret)
> +               pr_warn("Unable to ARM_SMCCC_HV_PV_STATE_RELEASE\n");
> +       return ret;
> +}
> +
> +static int pv_vcpu_state_register_hooks(void)
> +{
> +       int ret;
> +
> +       ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> +                               "hypervisor/arm/pvstate:starting",
> +                               vcpu_state_init,
> +                               vcpu_state_release);
> +       if (ret < 0)
> +               pr_warn("Failed to register CPU hooks\n");
> +       return 0;
> +}
> +
> +int __init pv_vcpu_state_init(void)
> +{
> +       int ret;
> +
> +       if (!has_pv_vcpu_state())
> +               return 0;
> +
> +       ret = pv_vcpu_state_register_hooks();
> +       if (ret)
> +               return ret;
> +
> +       static_call_update(pv_vcpu_is_preempted, __vcpu_is_preempted);
> +       static_key_slow_inc(&pv_vcpu_is_preempted_enabled);

I think this static key inc is also stale.

thanks,

-Joel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/4] arm64: add guest pvstate support
  2021-07-09 18:58   ` Joel Fernandes
@ 2021-07-09 21:53     ` Sergey Senozhatsky
  2021-07-11 16:58       ` Joel Fernandes
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-07-09 21:53 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Sergey Senozhatsky, Marc Zyngier, Will Deacon, Suleiman Souhlal,
	linux-arm-kernel, kvmarm, linux-kernel, virtualization

Hi Joel,

On (21/07/09 14:58), Joel Fernandes wrote:
[..]
> > +struct vcpu_state {
> > +       bool    preempted;
> > +       u8      reserved[63];
> > +};
> > +
> >  #ifdef CONFIG_PARAVIRT
> >  #include <linux/static_call_types.h>
> >
> > @@ -20,8 +25,22 @@ static inline u64 paravirt_steal_clock(int cpu)
> >
> >  int __init pv_time_init(void);
> >
> > +bool dummy_vcpu_is_preempted(unsigned int cpu);
> > +
> > +extern struct static_key pv_vcpu_is_preempted_enabled;.
> 
> pv_vcpu_is_preempted_enabled static_key is not used in any patch.
> Maybe it is stale?

Oh, it is, thanks.

> > +DECLARE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
> > +
> > +static inline bool paravirt_vcpu_is_preempted(unsigned int cpu)
> > +{
> > +       return static_call(pv_vcpu_is_preempted)(cpu);
> > +}
> > +
> > +int __init pv_vcpu_state_init(void);
> > +
> >  #else
> >
> > +#define pv_vcpu_state_init() do {} while (0)
> > +
> >  #define pv_time_init() do {} while (0)
> >
> >  #endif // CONFIG_PARAVIRT
> > diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
> > index 75fed4460407..d8fc46795d94 100644
> > --- a/arch/arm64/kernel/paravirt.c
> > +++ b/arch/arm64/kernel/paravirt.c
> > @@ -40,6 +40,11 @@ struct pv_time_stolen_time_region {
> >
> >  static DEFINE_PER_CPU(struct pv_time_stolen_time_region, stolen_time_region);
> >
> > +static DEFINE_PER_CPU(struct vcpu_state, vcpus_states);
> > +struct static_key pv_vcpu_is_preempted_enabled;
> > +
> > +DEFINE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
> 
> Could we use DEFINE_STATIC_CALL_NULL and get rid of the dummy
> function? I believe that makes the function trampoline as return
> instruction, till it is updated.

Is DEFINE_STATIC_CALL_NULL for cases when function returns void?

We need something that returns `false` to vcpu_is_preempted() or
per_cpu(vcpus_states) once pv vcpu-state is initialised.

[..]
> > +static bool __vcpu_is_preempted(unsigned int cpu)
> > +{
> > +       struct vcpu_state *st;
> > +
> > +       st = &per_cpu(vcpus_states, cpu);
> > +       return READ_ONCE(st->preempted);
> 
> I guess you could just do:
> {
>   return READ_ONCE(per_cpu(vcpus_states, cpu).preempted);
> }

Ack.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/4] arm64: add guest pvstate support
  2021-07-09 21:53     ` Sergey Senozhatsky
@ 2021-07-11 16:58       ` Joel Fernandes
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Fernandes @ 2021-07-11 16:58 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Marc Zyngier, Will Deacon, Suleiman Souhlal, linux-arm-kernel,
	kvmarm, linux-kernel, virtualization

On Fri, Jul 9, 2021 at 5:53 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:

> > > +DECLARE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
> > > +
> > > +static inline bool paravirt_vcpu_is_preempted(unsigned int cpu)
> > > +{
> > > +       return static_call(pv_vcpu_is_preempted)(cpu);
> > > +}
> > > +
> > > +int __init pv_vcpu_state_init(void);
> > > +
> > >  #else
> > >
> > > +#define pv_vcpu_state_init() do {} while (0)
> > > +
> > >  #define pv_time_init() do {} while (0)
> > >
> > >  #endif // CONFIG_PARAVIRT
> > > diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
> > > index 75fed4460407..d8fc46795d94 100644
> > > --- a/arch/arm64/kernel/paravirt.c
> > > +++ b/arch/arm64/kernel/paravirt.c
> > > @@ -40,6 +40,11 @@ struct pv_time_stolen_time_region {
> > >
> > >  static DEFINE_PER_CPU(struct pv_time_stolen_time_region, stolen_time_region);
> > >
> > > +static DEFINE_PER_CPU(struct vcpu_state, vcpus_states);
> > > +struct static_key pv_vcpu_is_preempted_enabled;
> > > +
> > > +DEFINE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
> >
> > Could we use DEFINE_STATIC_CALL_NULL and get rid of the dummy
> > function? I believe that makes the function trampoline as return
> > instruction, till it is updated.
>
> Is DEFINE_STATIC_CALL_NULL for cases when function returns void?
>
> We need something that returns `false` to vcpu_is_preempted() or
> per_cpu(vcpus_states) once pv vcpu-state is initialised.

Ah, that might be problematic. In which case what you did is fine. Thanks,

- Joel



>
> [..]
> > > +static bool __vcpu_is_preempted(unsigned int cpu)
> > > +{
> > > +       struct vcpu_state *st;
> > > +
> > > +       st = &per_cpu(vcpus_states, cpu);
> > > +       return READ_ONCE(st->preempted);
> >
> > I guess you could just do:
> > {
> >   return READ_ONCE(per_cpu(vcpus_states, cpu).preempted);
> > }
>
> Ack.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 1/4] arm64: smccc: Add SMCCC pv-vcpu-state function call IDs
  2021-07-09  4:37 ` [PATCHv2 1/4] arm64: smccc: Add SMCCC pv-vcpu-state function call IDs Sergey Senozhatsky
@ 2021-07-12 14:22   ` Marc Zyngier
  0 siblings, 0 replies; 25+ messages in thread
From: Marc Zyngier @ 2021-07-12 14:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Will Deacon, Suleiman Souhlal, Joel Fernandes, linux-arm-kernel,
	kvmarm, linux-kernel, virtualization

On Fri, 09 Jul 2021 05:37:10 +0100,
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> 
> Add the definitions of the SMCCC functions that will be
> used to paravirt VCPU state configuration.
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  include/linux/arm-smccc.h | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
> index 7d1cabe15262..dbf0d658e54a 100644
> --- a/include/linux/arm-smccc.h
> +++ b/include/linux/arm-smccc.h
> @@ -177,6 +177,24 @@
>  			   ARM_SMCCC_OWNER_STANDARD,		\
>  			   0x53)
>  
> +#define ARM_SMCCC_HV_PV_VCPU_STATE_FEATURES			\
> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
> +			   ARM_SMCCC_SMC_64,			\
> +			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
> +			   0x60)
> +
> +#define ARM_SMCCC_HV_PV_VCPU_STATE_INIT			\
> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
> +			   ARM_SMCCC_SMC_64,			\
> +			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
> +			   0x61)
> +
> +#define ARM_SMCCC_HV_PV_VCPU_STATE_RELEASE			\
> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
> +			   ARM_SMCCC_SMC_64,			\
> +			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
> +			   0x62)
> +

This is definitely the *wrong* namespace for this. The "STANDARD_HYP"
space is owned by ARM for hypercalls that are implementable by any
hypervisors, and for which there is an ARM-owned specification.

Unless you plan to get this adopted as an ARM spec (pretty doubtful),
this should go instead in the KVM-specific space, which currently has
a single user (the PTP stuff), although I plan to add a couple more
shortly.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/4] arm64: add guest pvstate support
  2021-07-09  4:37 ` [PATCHv2 2/4] arm64: add guest pvstate support Sergey Senozhatsky
  2021-07-09  7:39   ` David Edmondson
  2021-07-09 18:58   ` Joel Fernandes
@ 2021-07-12 15:42   ` Marc Zyngier
  2021-07-21  2:05     ` Sergey Senozhatsky
  2 siblings, 1 reply; 25+ messages in thread
From: Marc Zyngier @ 2021-07-12 15:42 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Will Deacon, Suleiman Souhlal, Joel Fernandes, linux-arm-kernel,
	kvmarm, linux-kernel, virtualization

On Fri, 09 Jul 2021 05:37:11 +0100,
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> 
> PV-vcpu-state is a per-CPU struct, which, for the time being,
> holds boolean `preempted' vCPU state. During the startup,
> given that host supports PV-state, each guest vCPU sends
> a pointer to its per-CPU variable to the host as a payload

What is the expected memory type for this memory region? What is its
life cycle? Where is it allocated from?

> with the SMCCC HV call, so that host can update vCPU state
> when it puts or loads vCPU.
> 
> This has impact on the guest's scheduler:
> 
> [..]
>   wake_up_process()
>    try_to_wake_up()
>     select_task_rq_fair()
>      available_idle_cpu()
>       vcpu_is_preempted()
> 
> Some sched benchmarks data is available on the github page [0].
> 
> [0] https://github.com/sergey-senozhatsky/arm64-vcpu_is_preempted

Please include these results in the cover letter. I tend to reply to
email while offline, and I can't comment on GH.

> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  arch/arm64/include/asm/paravirt.h | 19 +++++++
>  arch/arm64/kernel/paravirt.c      | 94 +++++++++++++++++++++++++++++++
>  arch/arm64/kernel/smp.c           |  4 ++
>  3 files changed, 117 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
> index 9aa193e0e8f2..a3f7665dff38 100644
> --- a/arch/arm64/include/asm/paravirt.h
> +++ b/arch/arm64/include/asm/paravirt.h
> @@ -2,6 +2,11 @@
>  #ifndef _ASM_ARM64_PARAVIRT_H
>  #define _ASM_ARM64_PARAVIRT_H
>  
> +struct vcpu_state {

If this is KVM specific (which it most likely is), please name-space
it correctly, and move it to a KVM-specific location.

> +	bool	preempted;
> +	u8	reserved[63];

Why 63? Do you attach any particular meaning to a 64byte structure
(and before you say "cache line size", please look at some of the
cache line sizes we have to deal with...).

This should also be versioned from day-1, one way or another.

> +};
> +
>  #ifdef CONFIG_PARAVIRT
>  #include <linux/static_call_types.h>
>  
> @@ -20,8 +25,22 @@ static inline u64 paravirt_steal_clock(int cpu)
>  
>  int __init pv_time_init(void);
>  
> +bool dummy_vcpu_is_preempted(unsigned int cpu);
> +
> +extern struct static_key pv_vcpu_is_preempted_enabled;
> +DECLARE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
> +
> +static inline bool paravirt_vcpu_is_preempted(unsigned int cpu)
> +{
> +	return static_call(pv_vcpu_is_preempted)(cpu);
> +}
> +
> +int __init pv_vcpu_state_init(void);
> +
>  #else
>  
> +#define pv_vcpu_state_init() do {} while (0)
> +
>  #define pv_time_init() do {} while (0)
>  
>  #endif // CONFIG_PARAVIRT
> diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
> index 75fed4460407..d8fc46795d94 100644
> --- a/arch/arm64/kernel/paravirt.c
> +++ b/arch/arm64/kernel/paravirt.c
> @@ -40,6 +40,11 @@ struct pv_time_stolen_time_region {
>  
>  static DEFINE_PER_CPU(struct pv_time_stolen_time_region, stolen_time_region);
>  
> +static DEFINE_PER_CPU(struct vcpu_state, vcpus_states);

nit: there is only one 'state' structure per CPU, so I'd prefer the
singular form.

> +struct static_key pv_vcpu_is_preempted_enabled;
> +
> +DEFINE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
> +
>  static bool steal_acc = true;
>  static int __init parse_no_stealacc(char *arg)
>  {
> @@ -165,3 +170,92 @@ int __init pv_time_init(void)
>  
>  	return 0;
>  }
> +
> +bool dummy_vcpu_is_preempted(unsigned int cpu)

Why does this have to be global?

> +{
> +	return false;
> +}
> +
> +static bool __vcpu_is_preempted(unsigned int cpu)
> +{
> +	struct vcpu_state *st;
> +
> +	st = &per_cpu(vcpus_states, cpu);
> +	return READ_ONCE(st->preempted);
> +}
> +
> +static bool has_pv_vcpu_state(void)
> +{
> +	struct arm_smccc_res res;
> +
> +	/* To detect the presence of PV time support we require SMCCC 1.1+ */
> +	if (arm_smccc_1_1_get_conduit() == SMCCC_CONDUIT_NONE)
> +		return false;
> +
> +	arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> +			     ARM_SMCCC_HV_PV_VCPU_STATE_FEATURES,
> +			     &res);
> +
> +	if (res.a0 != SMCCC_RET_SUCCESS)
> +		return false;
> +	return true;

Please move all this over the the KVM-specific discovery mechanism.

> +}
> +
> +static int __pv_vcpu_state_hook(unsigned int cpu, int event)
> +{
> +	struct arm_smccc_res res;
> +	struct vcpu_state *st;
> +
> +	st = &per_cpu(vcpus_states, cpu);
> +	arm_smccc_1_1_invoke(event, virt_to_phys(st), &res);
> +	if (res.a0 != SMCCC_RET_SUCCESS)
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +static int vcpu_state_init(unsigned int cpu)
> +{
> +	int ret = __pv_vcpu_state_hook(cpu, ARM_SMCCC_HV_PV_VCPU_STATE_INIT);
> +
> +	if (ret)
> +		pr_warn("Unable to ARM_SMCCC_HV_PV_STATE_INIT\n");

pr_warn_once(), please.

> +	return ret;
> +}
> +
> +static int vcpu_state_release(unsigned int cpu)
> +{
> +	int ret = __pv_vcpu_state_hook(cpu, ARM_SMCCC_HV_PV_VCPU_STATE_RELEASE);
> +
> +	if (ret)
> +		pr_warn("Unable to ARM_SMCCC_HV_PV_STATE_RELEASE\n");
> +	return ret;
> +}
> +
> +static int pv_vcpu_state_register_hooks(void)
> +{
> +	int ret;
> +
> +	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> +				"hypervisor/arm/pvstate:starting",
> +				vcpu_state_init,
> +				vcpu_state_release);
> +	if (ret < 0)
> +		pr_warn("Failed to register CPU hooks\n");
> +	return 0;
> +}
> +
> +int __init pv_vcpu_state_init(void)
> +{
> +	int ret;
> +
> +	if (!has_pv_vcpu_state())
> +		return 0;
> +
> +	ret = pv_vcpu_state_register_hooks();
> +	if (ret)
> +		return ret;
> +
> +	static_call_update(pv_vcpu_is_preempted, __vcpu_is_preempted);
> +	static_key_slow_inc(&pv_vcpu_is_preempted_enabled);
> +	return 0;
> +}
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 6f6ff072acbd..20d42e0f2a99 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -50,6 +50,7 @@
>  #include <asm/tlbflush.h>
>  #include <asm/ptrace.h>
>  #include <asm/virt.h>
> +#include <asm/paravirt.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/ipi.h>
> @@ -756,6 +757,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  	numa_store_cpu_info(this_cpu);
>  	numa_add_cpu(this_cpu);
>  
> +	/* Init paravirt CPU state */
> +	pv_vcpu_state_init();
> +
>  	/*
>  	 * If UP is mandated by "nosmp" (which implies "maxcpus=0"), don't set
>  	 * secondary CPUs present.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 3/4] arm64: do not use dummy vcpu_is_preempted()
  2021-07-09  4:37 ` [PATCHv2 3/4] arm64: do not use dummy vcpu_is_preempted() Sergey Senozhatsky
@ 2021-07-12 15:47   ` Marc Zyngier
  2021-07-21  2:06     ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Marc Zyngier @ 2021-07-12 15:47 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Will Deacon, Suleiman Souhlal, Joel Fernandes, linux-arm-kernel,
	kvmarm, linux-kernel, virtualization

On Fri, 09 Jul 2021 05:37:12 +0100,
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> 
> vcpu_is_preempted() now can represent the actual state of
> the VCPU, so the scheduler can make better decisions when
> it picks the idle CPU to enqueue a task on.
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  arch/arm64/include/asm/spinlock.h | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
> index 0525c0b089ed..1d579497e1b8 100644
> --- a/arch/arm64/include/asm/spinlock.h
> +++ b/arch/arm64/include/asm/spinlock.h
> @@ -7,21 +7,23 @@
>  
>  #include <asm/qspinlock.h>
>  #include <asm/qrwlock.h>
> +#include <asm/paravirt.h>
>  
>  /* See include/linux/spinlock.h */
>  #define smp_mb__after_spinlock()	smp_mb()
>  
> -/*
> - * Changing this will break osq_lock() thanks to the call inside
> - * smp_cond_load_relaxed().
> - *
> - * See:
> - * https://lore.kernel.org/lkml/20200110100612.GC2827@hirez.programming.kicks-ass.net
> - */

Why are you deleting this? Please explain your reasoning in the commit
message. It seems to me that it still makes complete sense when
CONFIG_PARAVIRT is not defined.

>  #define vcpu_is_preempted vcpu_is_preempted
> -static inline bool vcpu_is_preempted(int cpu)
> +
> +#ifdef CONFIG_PARAVIRT
> +static inline bool vcpu_is_preempted(unsigned int cpu)
> +{
> +	return paravirt_vcpu_is_preempted(cpu);
> +}
> +#else
> +static inline bool vcpu_is_preempted(unsigned int cpu)
>  {
>  	return false;
>  }
> +#endif /* CONFIG_PARAVIRT */
>  
>  #endif /* __ASM_SPINLOCK_H */

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 4/4] arm64: add host pv-vcpu-state support
  2021-07-09  4:37 ` [PATCHv2 4/4] arm64: add host pv-vcpu-state support Sergey Senozhatsky
@ 2021-07-12 16:24   ` Marc Zyngier
  2021-07-20 18:44     ` Joel Fernandes
  2021-07-21  1:15     ` Sergey Senozhatsky
  0 siblings, 2 replies; 25+ messages in thread
From: Marc Zyngier @ 2021-07-12 16:24 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Will Deacon, Suleiman Souhlal, Joel Fernandes, linux-arm-kernel,
	kvmarm, linux-kernel, virtualization

On Fri, 09 Jul 2021 05:37:13 +0100,
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> 
> Add PV-vcpu-state support bits to the host. Host uses the guest
> PV-state per-CPU pointers to update the VCPU state each time
> it kvm_arch_vcpu_load() or kvm_arch_vcpu_put() the VCPU, so
> that guest scheduler can become aware of the fact that not
> all VCPUs are always available. Currently guest scheduler
> on amr64 always assumes that all CPUs are available because

arm64

> vcpu_is_preempted() is not implemented on arm64.
> 
> - schbench -t 3 -m 3 -p 4096
> 
>   Latency percentiles (usec)
>         BASE
> ================================================
>         50.0th: 1 (3556427 samples)
>         75.0th: 13 (879210 samples)
>         90.0th: 15 (893311 samples)
>         95.0th: 18 (159594 samples)
>         *99.0th: 118 (224187 samples)
>         99.5th: 691 (28555 samples)
>         99.9th: 7384 (23076 samples)
>         min=1, max=104218
> avg worker transfer: 25192.00 ops/sec 98.41MB/s
> 
>         PATCHED
> ================================================
>         50.0th: 1 (3507010 samples)
>         75.0th: 13 (1635775 samples)
>         90.0th: 16 (901271 samples)
>         95.0th: 24 (281051 samples)
>         *99.0th: 114 (255581 samples)
>         99.5th: 382 (33051 samples)
>         99.9th: 6392 (26592 samples)
>         min=1, max=83877
> avg worker transfer: 28613.39 ops/sec 111.77MB/s
> 
> - perf bench sched all
> 
>   ops/sec
>         BASE                 PATCHED
> ================================================
>         33452		     36485
>         33541		     39405
>         33365		     36858
>         33455		     38047
>         33449		     37866
>         33616		     34922
>         33479		     34388
>         33594		     37203
>         33458		     35363
>         33704		     35180
> 
> Student's T-test
> 
>          N           Min           Max        Median           Avg        Stddev
> base     10         33365         33704         33479       33511.3     100.92467
> patched  10         34388         39405         36858       36571.7      1607.454
> Difference at 95.0% confidence
> 	3060.4 +/- 1070.09
> 	9.13244% +/- 3.19321%
> 	(Student's t, pooled s = 1138.88)
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  arch/arm64/include/asm/kvm_host.h | 23 +++++++++++
>  arch/arm64/kvm/Makefile           |  3 +-
>  arch/arm64/kvm/arm.c              |  3 ++
>  arch/arm64/kvm/hypercalls.c       | 11 ++++++
>  arch/arm64/kvm/pv-vcpu-state.c    | 64 +++++++++++++++++++++++++++++++
>  5 files changed, 103 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/kvm/pv-vcpu-state.c
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 41911585ae0c..e782f4d0c916 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -381,6 +381,12 @@ struct kvm_vcpu_arch {
>  		u64 last_steal;
>  		gpa_t base;
>  	} steal;
> +
> +	/* PV state of the VCPU */
> +	struct {
> +		gpa_t base;
> +		struct gfn_to_hva_cache ghc;

Please stay consistent with what we use of steal time accounting
(i.e. don't use gfn_to_hva_cache, but directly use a gpa with
kvm_put_guest()). If you can demonstrate that there is an unacceptable
overhead in doing so, then both steal time and preemption will be
updated at the same time.


> +	} vcpu_state;
>  };
>  
>  /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
> @@ -695,6 +701,23 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
>  	return (vcpu_arch->steal.base != GPA_INVALID);
>  }
>  
> +int kvm_init_vcpu_state(struct kvm_vcpu *vcpu, gfn_t addr);
> +int kvm_release_vcpu_state(struct kvm_vcpu *vcpu);
> +
> +static inline void kvm_arm_vcpu_state_init(struct kvm_vcpu_arch *vcpu_arch)
> +{
> +	vcpu_arch->vcpu_state.base = GPA_INVALID;
> +	memset(&vcpu_arch->vcpu_state.ghc, 0, sizeof(struct gfn_to_hva_cache));

Does it really need to be inlined?

> +}
> +
> +static inline bool
> +kvm_arm_is_vcpu_state_enabled(struct kvm_vcpu_arch *vcpu_arch)
> +{
> +	return (vcpu_arch->vcpu_state.base != GPA_INVALID);
> +}
> +
> +void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted);
> +
>  void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 989bb5dad2c8..2a3ee82c6d90 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -12,7 +12,8 @@ obj-$(CONFIG_KVM) += hyp/
>  
>  kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
>  	 $(KVM)/vfio.o $(KVM)/irqchip.o $(KVM)/binary_stats.o \
> -	 arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
> +	 arm.o mmu.o mmio.o psci.o perf.o hypercalls.o \
> +	 pvtime.o pv-vcpu-state.o \
>  	 inject_fault.o va_layout.o handle_exit.o \
>  	 guest.o debug.o reset.o sys_regs.o \
>  	 vgic-sys-reg-v3.o fpsimd.o pmu.o \
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e9a2b8f27792..43e995c9fddb 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -332,6 +332,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  	kvm_arm_reset_debug_ptr(vcpu);
>  
>  	kvm_arm_pvtime_vcpu_init(&vcpu->arch);
> +	kvm_arm_vcpu_state_init(&vcpu->arch);
>  
>  	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
>  
> @@ -429,10 +430,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (vcpu_has_ptrauth(vcpu))
>  		vcpu_ptrauth_disable(vcpu);
>  	kvm_arch_vcpu_load_debug_state_flags(vcpu);
> +	kvm_update_vcpu_preempted(vcpu, false);
>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
> +	kvm_update_vcpu_preempted(vcpu, true);

This doesn't look right. With this, you are now telling the guest that
a vcpu that is blocked on WFI is preempted. This really isn't the
case, as it has voluntarily entered a low-power mode while waiting for
an interrupt. Indeed, the vcpu isn't running. A physical CPU wouldn't
be running either.

>  	kvm_arch_vcpu_put_debug_state_flags(vcpu);
>  	kvm_arch_vcpu_put_fp(vcpu);
>  	if (has_vhe())
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index 30da78f72b3b..95bcf86e0b6f 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -110,6 +110,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>  		case ARM_SMCCC_HV_PV_TIME_FEATURES:
>  			val[0] = SMCCC_RET_SUCCESS;
>  			break;
> +		case ARM_SMCCC_HV_PV_VCPU_STATE_FEATURES:
> +			val[0] = SMCCC_RET_SUCCESS;
> +			break;
>  		}
>  		break;
>  	case ARM_SMCCC_HV_PV_TIME_FEATURES:
> @@ -139,6 +142,14 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>  	case ARM_SMCCC_TRNG_RND32:
>  	case ARM_SMCCC_TRNG_RND64:
>  		return kvm_trng_call(vcpu);
> +	case ARM_SMCCC_HV_PV_VCPU_STATE_INIT:
> +		if (kvm_init_vcpu_state(vcpu, smccc_get_arg1(vcpu)) == 0)
> +			val[0] = SMCCC_RET_SUCCESS;
> +		break;
> +	case ARM_SMCCC_HV_PV_VCPU_STATE_RELEASE:
> +		if (kvm_release_vcpu_state(vcpu) == 0)
> +			val[0] = SMCCC_RET_SUCCESS;
> +		break;
>  	default:
>  		return kvm_psci_call(vcpu);
>  	}
> diff --git a/arch/arm64/kvm/pv-vcpu-state.c b/arch/arm64/kvm/pv-vcpu-state.c
> new file mode 100644
> index 000000000000..8496bb2a5966
> --- /dev/null
> +++ b/arch/arm64/kvm/pv-vcpu-state.c
> @@ -0,0 +1,64 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/arm-smccc.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_mmu.h>
> +#include <asm/paravirt.h>
> +
> +#include <kvm/arm_psci.h>

Why this include?

> +
> +int kvm_init_vcpu_state(struct kvm_vcpu *vcpu, gpa_t addr)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	int ret;
> +	u64 idx;
> +
> +	if (kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
> +		return 0;
> +
> +	idx = srcu_read_lock(&kvm->srcu);
> +	ret = kvm_gfn_to_hva_cache_init(vcpu->kvm,
> +					&vcpu->arch.vcpu_state.ghc,
> +					addr,
> +					sizeof(struct vcpu_state));
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
> +	if (!ret)
> +		vcpu->arch.vcpu_state.base = addr;
> +	return ret;
> +}
> +
> +int kvm_release_vcpu_state(struct kvm_vcpu *vcpu)
> +{
> +	if (!kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
> +		return 0;
> +
> +	kvm_arm_vcpu_state_init(&vcpu->arch);
> +	return 0;
> +}
> +
> +void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	u64 idx;
> +
> +	if (!kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
> +		return;
> +
> +	/*
> +	 * This function is called from atomic context, so we need to
> +	 * disable page faults. kvm_write_guest_cached() will call
> +	 * might_fault().
> +	 */
> +	pagefault_disable();
> +	/*
> +	 * Need to take the SRCU lock because kvm_write_guest_offset_cached()
> +	 * calls kvm_memslots();
> +	 */
> +	idx = srcu_read_lock(&kvm->srcu);
> +	kvm_write_guest_cached(kvm, &vcpu->arch.vcpu_state.ghc,
> +			       &preempted, sizeof(bool));
> +	srcu_read_unlock(&kvm->srcu, idx);
> +	pagefault_enable();
> +}

Before rushing into an implementation, this really could do with some
specification:

- where is the memory allocated?

- what is the lifetime of the memory?

- what is its expected memory type?

- what is the coherency model (what if the guest runs with caching
  disabled, for example)?

- is the functionality preserved across a VM reset?

- what is the strategy w.r.t live migration?

- can userspace detect that the feature is implemented?

- can userspace prevent the feature from being used?

- where is the documentation?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 4/4] arm64: add host pv-vcpu-state support
  2021-07-12 16:24   ` Marc Zyngier
@ 2021-07-20 18:44     ` Joel Fernandes
  2021-07-21  8:40       ` Marc Zyngier
  2021-07-21  1:15     ` Sergey Senozhatsky
  1 sibling, 1 reply; 25+ messages in thread
From: Joel Fernandes @ 2021-07-20 18:44 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sergey Senozhatsky, Will Deacon, Suleiman Souhlal,
	linux-arm-kernel, kvmarm, linux-kernel, virtualization

On Mon, Jul 12, 2021 at 12:24 PM Marc Zyngier <maz@kernel.org> wrote:
>
[...]
> > +}
> > +
> > +static inline bool
> > +kvm_arm_is_vcpu_state_enabled(struct kvm_vcpu_arch *vcpu_arch)
> > +{
> > +     return (vcpu_arch->vcpu_state.base != GPA_INVALID);
> > +}
> > +
> > +void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted);
> > +
> >  void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
> >
> >  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> > index 989bb5dad2c8..2a3ee82c6d90 100644
> > --- a/arch/arm64/kvm/Makefile
> > +++ b/arch/arm64/kvm/Makefile
> > @@ -12,7 +12,8 @@ obj-$(CONFIG_KVM) += hyp/
> >
> >  kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
> >        $(KVM)/vfio.o $(KVM)/irqchip.o $(KVM)/binary_stats.o \
> > -      arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
> > +      arm.o mmu.o mmio.o psci.o perf.o hypercalls.o \
> > +      pvtime.o pv-vcpu-state.o \
> >        inject_fault.o va_layout.o handle_exit.o \
> >        guest.o debug.o reset.o sys_regs.o \
> >        vgic-sys-reg-v3.o fpsimd.o pmu.o \
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index e9a2b8f27792..43e995c9fddb 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -332,6 +332,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> >       kvm_arm_reset_debug_ptr(vcpu);
> >
> >       kvm_arm_pvtime_vcpu_init(&vcpu->arch);
> > +     kvm_arm_vcpu_state_init(&vcpu->arch);
> >
> >       vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> >
> > @@ -429,10 +430,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >       if (vcpu_has_ptrauth(vcpu))
> >               vcpu_ptrauth_disable(vcpu);
> >       kvm_arch_vcpu_load_debug_state_flags(vcpu);
> > +     kvm_update_vcpu_preempted(vcpu, false);
> >  }
> >
> >  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >  {
> > +     kvm_update_vcpu_preempted(vcpu, true);
>
> This doesn't look right. With this, you are now telling the guest that
> a vcpu that is blocked on WFI is preempted. This really isn't the
> case, as it has voluntarily entered a low-power mode while waiting for
> an interrupt. Indeed, the vcpu isn't running. A physical CPU wouldn't
> be running either.

Can that be cured by just checking vcpu->preempted before calling
kvm_update_vcpu_preempted() ?

- Joel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 4/4] arm64: add host pv-vcpu-state support
  2021-07-12 16:24   ` Marc Zyngier
  2021-07-20 18:44     ` Joel Fernandes
@ 2021-07-21  1:15     ` Sergey Senozhatsky
  2021-07-21  9:10       ` Marc Zyngier
  1 sibling, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-07-21  1:15 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sergey Senozhatsky, Will Deacon, Suleiman Souhlal,
	Joel Fernandes, linux-arm-kernel, kvmarm, linux-kernel,
	virtualization

On (21/07/12 17:24), Marc Zyngier wrote:
> >  
> >  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >  {
> > +	kvm_update_vcpu_preempted(vcpu, true);
> 
> This doesn't look right. With this, you are now telling the guest that
> a vcpu that is blocked on WFI is preempted. This really isn't the
> case, as it has voluntarily entered a low-power mode while waiting for
> an interrupt. Indeed, the vcpu isn't running. A physical CPU wouldn't
> be running either.

I suppose you are talking about kvm_vcpu_block(). Well, it checks
kvm_vcpu_check_block() but then it simply schedule() out the vcpu
process, which does look like "the vcpu is preempted". Once we
sched_in() that vcpu process again we mark it as non-preempted,
even though it remains in kvm wfx handler. Why isn't it right?

Another call path is iret:

<iret>
__schedule()
 context_switch()
  prepare_task_switch()
   fire_sched_in_preempt_notifiers()
    kvm_sched_out()
     kvm_arch_vcpu_put()

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/4] arm64: add guest pvstate support
  2021-07-12 15:42   ` Marc Zyngier
@ 2021-07-21  2:05     ` Sergey Senozhatsky
  2021-07-21  8:22       ` Marc Zyngier
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-07-21  2:05 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sergey Senozhatsky, Will Deacon, Suleiman Souhlal,
	Joel Fernandes, linux-arm-kernel, kvmarm, linux-kernel,
	virtualization

On (21/07/12 16:42), Marc Zyngier wrote:
> > 
> > PV-vcpu-state is a per-CPU struct, which, for the time being,
> > holds boolean `preempted' vCPU state. During the startup,
> > given that host supports PV-state, each guest vCPU sends
> > a pointer to its per-CPU variable to the host as a payload
> 
> What is the expected memory type for this memory region? What is its
> life cycle? Where is it allocated from?

Guest per-CPU area, which physical addresses is shared with the host.

> > with the SMCCC HV call, so that host can update vCPU state
> > when it puts or loads vCPU.
> > 
> > This has impact on the guest's scheduler:
> > 
> > [..]
> >   wake_up_process()
> >    try_to_wake_up()
> >     select_task_rq_fair()
> >      available_idle_cpu()
> >       vcpu_is_preempted()
> > 
> > Some sched benchmarks data is available on the github page [0].
> > 
> > [0] https://github.com/sergey-senozhatsky/arm64-vcpu_is_preempted
> 
> Please include these results in the cover letter. I tend to reply to
> email while offline, and I can't comment on GH.

ACK.

> > +struct vcpu_state {
> 
> If this is KVM specific (which it most likely is), please name-space
> it correctly, and move it to a KVM-specific location.

ACK.

> > +	bool	preempted;
> > +	u8	reserved[63];
> 
> Why 63? Do you attach any particular meaning to a 64byte structure
> (and before you say "cache line size", please look at some of the
> cache line sizes we have to deal with...).

We do have some future plans to share some bits of the guest's context
with the host.

> This should also be versioned from day-1, one way or another.

Makes sense.

> > +};
> > +
> >  #ifdef CONFIG_PARAVIRT
> >  #include <linux/static_call_types.h>
> >  
> > @@ -20,8 +25,22 @@ static inline u64 paravirt_steal_clock(int cpu)
> >  
> >  int __init pv_time_init(void);
> >  
> > +bool dummy_vcpu_is_preempted(unsigned int cpu);
> > +
> > +extern struct static_key pv_vcpu_is_preempted_enabled;
> > +DECLARE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
> > +
> > +static inline bool paravirt_vcpu_is_preempted(unsigned int cpu)
> > +{
> > +	return static_call(pv_vcpu_is_preempted)(cpu);
> > +}
> > +
> > +int __init pv_vcpu_state_init(void);
> > +
> >  #else
> >  
> > +#define pv_vcpu_state_init() do {} while (0)
> > +
> >  #define pv_time_init() do {} while (0)
> >  
> >  #endif // CONFIG_PARAVIRT
> > diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
> > index 75fed4460407..d8fc46795d94 100644
> > --- a/arch/arm64/kernel/paravirt.c
> > +++ b/arch/arm64/kernel/paravirt.c
> > @@ -40,6 +40,11 @@ struct pv_time_stolen_time_region {
> >  
> >  static DEFINE_PER_CPU(struct pv_time_stolen_time_region, stolen_time_region);
> >  
> > +static DEFINE_PER_CPU(struct vcpu_state, vcpus_states);
> 
> nit: there is only one 'state' structure per CPU, so I'd prefer the
> singular form.

ACK.

> > +struct static_key pv_vcpu_is_preempted_enabled;
> > +
> > +DEFINE_STATIC_CALL(pv_vcpu_is_preempted, dummy_vcpu_is_preempted);
> > +
> >  static bool steal_acc = true;
> >  static int __init parse_no_stealacc(char *arg)
> >  {
> > @@ -165,3 +170,92 @@ int __init pv_time_init(void)
> >  
> >  	return 0;
> >  }
> > +
> > +bool dummy_vcpu_is_preempted(unsigned int cpu)
> 
> Why does this have to be global?

I think this can be moved away from the header, so then we don't need
to DECLARE_STATIC_CALL() with a dummy function.

> > +static bool has_pv_vcpu_state(void)
> > +{
> > +	struct arm_smccc_res res;
> > +
> > +	/* To detect the presence of PV time support we require SMCCC 1.1+ */
> > +	if (arm_smccc_1_1_get_conduit() == SMCCC_CONDUIT_NONE)
> > +		return false;
> > +
> > +	arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> > +			     ARM_SMCCC_HV_PV_VCPU_STATE_FEATURES,
> > +			     &res);
> > +
> > +	if (res.a0 != SMCCC_RET_SUCCESS)
> > +		return false;
> > +	return true;
> 
> Please move all this over the the KVM-specific discovery mechanism.

Will take a look.

> > +static int __pv_vcpu_state_hook(unsigned int cpu, int event)
> > +{
> > +	struct arm_smccc_res res;
> > +	struct vcpu_state *st;
> > +
> > +	st = &per_cpu(vcpus_states, cpu);
> > +	arm_smccc_1_1_invoke(event, virt_to_phys(st), &res);
> > +	if (res.a0 != SMCCC_RET_SUCCESS)
> > +		return -EINVAL;
> > +	return 0;
> > +}
> > +
> > +static int vcpu_state_init(unsigned int cpu)
> > +{
> > +	int ret = __pv_vcpu_state_hook(cpu, ARM_SMCCC_HV_PV_VCPU_STATE_INIT);
> > +
> > +	if (ret)
> > +		pr_warn("Unable to ARM_SMCCC_HV_PV_STATE_INIT\n");
> 
> pr_warn_once(), please.

ACK.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 3/4] arm64: do not use dummy vcpu_is_preempted()
  2021-07-12 15:47   ` Marc Zyngier
@ 2021-07-21  2:06     ` Sergey Senozhatsky
  0 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-07-21  2:06 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sergey Senozhatsky, Will Deacon, Suleiman Souhlal,
	Joel Fernandes, linux-arm-kernel, kvmarm, linux-kernel,
	virtualization

On (21/07/12 16:47), Marc Zyngier wrote:
> >  #include <asm/qspinlock.h>
> >  #include <asm/qrwlock.h>
> > +#include <asm/paravirt.h>
> >  
> >  /* See include/linux/spinlock.h */
> >  #define smp_mb__after_spinlock()	smp_mb()
> >  
> > -/*
> > - * Changing this will break osq_lock() thanks to the call inside
> > - * smp_cond_load_relaxed().
> > - *
> > - * See:
> > - * https://lore.kernel.org/lkml/20200110100612.GC2827@hirez.programming.kicks-ass.net
> > - */
> 
> Why are you deleting this? Please explain your reasoning in the commit
> message. It seems to me that it still makes complete sense when
> CONFIG_PARAVIRT is not defined.

You are right. I'll move it to !PARAVIRT #else-branch.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/4] arm64: add guest pvstate support
  2021-07-21  2:05     ` Sergey Senozhatsky
@ 2021-07-21  8:22       ` Marc Zyngier
  2021-07-21  8:47         ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Marc Zyngier @ 2021-07-21  8:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Will Deacon, Suleiman Souhlal, Joel Fernandes, linux-arm-kernel,
	kvmarm, linux-kernel, virtualization

On Wed, 21 Jul 2021 03:05:25 +0100,
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> 
> On (21/07/12 16:42), Marc Zyngier wrote:
> > > 
> > > PV-vcpu-state is a per-CPU struct, which, for the time being,
> > > holds boolean `preempted' vCPU state. During the startup,
> > > given that host supports PV-state, each guest vCPU sends
> > > a pointer to its per-CPU variable to the host as a payload
> > 
> > What is the expected memory type for this memory region? What is its
> > life cycle? Where is it allocated from?
> 
> Guest per-CPU area, which physical addresses is shared with the
> host.

Again: what are the memory types you expect this to be used with? When
will the hypervisor ever stop accessing this? How does it work across
reset?

I'm sorry to be that pressing, but these are the gory details that
actually matter if you want this thing to be maintainable.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 4/4] arm64: add host pv-vcpu-state support
  2021-07-20 18:44     ` Joel Fernandes
@ 2021-07-21  8:40       ` Marc Zyngier
  2021-07-21 10:38         ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Marc Zyngier @ 2021-07-21  8:40 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Sergey Senozhatsky, Will Deacon, Suleiman Souhlal,
	linux-arm-kernel, kvmarm, linux-kernel, virtualization

On Tue, 20 Jul 2021 19:44:53 +0100,
Joel Fernandes <joelaf@google.com> wrote:
> 
> On Mon, Jul 12, 2021 at 12:24 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> [...]
> > >  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> > >  {
> > > +     kvm_update_vcpu_preempted(vcpu, true);
> >
> > This doesn't look right. With this, you are now telling the guest that
> > a vcpu that is blocked on WFI is preempted. This really isn't the
> > case, as it has voluntarily entered a low-power mode while waiting for
> > an interrupt. Indeed, the vcpu isn't running. A physical CPU wouldn't
> > be running either.
> 
> Can that be cured by just checking vcpu->preempted before calling
> kvm_update_vcpu_preempted() ?

It isn't obvious to me that this is the right thing to do.
vcpu->preempted is always updated on sched-out from the preempt
notifier if the vcpu was on the run-queue, so my guess is that it will
always be set when switching to another task.

What you probably want is to check whether the vcpu is blocked by
introspecting the wait-queue with:

	scuwait_active(kvm_arch_vcpu_get_wait(vcpu)

which will tell you whether you are blocking or not. We are already
using a similar construct for arming a background timer in this case.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/4] arm64: add guest pvstate support
  2021-07-21  8:22       ` Marc Zyngier
@ 2021-07-21  8:47         ` Sergey Senozhatsky
  2021-07-21 10:16           ` Marc Zyngier
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-07-21  8:47 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Sergey Senozhatsky, Will Deacon, Suleiman Souhlal,
	Joel Fernandes, linux-arm-kernel, kvmarm, linux-kernel,
	virtualization

On (21/07/21 09:22), Marc Zyngier wrote:
> On Wed, 21 Jul 2021 03:05:25 +0100,
> Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> > 
> > On (21/07/12 16:42), Marc Zyngier wrote:
> > > > 
> > > > PV-vcpu-state is a per-CPU struct, which, for the time being,
> > > > holds boolean `preempted' vCPU state. During the startup,
> > > > given that host supports PV-state, each guest vCPU sends
> > > > a pointer to its per-CPU variable to the host as a payload
> > > 
> > > What is the expected memory type for this memory region? What is its
> > > life cycle? Where is it allocated from?
> > 
> > Guest per-CPU area, which physical addresses is shared with the
> > host.
> 
> Again: what are the memory types you expect this to be used with?

I heard your questions, I'm trying to figure out the answers now.

As of memory type - I presume you are talking about coherent vs
non-coherent memory. Can guest per-CPU memory be non-coherent? Guest
never writes anything to the region of memory it shares with the host,
it only reads what the host writes to it. All reads and writes are
done from CPU (no devices DMA access, etc).

Do we need any cache flushes/syncs in this case?

> When will the hypervisor ever stop accessing this?

KVM always access it for the vcpus that are getting scheduled out or
scheduled in on the host side.

> How does it work across reset?

I need to figure out what happens during reset/migration in the first
place.

> I'm sorry to be that pressing, but these are the gory details that
> actually matter if you want this thing to be maintainable.

Sure, no problem.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 4/4] arm64: add host pv-vcpu-state support
  2021-07-21  1:15     ` Sergey Senozhatsky
@ 2021-07-21  9:10       ` Marc Zyngier
  0 siblings, 0 replies; 25+ messages in thread
From: Marc Zyngier @ 2021-07-21  9:10 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Will Deacon, Suleiman Souhlal, Joel Fernandes, linux-arm-kernel,
	kvmarm, linux-kernel, virtualization

On Wed, 21 Jul 2021 02:15:47 +0100,
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> 
> On (21/07/12 17:24), Marc Zyngier wrote:
> > >  
> > >  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> > >  {
> > > +	kvm_update_vcpu_preempted(vcpu, true);
> > 
> > This doesn't look right. With this, you are now telling the guest that
> > a vcpu that is blocked on WFI is preempted. This really isn't the
> > case, as it has voluntarily entered a low-power mode while waiting for
> > an interrupt. Indeed, the vcpu isn't running. A physical CPU wouldn't
> > be running either.
> 
> I suppose you are talking about kvm_vcpu_block().

kvm_vcpu_block() is how things are implemented. WFI is the instruction
I'm concerned about.

> Well, it checks kvm_vcpu_check_block() but then it simply schedule()
> out the vcpu process, which does look like "the vcpu is
> preempted". Once we sched_in() that vcpu process again we mark it as
> non-preempted, even though it remains in kvm wfx handler. Why isn't
> it right?

Because the vcpu hasn't been "preempted". It has *voluntarily* gone
into a low-power mode, and how KVM implements this "low-power mode" is
none of the guest's business. This is exactly the same behaviour that
you will have on bare metal. From a Linux guest perspective, the vcpu
is *idle*, not doing anything, and only waiting for an interrupt to
start executing again.

This is a fundamentally different concept from preempting a vcpu
because its time-slice is up. In this second case, you can indeed
mitigate things by exposing steal time and preemption status as you
break the illusion of a machine that is completely controlled by the
guest.

If the "reched on interrupt delivery while blocked on WFI" is too slow
for you, then *that* is the thing that needs addressing. Feeding extra
state to the guest doesn't help.

> Another call path is iret:
> 
> <iret>
> __schedule()
>  context_switch()
>   prepare_task_switch()
>    fire_sched_in_preempt_notifiers()
>     kvm_sched_out()
>      kvm_arch_vcpu_put()

I'm not sure how a x86 concept is relevant here.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 2/4] arm64: add guest pvstate support
  2021-07-21  8:47         ` Sergey Senozhatsky
@ 2021-07-21 10:16           ` Marc Zyngier
  0 siblings, 0 replies; 25+ messages in thread
From: Marc Zyngier @ 2021-07-21 10:16 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Will Deacon, Suleiman Souhlal, Joel Fernandes, linux-arm-kernel,
	kvmarm, linux-kernel, virtualization

On Wed, 21 Jul 2021 09:47:52 +0100,
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> 
> On (21/07/21 09:22), Marc Zyngier wrote:
> > On Wed, 21 Jul 2021 03:05:25 +0100,
> > Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> > > 
> > > On (21/07/12 16:42), Marc Zyngier wrote:
> > > > > 
> > > > > PV-vcpu-state is a per-CPU struct, which, for the time being,
> > > > > holds boolean `preempted' vCPU state. During the startup,
> > > > > given that host supports PV-state, each guest vCPU sends
> > > > > a pointer to its per-CPU variable to the host as a payload
> > > > 
> > > > What is the expected memory type for this memory region? What is its
> > > > life cycle? Where is it allocated from?
> > > 
> > > Guest per-CPU area, which physical addresses is shared with the
> > > host.
> > 
> > Again: what are the memory types you expect this to be used with?
> 
> I heard your questions, I'm trying to figure out the answers now.
> 
> As of memory type - I presume you are talking about coherent vs
> non-coherent memory.

No. I'm talking about cacheable vs non-cacheable. The ARM architecture
is always coherent for memory that is inner-shareable, which applies
to any system running Linux. On the other hand, there is no
architected cache snooping when using non-cacheable accesses.

> Can guest per-CPU memory be non-coherent? Guest never writes
> anything to the region of memory it shares with the host, it only
> reads what the host writes to it. All reads and writes are done from
> CPU (no devices DMA access, etc).
> 
> Do we need any cache flushes/syncs in this case?

If you expect the guest to have non-cacheable mappings (or to run with
its MMU off at any point, which amounts to the same thing) *and* still
be able to access the shared page, then *someone* will have to perform
CMOs to make these writes visible to the PoC (unless you have FWB).

Needless to say, this would kill any sort of performance gain this
feature could hypothetically bring. Defining the scope for the access
would help mitigating this, even if that's just a sentence saying "the
shared page *must* be accessed from a cacheable mapping".

> 
> > When will the hypervisor ever stop accessing this?
> 
> KVM always access it for the vcpus that are getting scheduled out or
> scheduled in on the host side.

I was more hinting at whether there was a way to disable this at
runtime. Think of a guest using kexec, for example, where you really
don't want the hypervisor to start messing with memory that has been
since reallocated by the guest.

> > How does it work across reset?
> 
> I need to figure out what happens during reset/migration in the first
> place.

Yup.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 4/4] arm64: add host pv-vcpu-state support
  2021-07-21  8:40       ` Marc Zyngier
@ 2021-07-21 10:38         ` Sergey Senozhatsky
  2021-07-21 11:08           ` Marc Zyngier
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2021-07-21 10:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Joel Fernandes, Sergey Senozhatsky, Will Deacon,
	Suleiman Souhlal, linux-arm-kernel, kvmarm, linux-kernel,
	virtualization

On (21/07/21 09:40), Marc Zyngier wrote:
> > 
> > Can that be cured by just checking vcpu->preempted before calling
> > kvm_update_vcpu_preempted() ?
> 
> It isn't obvious to me that this is the right thing to do.
> vcpu->preempted is always updated on sched-out from the preempt
> notifier if the vcpu was on the run-queue, so my guess is that it will
> always be set when switching to another task.
> 
> What you probably want is to check whether the vcpu is blocked by
> introspecting the wait-queue with:
> 
> 	scuwait_active(kvm_arch_vcpu_get_wait(vcpu)
> 
> which will tell you whether you are blocking or not. We are already
> using a similar construct for arming a background timer in this case.

Can we examine if vcpu->run->exit_reason == WFE/WFI and avoid setting
preempted state if so?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv2 4/4] arm64: add host pv-vcpu-state support
  2021-07-21 10:38         ` Sergey Senozhatsky
@ 2021-07-21 11:08           ` Marc Zyngier
  0 siblings, 0 replies; 25+ messages in thread
From: Marc Zyngier @ 2021-07-21 11:08 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Joel Fernandes, Will Deacon, Suleiman Souhlal, linux-arm-kernel,
	kvmarm, linux-kernel, virtualization

On Wed, 21 Jul 2021 11:38:40 +0100,
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> 
> On (21/07/21 09:40), Marc Zyngier wrote:
> > > 
> > > Can that be cured by just checking vcpu->preempted before calling
> > > kvm_update_vcpu_preempted() ?
> > 
> > It isn't obvious to me that this is the right thing to do.
> > vcpu->preempted is always updated on sched-out from the preempt
> > notifier if the vcpu was on the run-queue, so my guess is that it will
> > always be set when switching to another task.
> > 
> > What you probably want is to check whether the vcpu is blocked by
> > introspecting the wait-queue with:
> > 
> > 	scuwait_active(kvm_arch_vcpu_get_wait(vcpu)
> > 
> > which will tell you whether you are blocking or not. We are already
> > using a similar construct for arming a background timer in this case.
> 
> Can we examine if vcpu->run->exit_reason == WFE/WFI and avoid setting
> preempted state if so?

We never go back to userspace for WFI/WFE, so no reason to populate
the run structure.

Checking for the blocked state is the right thing to do, and we
already have the primitive for this. Just use it.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-07-21 11:17 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-09  4:37 [PATCHv2 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
2021-07-09  4:37 ` [PATCHv2 1/4] arm64: smccc: Add SMCCC pv-vcpu-state function call IDs Sergey Senozhatsky
2021-07-12 14:22   ` Marc Zyngier
2021-07-09  4:37 ` [PATCHv2 2/4] arm64: add guest pvstate support Sergey Senozhatsky
2021-07-09  7:39   ` David Edmondson
2021-07-09  7:52     ` Sergey Senozhatsky
2021-07-09 18:58   ` Joel Fernandes
2021-07-09 21:53     ` Sergey Senozhatsky
2021-07-11 16:58       ` Joel Fernandes
2021-07-12 15:42   ` Marc Zyngier
2021-07-21  2:05     ` Sergey Senozhatsky
2021-07-21  8:22       ` Marc Zyngier
2021-07-21  8:47         ` Sergey Senozhatsky
2021-07-21 10:16           ` Marc Zyngier
2021-07-09  4:37 ` [PATCHv2 3/4] arm64: do not use dummy vcpu_is_preempted() Sergey Senozhatsky
2021-07-12 15:47   ` Marc Zyngier
2021-07-21  2:06     ` Sergey Senozhatsky
2021-07-09  4:37 ` [PATCHv2 4/4] arm64: add host pv-vcpu-state support Sergey Senozhatsky
2021-07-12 16:24   ` Marc Zyngier
2021-07-20 18:44     ` Joel Fernandes
2021-07-21  8:40       ` Marc Zyngier
2021-07-21 10:38         ` Sergey Senozhatsky
2021-07-21 11:08           ` Marc Zyngier
2021-07-21  1:15     ` Sergey Senozhatsky
2021-07-21  9:10       ` Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).