kvmarm.lists.cs.columbia.edu archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
@ 2020-07-21  4:17 Sergey Senozhatsky
  2020-07-21  4:17 ` [RFC][PATCH 1/4] arm64:kvm: define pv_state SMCCC HV calls Sergey Senozhatsky
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2020-07-21  4:17 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose
  Cc: joelaf, linux-kernel, Sergey Senozhatsky, suleiman, kvmarm,
	linux-arm-kernel

Hello,

	RFC

	We noticed that in a number of cases when we wake_up_process()
on arm64 guest we end up enqueuing that task on a preempted VCPU. The culprit
appears to be the fact that arm64 guests are not aware of VCPU preemption
as such, so when sched picks up an idle VCPU it always assumes that VCPU
is available:

      wake_up_process()
       try_to_wake_up()
        select_task_rq_fair()
         available_idle_cpu()
          vcpu_is_preempted()    // return false;

Which is, obviously, not the case.

This RFC patch set adds a simple vcpu_is_preempted() implementation so
that scheduler can make better decisions when it search for the idle
(v)CPU.

I ran a number of sched benchmarks please refer to [0] for more
details.

[0] https://github.com/sergey-senozhatsky/arm64-vcpu_is_preempted

Sergey Senozhatsky (4):
  arm64:kvm: define pv_state SMCCC HV calls
  arm64: add guest pvstate support
  arm64: add host pvstate support
  arm64: do not use dummy vcpu_is_preempted() anymore

 arch/arm64/include/asm/kvm_host.h  |  23 ++++++
 arch/arm64/include/asm/paravirt.h  |  15 ++++
 arch/arm64/include/asm/spinlock.h  |  17 +++--
 arch/arm64/kernel/Makefile         |   2 +-
 arch/arm64/kernel/paravirt-state.c | 117 +++++++++++++++++++++++++++++
 arch/arm64/kernel/paravirt.c       |   4 +-
 arch/arm64/kernel/time.c           |   1 +
 arch/arm64/kvm/Makefile            |   2 +-
 arch/arm64/kvm/arm.c               |   4 +
 arch/arm64/kvm/hypercalls.c        |  11 +++
 arch/arm64/kvm/pvstate.c           |  58 ++++++++++++++
 include/linux/arm-smccc.h          |  18 +++++
 12 files changed, 262 insertions(+), 10 deletions(-)
 create mode 100644 arch/arm64/kernel/paravirt-state.c
 create mode 100644 arch/arm64/kvm/pvstate.c

-- 
2.27.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC][PATCH 1/4] arm64:kvm: define pv_state SMCCC HV calls
  2020-07-21  4:17 [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
@ 2020-07-21  4:17 ` Sergey Senozhatsky
  2020-07-21  4:17 ` [RFC][PATCH 2/4] arm64: add guest pvstate support Sergey Senozhatsky
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2020-07-21  4:17 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose
  Cc: joelaf, linux-kernel, Sergey Senozhatsky, suleiman, kvmarm,
	linux-arm-kernel

These will be used later on to configure and enable vCPU
PV-state support, which is needed for vcpu_is_preempted().

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/arm-smccc.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 15c706fb0a37..cba054662692 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -99,6 +99,24 @@
 			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
 			   0x21)
 
+#define ARM_SMCCC_HV_PV_STATE_FEATURES				\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
+			   ARM_SMCCC_SMC_64,			\
+			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
+			   0x22)
+
+#define ARM_SMCCC_HV_PV_STATE_INIT				\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
+			   ARM_SMCCC_SMC_64,			\
+			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
+			   0x23)
+
+#define ARM_SMCCC_HV_PV_STATE_RELEASE				\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
+			   ARM_SMCCC_SMC_64,			\
+			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
+			   0x24)
+
 /*
  * Return codes defined in ARM DEN 0070A
  * ARM DEN 0070A is now merged/consolidated into ARM DEN 0028 C
-- 
2.27.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC][PATCH 2/4] arm64: add guest pvstate support
  2020-07-21  4:17 [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
  2020-07-21  4:17 ` [RFC][PATCH 1/4] arm64:kvm: define pv_state SMCCC HV calls Sergey Senozhatsky
@ 2020-07-21  4:17 ` Sergey Senozhatsky
  2020-07-21  4:17 ` [RFC][PATCH 3/4] arm64: add host " Sergey Senozhatsky
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2020-07-21  4:17 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose
  Cc: joelaf, linux-kernel, Sergey Senozhatsky, suleiman, kvmarm,
	linux-arm-kernel

PV-state is a per-CPU struct, which, for the time being,
holds boolean `preempted' vCPU state. During the startup,
given that host supports PV-state, each guest vCPU sends
a pointer to its per-CPU variable to the host as a payload
with the SMCC HV call, so that host can update vCPU state
when it puts or loads vCPU.

This has impact on the guest's scheduler - it does check
the state of the vCPU it wants to run a task on:

[..]
  wake_up_process()
   try_to_wake_up()
    select_task_rq_fair()
     available_idle_cpu()
      vcpu_is_preempted()

Some sched benchmarks data is available on the github page [0].

[0] https://github.com/sergey-senozhatsky/arm64-vcpu_is_preempted
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 arch/arm64/include/asm/paravirt.h  |  15 ++++
 arch/arm64/kernel/Makefile         |   2 +-
 arch/arm64/kernel/paravirt-state.c | 117 +++++++++++++++++++++++++++++
 arch/arm64/kernel/paravirt.c       |   4 +-
 arch/arm64/kernel/time.c           |   1 +
 5 files changed, 137 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kernel/paravirt-state.c

diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
index cf3a0fd7c1a7..1bf164b2041b 100644
--- a/arch/arm64/include/asm/paravirt.h
+++ b/arch/arm64/include/asm/paravirt.h
@@ -7,12 +7,22 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
+struct pvstate_vcpu_info {
+	bool	preempted;
+	u8	reserved[63];
+};
+
+struct pv_state_ops {
+	bool (*vcpu_is_preempted)(int cpu);
+};
+
 struct pv_time_ops {
 	unsigned long long (*steal_clock)(int cpu);
 };
 
 struct paravirt_patch_template {
 	struct pv_time_ops time;
+	struct pv_state_ops state;
 };
 
 extern struct paravirt_patch_template pv_ops;
@@ -22,10 +32,15 @@ static inline u64 paravirt_steal_clock(int cpu)
 	return pv_ops.time.steal_clock(cpu);
 }
 
+bool native_vcpu_is_preempted(int cpu);
+bool paravirt_vcpu_is_preempted(int cpu);
+
+int __init pv_state_init(void);
 int __init pv_time_init(void);
 
 #else
 
+#define pv_state_init() do {} while (0)
 #define pv_time_init() do {} while (0)
 
 #endif // CONFIG_PARAVIRT
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 5fb9b728459b..18974d5e798d 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -48,7 +48,7 @@ obj-$(CONFIG_ARMV8_DEPRECATED)		+= armv8_deprecated.o
 obj-$(CONFIG_ACPI)			+= acpi.o
 obj-$(CONFIG_ACPI_NUMA)			+= acpi_numa.o
 obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
-obj-$(CONFIG_PARAVIRT)			+= paravirt.o
+obj-$(CONFIG_PARAVIRT)			+= paravirt.o paravirt-state.o
 obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o
 obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
diff --git a/arch/arm64/kernel/paravirt-state.c b/arch/arm64/kernel/paravirt-state.c
new file mode 100644
index 000000000000..4ae92a84c73d
--- /dev/null
+++ b/arch/arm64/kernel/paravirt-state.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "arm-pvstate: " fmt
+
+#include <linux/arm-smccc.h>
+#include <linux/cpuhotplug.h>
+#include <linux/export.h>
+#include <linux/io.h>
+#include <linux/jump_label.h>
+#include <linux/printk.h>
+#include <linux/psci.h>
+#include <linux/reboot.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <asm/paravirt.h>
+#include <asm/smp_plat.h>
+
+static DEFINE_PER_CPU(struct pvstate_vcpu_info, vcpus_states);
+
+bool native_vcpu_is_preempted(int cpu)
+{
+	return false;
+}
+
+static bool pv_vcpu_is_preempted(int cpu)
+{
+	struct pvstate_vcpu_info *st;
+
+	st = &per_cpu(vcpus_states, cpu);
+	return READ_ONCE(st->preempted);
+}
+
+bool paravirt_vcpu_is_preempted(int cpu)
+{
+	return pv_ops.state.vcpu_is_preempted(cpu);
+}
+
+static bool has_pvstate(void)
+{
+	struct arm_smccc_res res;
+
+	/* To detect the presence of PV time support we require SMCCC 1.1+ */
+	if (arm_smccc_1_1_get_conduit() == SMCCC_CONDUIT_NONE)
+		return false;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+			     ARM_SMCCC_HV_PV_STATE_FEATURES,
+			     &res);
+
+	if (res.a0 != SMCCC_RET_SUCCESS)
+		return false;
+	return true;
+}
+
+static int __pvstate_cpu_hook(unsigned int cpu, int event)
+{
+	struct arm_smccc_res res;
+	struct pvstate_vcpu_info *st;
+
+	st = &per_cpu(vcpus_states, cpu);
+	arm_smccc_1_1_invoke(event, virt_to_phys(st), &res);
+	if (res.a0 != SMCCC_RET_SUCCESS)
+		return -EINVAL;
+	return 0;
+}
+
+static int pvstate_cpu_init(unsigned int cpu)
+{
+	int ret = __pvstate_cpu_hook(cpu, ARM_SMCCC_HV_PV_STATE_INIT);
+
+	if (ret)
+		pr_warn("Unable to ARM_SMCCC_HV_PV_STATE_INIT\n");
+	return ret;
+}
+
+static int pvstate_cpu_release(unsigned int cpu)
+{
+	int ret = __pvstate_cpu_hook(cpu, ARM_SMCCC_HV_PV_STATE_RELEASE);
+
+	if (ret)
+		pr_warn("Unable to ARM_SMCCC_HV_PV_STATE_RELEASE\n");
+	return ret;
+}
+
+static int pvstate_register_hooks(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ARM_KVMPV_STARTING,
+			"hypervisor/arm/pvstate:starting",
+			pvstate_cpu_init,
+			pvstate_cpu_release);
+	if (ret < 0)
+		pr_warn("Failed to register CPU hooks\n");
+	return ret;
+}
+
+static int __pvstate_init(void)
+{
+	return pvstate_register_hooks();
+}
+
+int __init pv_state_init(void)
+{
+	int ret;
+
+	if (!has_pvstate())
+		return 0;
+
+	ret = __pvstate_init();
+	if (ret)
+		return ret;
+
+	pv_ops.state.vcpu_is_preempted = pv_vcpu_is_preempted;
+	return 0;
+}
diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
index 295d66490584..3fec7563ac27 100644
--- a/arch/arm64/kernel/paravirt.c
+++ b/arch/arm64/kernel/paravirt.c
@@ -26,7 +26,9 @@
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
 
-struct paravirt_patch_template pv_ops;
+struct paravirt_patch_template pv_ops = {
+	.state.vcpu_is_preempted = native_vcpu_is_preempted,
+};
 EXPORT_SYMBOL_GPL(pv_ops);
 
 struct pv_time_stolen_time_region {
diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index eebbc8d7123e..50c55792f72b 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -68,4 +68,5 @@ void __init time_init(void)
 	lpj_fine = arch_timer_rate / HZ;
 
 	pv_time_init();
+	pv_state_init();
 }
-- 
2.27.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC][PATCH 3/4] arm64: add host pvstate support
  2020-07-21  4:17 [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
  2020-07-21  4:17 ` [RFC][PATCH 1/4] arm64:kvm: define pv_state SMCCC HV calls Sergey Senozhatsky
  2020-07-21  4:17 ` [RFC][PATCH 2/4] arm64: add guest pvstate support Sergey Senozhatsky
@ 2020-07-21  4:17 ` Sergey Senozhatsky
  2020-07-21  4:17 ` [RFC][PATCH 4/4] arm64: do not use dummy vcpu_is_preempted() anymore Sergey Senozhatsky
  2020-08-17  2:03 ` [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
  4 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2020-07-21  4:17 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose
  Cc: joelaf, linux-kernel, Sergey Senozhatsky, suleiman, kvmarm,
	linux-arm-kernel

Add PV-state support bits to the host. Host uses the guest
PV-state per-CPU pointers to update the VCPU state each time
it kvm_arch_vcpu_load() or kvm_arch_vcpu_put() the VCPU, so
that guest scheduler can become aware of the fact that not
all VCPUs are always available. Currently guest scheduler
on amr64 always assumes that all CPUs are available because
vcpu_is_preempted() is not implemented on arm64.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 arch/arm64/include/asm/kvm_host.h | 23 ++++++++++++
 arch/arm64/kvm/Makefile           |  2 +-
 arch/arm64/kvm/arm.c              |  4 +++
 arch/arm64/kvm/hypercalls.c       | 11 ++++++
 arch/arm64/kvm/pvstate.c          | 58 +++++++++++++++++++++++++++++++
 5 files changed, 97 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kvm/pvstate.c

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f6485c1b8c15..e6f325a4ba70 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -372,6 +372,12 @@ struct kvm_vcpu_arch {
 		u64 last_steal;
 		gpa_t base;
 	} steal;
+
+	/* PV state of the VCPU (e.g. vcpu_is_preempted()) */
+	struct {
+		gpa_t base;
+		struct gfn_to_hva_cache ghc;
+	} pv_state;
 };
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
@@ -556,6 +562,23 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
 	return (vcpu_arch->steal.base != GPA_INVALID);
 }
 
+int kvm_init_vcpu_state(struct kvm_vcpu *vcpu, gfn_t addr);
+int kvm_release_vcpu_state(struct kvm_vcpu *vcpu);
+
+static inline void kvm_arm_vcpu_state_init(struct kvm_vcpu_arch *vcpu_arch)
+{
+	vcpu_arch->pv_state.base = GPA_INVALID;
+	memset(&vcpu_arch->pv_state.ghc, 0x00, sizeof(struct gfn_to_hva_cache));
+}
+
+static inline bool
+kvm_arm_is_vcpu_state_enabled(struct kvm_vcpu_arch *vcpu_arch)
+{
+	return (vcpu_arch->pv_state.base != GPA_INVALID);
+}
+
+void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted);
+
 void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 99977c1972cc..efc05ac0d781 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -12,7 +12,7 @@ obj-$(CONFIG_KVM) += hyp/
 
 kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
 	 $(KVM)/vfio.o $(KVM)/irqchip.o \
-	 arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
+	 arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o pvstate.o \
 	 inject_fault.o regmap.o va_layout.o hyp.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o \
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 691d21e4c717..a1229e3b7117 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -265,6 +265,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	kvm_arm_pvtime_vcpu_init(&vcpu->arch);
 
+	kvm_arm_vcpu_state_init(&vcpu->arch);
+
 	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
 
 	err = kvm_vgic_vcpu_init(vcpu);
@@ -355,10 +357,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	if (vcpu_has_ptrauth(vcpu))
 		vcpu_ptrauth_disable(vcpu);
+	kvm_update_vcpu_preempted(vcpu, false);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	kvm_update_vcpu_preempted(vcpu, true);
 	kvm_arch_vcpu_put_fp(vcpu);
 	if (has_vhe())
 		kvm_vcpu_put_sysregs_vhe(vcpu);
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 550dfa3e53cd..3660552a8e02 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -52,6 +52,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 		case ARM_SMCCC_HV_PV_TIME_FEATURES:
 			val = SMCCC_RET_SUCCESS;
 			break;
+		case ARM_SMCCC_HV_PV_STATE_FEATURES:
+			val = SMCCC_RET_SUCCESS;
+			break;
 		}
 		break;
 	case ARM_SMCCC_HV_PV_TIME_FEATURES:
@@ -62,6 +65,14 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 		if (gpa != GPA_INVALID)
 			val = gpa;
 		break;
+	case ARM_SMCCC_HV_PV_STATE_INIT:
+		if (kvm_init_vcpu_state(vcpu, smccc_get_arg1(vcpu)) == 0)
+			val = SMCCC_RET_SUCCESS;
+		break;
+	case ARM_SMCCC_HV_PV_STATE_RELEASE:
+		if (kvm_release_vcpu_state(vcpu) == 0)
+			val = SMCCC_RET_SUCCESS;
+		break;
 	default:
 		return kvm_psci_call(vcpu);
 	}
diff --git a/arch/arm64/kvm/pvstate.c b/arch/arm64/kvm/pvstate.c
new file mode 100644
index 000000000000..3152555f3ed2
--- /dev/null
+++ b/arch/arm64/kvm/pvstate.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/arm-smccc.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_mmu.h>
+#include <asm/paravirt.h>
+
+#include <kvm/arm_hypercalls.h>
+
+int kvm_init_vcpu_state(struct kvm_vcpu *vcpu, gpa_t addr)
+{
+	if (kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
+		return 0;
+
+	if (kvm_gfn_to_hva_cache_init(vcpu->kvm,
+				      &vcpu->arch.pv_state.ghc,
+				      addr,
+				      sizeof(struct pvstate_vcpu_info)))
+		return -EINVAL;
+
+	vcpu->arch.pv_state.base = addr;
+	return 0;
+}
+
+int kvm_release_vcpu_state(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
+		return 0;
+
+	vcpu->arch.pv_state.base = GPA_INVALID;
+	return 0;
+}
+
+void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted)
+{
+	struct kvm *kvm = vcpu->kvm;
+	u64 idx;
+
+	if (!kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
+		return;
+
+	/*
+	 * This function is called from atomic context, so we need to
+	 * disable page faults. kvm_write_guest_cached() will call
+	 * might_fault().
+	 */
+	pagefault_disable();
+	/*
+	 * Need to take the SRCU lock because kvm_write_guest_offset_cached()
+	 * calls kvm_memslots();
+	 */
+	idx = srcu_read_lock(&kvm->srcu);
+	kvm_write_guest_cached(kvm, &vcpu->arch.pv_state.ghc,
+			       &preempted, sizeof(bool));
+	srcu_read_unlock(&kvm->srcu, idx);
+	pagefault_enable();
+}
-- 
2.27.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC][PATCH 4/4] arm64: do not use dummy vcpu_is_preempted() anymore
  2020-07-21  4:17 [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2020-07-21  4:17 ` [RFC][PATCH 3/4] arm64: add host " Sergey Senozhatsky
@ 2020-07-21  4:17 ` Sergey Senozhatsky
  2020-08-17  2:03 ` [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
  4 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2020-07-21  4:17 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier, James Morse, Julien Thierry, Suzuki K Poulose
  Cc: joelaf, linux-kernel, Sergey Senozhatsky, suleiman, kvmarm,
	linux-arm-kernel

vcpu_is_preempted() now can represent the actual state of
the VCPU, so the scheduler can make better decisions when
it picks the idle CPU to enqueue a task on. I executed a
whole bunch of scheduler tests [0]. One particular test
that shows the importance of vcpu_is_preempted() is AIO
stress-ng test:

x Disabled vcpu_is_preempted()
stress-ng: info:  [100] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
stress-ng: info:  [100]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [100] aio              222927     10.01      0.89     27.61     22262.04      7822.00
stress-ng: info:  [139] aio              217043     10.01      1.00     26.80     21685.46      7807.30
stress-ng: info:  [178] aio              217261     10.01      1.08     26.79     21709.36      7795.51

+ Enabled vcpu_is_preempted()
stress-ng: info:  [100] aio              432750     10.00      1.14     19.03     43264.33     21455.13
stress-ng: info:  [139] aio              426771     10.01      1.09     18.67     42629.13     21597.72
stress-ng: info:  [179] aio              533039     10.00      1.42     20.39     53281.70     24440.12

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 arch/arm64/include/asm/spinlock.h | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index 9083d6992603..6a390eeabe82 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -11,17 +11,20 @@
 /* See include/linux/spinlock.h */
 #define smp_mb__after_spinlock()	smp_mb()
 
-/*
- * Changing this will break osq_lock() thanks to the call inside
- * smp_cond_load_relaxed().
- *
- * See:
- * https://lore.kernel.org/lkml/20200110100612.GC2827@hirez.programming.kicks-ass.net
- */
 #define vcpu_is_preempted vcpu_is_preempted
+
+#ifdef CONFIG_PARAVIRT
+extern bool paravirt_vcpu_is_preempted(int cpu);
+
+static inline bool vcpu_is_preempted(int cpu)
+{
+	return paravirt_vcpu_is_preempted(cpu);
+}
+#else
 static inline bool vcpu_is_preempted(int cpu)
 {
 	return false;
 }
+#endif /* CONFIG_PARAVIRT */
 
 #endif /* __ASM_SPINLOCK_H */
-- 
2.27.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
  2020-07-21  4:17 [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2020-07-21  4:17 ` [RFC][PATCH 4/4] arm64: do not use dummy vcpu_is_preempted() anymore Sergey Senozhatsky
@ 2020-08-17  2:03 ` Sergey Senozhatsky
  2020-08-17 12:03   ` yezengruan
  4 siblings, 1 reply; 17+ messages in thread
From: Sergey Senozhatsky @ 2020-08-17  2:03 UTC (permalink / raw)
  To: Will Deacon, Marc Zyngier
  Cc: joelaf, linux-kernel, Sergey Senozhatsky, suleiman, kvmarm,
	linux-arm-kernel

On (20/07/21 13:17), Sergey Senozhatsky wrote:
> Hello,
> 
> 	RFC
> 
> 	We noticed that in a number of cases when we wake_up_process()
> on arm64 guest we end up enqueuing that task on a preempted VCPU. The culprit
> appears to be the fact that arm64 guests are not aware of VCPU preemption
> as such, so when sched picks up an idle VCPU it always assumes that VCPU
> is available:
> 
>       wake_up_process()
>        try_to_wake_up()
>         select_task_rq_fair()
>          available_idle_cpu()
>           vcpu_is_preempted()    // return false;
> 
> Which is, obviously, not the case.
> 
> This RFC patch set adds a simple vcpu_is_preempted() implementation so
> that scheduler can make better decisions when it search for the idle
> (v)CPU.

Hi,

A gentle ping.

	-ss
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
  2020-08-17  2:03 ` [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
@ 2020-08-17 12:03   ` yezengruan
  2020-08-17 12:25     ` Marc Zyngier
  2020-09-11  8:46     ` Sergey Senozhatsky
  0 siblings, 2 replies; 17+ messages in thread
From: yezengruan @ 2020-08-17 12:03 UTC (permalink / raw)
  To: Sergey Senozhatsky, will, maz
  Cc: joelaf, linux-kernel, suleiman, kvmarm, linux-arm-kernel

On 2020/8/17 10:03, Sergey Senozhatsky wrote:
> On (20/07/21 13:17), Sergey Senozhatsky wrote:
>> Hello,
>>
>> 	RFC
>>
>> 	We noticed that in a number of cases when we wake_up_process()
>> on arm64 guest we end up enqueuing that task on a preempted VCPU. The culprit
>> appears to be the fact that arm64 guests are not aware of VCPU preemption
>> as such, so when sched picks up an idle VCPU it always assumes that VCPU
>> is available:
>>
>>       wake_up_process()
>>        try_to_wake_up()
>>         select_task_rq_fair()
>>          available_idle_cpu()
>>           vcpu_is_preempted()    // return false;
>>
>> Which is, obviously, not the case.
>>
>> This RFC patch set adds a simple vcpu_is_preempted() implementation so
>> that scheduler can make better decisions when it search for the idle
>> (v)CPU.
> Hi,
>
> A gentle ping.
>
> 	-ss
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> .

Hi Sergey,

I have a set of patches similar to yours.

https://lore.kernel.org/lkml/20191226135833.1052-1-yezengruan@huawei.com/

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
  2020-08-17 12:03   ` yezengruan
@ 2020-08-17 12:25     ` Marc Zyngier
  2020-08-17 14:15       ` yezengruan
  2020-09-11  8:58       ` Sergey Senozhatsky
  2020-09-11  8:46     ` Sergey Senozhatsky
  1 sibling, 2 replies; 17+ messages in thread
From: Marc Zyngier @ 2020-08-17 12:25 UTC (permalink / raw)
  To: yezengruan
  Cc: joelaf, linux-kernel, Sergey Senozhatsky, suleiman, will, kvmarm,
	linux-arm-kernel

On 2020-08-17 13:03, yezengruan wrote:
> On 2020/8/17 10:03, Sergey Senozhatsky wrote:
>> On (20/07/21 13:17), Sergey Senozhatsky wrote:
>>> Hello,
>>> 
>>> 	RFC
>>> 
>>> 	We noticed that in a number of cases when we wake_up_process()
>>> on arm64 guest we end up enqueuing that task on a preempted VCPU. The 
>>> culprit
>>> appears to be the fact that arm64 guests are not aware of VCPU 
>>> preemption
>>> as such, so when sched picks up an idle VCPU it always assumes that 
>>> VCPU
>>> is available:
>>> 
>>>       wake_up_process()
>>>        try_to_wake_up()
>>>         select_task_rq_fair()
>>>          available_idle_cpu()
>>>           vcpu_is_preempted()    // return false;
>>> 
>>> Which is, obviously, not the case.
>>> 
>>> This RFC patch set adds a simple vcpu_is_preempted() implementation 
>>> so
>>> that scheduler can make better decisions when it search for the idle
>>> (v)CPU.
>> Hi,
>> 
>> A gentle ping.
>> 
>> 	-ss
>> _______________________________________________
>> kvmarm mailing list
>> kvmarm@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>> .
> 
> Hi Sergey,
> 
> I have a set of patches similar to yours.
> 
> https://lore.kernel.org/lkml/20191226135833.1052-1-yezengruan@huawei.com/

It really isn't the same thing at all. You are exposing PV spinlocks,
while Sergey exposes preemption to vcpus. The former is a massive,
and probably unnecessary superset of the later, which only impacts
the scheduler (it doesn't change the way locks are implemented).

You really shouldn't conflate the two (which you have done in your
series).

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
  2020-08-17 12:25     ` Marc Zyngier
@ 2020-08-17 14:15       ` yezengruan
  2020-09-11  8:58       ` Sergey Senozhatsky
  1 sibling, 0 replies; 17+ messages in thread
From: yezengruan @ 2020-08-17 14:15 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: joelaf, linux-kernel, Sergey Senozhatsky, suleiman, will, kvmarm,
	linux-arm-kernel

On 2020/8/17 20:25, Marc Zyngier wrote:
> On 2020-08-17 13:03, yezengruan wrote:
>> On 2020/8/17 10:03, Sergey Senozhatsky wrote:
>>> On (20/07/21 13:17), Sergey Senozhatsky wrote:
>>>> Hello,
>>>>
>>>>     RFC
>>>>
>>>>     We noticed that in a number of cases when we wake_up_process()
>>>> on arm64 guest we end up enqueuing that task on a preempted VCPU. The culprit
>>>> appears to be the fact that arm64 guests are not aware of VCPU preemption
>>>> as such, so when sched picks up an idle VCPU it always assumes that VCPU
>>>> is available:
>>>>
>>>>       wake_up_process()
>>>>        try_to_wake_up()
>>>>         select_task_rq_fair()
>>>>          available_idle_cpu()
>>>>           vcpu_is_preempted()    // return false;
>>>>
>>>> Which is, obviously, not the case.
>>>>
>>>> This RFC patch set adds a simple vcpu_is_preempted() implementation so
>>>> that scheduler can make better decisions when it search for the idle
>>>> (v)CPU.
>>> Hi,
>>>
>>> A gentle ping.
>>>
>>>     -ss
>>> _______________________________________________
>>> kvmarm mailing list
>>> kvmarm@lists.cs.columbia.edu
>>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>>> .
>>
>> Hi Sergey,
>>
>> I have a set of patches similar to yours.
>>
>> https://lore.kernel.org/lkml/20191226135833.1052-1-yezengruan@huawei.com/
>
> It really isn't the same thing at all. You are exposing PV spinlocks,
> while Sergey exposes preemption to vcpus. The former is a massive,
> and probably unnecessary superset of the later, which only impacts
> the scheduler (it doesn't change the way locks are implemented).
>
> You really shouldn't conflate the two (which you have done in your
> series).
>
>         M.


Hi Marc,

Actually, both series support paravirtualization vcpu_is_preempted. My
series regard this as PV lock, but only the vcpu_is_preempted interface
of pv_lock_opt is implemented.

Except wake_up_process(), the vcpu_is_preempted interface of the current
kernel is used in the following scenarios:

kernel/sched/core.c:                            <---- wake_up_process()
--------------------
available_idle_cpu
    vcpu_is_preempted

kernel/locking/rwsem.c:
-----------------------
rwsem_optimistic_spin
    rwsem_spin_on_owner
        owner_on_cpu
            vcpu_is_preempted

kernel/locking/mutex.c:
-----------------------
mutex_optimistic_spin
    mutex_spin_on_owner
        vcpu_is_preempted

kernel/locking/osq_lock.c:
--------------------------
osq_lock
    vcpu_is_preempted


Thanks,

Zengruan

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
  2020-08-17 12:03   ` yezengruan
  2020-08-17 12:25     ` Marc Zyngier
@ 2020-09-11  8:46     ` Sergey Senozhatsky
  1 sibling, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2020-09-11  8:46 UTC (permalink / raw)
  To: yezengruan
  Cc: joelaf, maz, linux-kernel, Sergey Senozhatsky, suleiman, will,
	kvmarm, linux-arm-kernel

Hi,

On (20/08/17 20:03), yezengruan wrote:
> Hi Sergey,
> 
> I have a set of patches similar to yours.
> 
> https://lore.kernel.org/lkml/20191226135833.1052-1-yezengruan@huawei.com/

I'm sorry for the belated reply.

Right, quite similar, but not exactly, I believe. I deliberately wanted
to untangle vcpu preemption (which is a characteristics feature) from
pv-lock, which may be somewhat implementation dependent.

Perhaps vcpu_is_preempted() should not even be implemented on per-arch
basis, but instead it can be more of a "core" functionality.

	-ss
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
  2020-08-17 12:25     ` Marc Zyngier
  2020-08-17 14:15       ` yezengruan
@ 2020-09-11  8:58       ` Sergey Senozhatsky
  2020-12-08 20:02         ` Joel Fernandes
  1 sibling, 1 reply; 17+ messages in thread
From: Sergey Senozhatsky @ 2020-09-11  8:58 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: joelaf, Sergey Senozhatsky, linux-kernel, suleiman, will, kvmarm,
	linux-arm-kernel

My apologies for the slow reply.

On (20/08/17 13:25), Marc Zyngier wrote:
>
> It really isn't the same thing at all. You are exposing PV spinlocks,
> while Sergey exposes preemption to vcpus.
>

Correct, we see vcpu preemption as a "fundamental" feature, with
consequences that affect scheduling, which is a core feature :)

Marc, is there anything in particular that you dislike about this RFC
patch set? Joel has some ideas, which we may discuss offline if that
works for you.

	-ss
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
  2020-09-11  8:58       ` Sergey Senozhatsky
@ 2020-12-08 20:02         ` Joel Fernandes
  2020-12-09  9:43           ` Marc Zyngier
  0 siblings, 1 reply; 17+ messages in thread
From: Joel Fernandes @ 2020-12-08 20:02 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Marc Zyngier, LKML, Suleiman Souhlal, Will Deacon, kvmarm,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)

On Fri, Sep 11, 2020 at 4:58 AM Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>
> My apologies for the slow reply.
>
> On (20/08/17 13:25), Marc Zyngier wrote:
> >
> > It really isn't the same thing at all. You are exposing PV spinlocks,
> > while Sergey exposes preemption to vcpus.
> >
>
> Correct, we see vcpu preemption as a "fundamental" feature, with
> consequences that affect scheduling, which is a core feature :)
>
> Marc, is there anything in particular that you dislike about this RFC
> patch set? Joel has some ideas, which we may discuss offline if that
> works for you.

Hi Marc, Sergey, Just checking what is the latest on this series?

About the idea me and Sergey discussed, at a high level we discussed
being able to share information similar to "Is the vCPU preempted?"
using a more arch-independent infrastructure. I do not believe this
needs to be arch-specific. Maybe the speciifc mechanism about how to
share a page of information needs to be arch-specific, but the actual
information shared need not be. This could open the door to sharing
more such information in an arch-independent way (for example, if the
scheduler needs to know other information such as the capacity of the
CPU that the vCPU is on).

Other thoughts?

thanks,

 - Joel
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
  2020-12-08 20:02         ` Joel Fernandes
@ 2020-12-09  9:43           ` Marc Zyngier
  2020-12-10  1:39             ` Joel Fernandes
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2020-12-09  9:43 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, Sergey Senozhatsky, Suleiman Souhlal, Will Deacon, kvmarm,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)

Hi all,

On 2020-12-08 20:02, Joel Fernandes wrote:
> On Fri, Sep 11, 2020 at 4:58 AM Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
>> 
>> My apologies for the slow reply.
>> 
>> On (20/08/17 13:25), Marc Zyngier wrote:
>> >
>> > It really isn't the same thing at all. You are exposing PV spinlocks,
>> > while Sergey exposes preemption to vcpus.
>> >
>> 
>> Correct, we see vcpu preemption as a "fundamental" feature, with
>> consequences that affect scheduling, which is a core feature :)
>> 
>> Marc, is there anything in particular that you dislike about this RFC
>> patch set? Joel has some ideas, which we may discuss offline if that
>> works for you.
> 
> Hi Marc, Sergey, Just checking what is the latest on this series?

I was planning to give it a go, but obviously got sidetracked. :-(

> 
> About the idea me and Sergey discussed, at a high level we discussed
> being able to share information similar to "Is the vCPU preempted?"
> using a more arch-independent infrastructure. I do not believe this
> needs to be arch-specific. Maybe the speciifc mechanism about how to
> share a page of information needs to be arch-specific, but the actual
> information shared need not be.

We already have some information sharing in the form of steal time
accounting, and I believe this "vcpu preempted" falls in the same
bucket. It looks like we could implement the feature as an extension
of the steal-time accounting, as the two concepts are linked
(one describes the accumulation of non-running time, the other is
instantaneous).

> This could open the door to sharing
> more such information in an arch-independent way (for example, if the
> scheduler needs to know other information such as the capacity of the
> CPU that the vCPU is on).

Quentin and I have discussed potential ways of improving guest 
scheduling
on terminally broken systems (otherwise known as big-little), in the
form of a capacity request from the guest to the host. I'm not really
keen on the host exposing its own capacity, as that doesn't tell the
host what the guest actually needs.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
  2020-12-09  9:43           ` Marc Zyngier
@ 2020-12-10  1:39             ` Joel Fernandes
  2020-12-10  8:45               ` Marc Zyngier
  0 siblings, 1 reply; 17+ messages in thread
From: Joel Fernandes @ 2020-12-10  1:39 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: LKML, Sergey Senozhatsky, Suleiman Souhlal, Will Deacon, kvmarm,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)

Hi Marc, nice to hear from you.

On Wed, Dec 9, 2020 at 4:43 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Hi all,
>
> On 2020-12-08 20:02, Joel Fernandes wrote:
> > On Fri, Sep 11, 2020 at 4:58 AM Sergey Senozhatsky
> > <sergey.senozhatsky@gmail.com> wrote:
> >>
> >> My apologies for the slow reply.
> >>
> >> On (20/08/17 13:25), Marc Zyngier wrote:
> >> >
> >> > It really isn't the same thing at all. You are exposing PV spinlocks,
> >> > while Sergey exposes preemption to vcpus.
> >> >
> >>
> >> Correct, we see vcpu preemption as a "fundamental" feature, with
> >> consequences that affect scheduling, which is a core feature :)
> >>
> >> Marc, is there anything in particular that you dislike about this RFC
> >> patch set? Joel has some ideas, which we may discuss offline if that
> >> works for you.
> >
> > Hi Marc, Sergey, Just checking what is the latest on this series?
>
> I was planning to give it a go, but obviously got sidetracked. :-(

Ah, that happens.

> > About the idea me and Sergey discussed, at a high level we discussed
> > being able to share information similar to "Is the vCPU preempted?"
> > using a more arch-independent infrastructure. I do not believe this
> > needs to be arch-specific. Maybe the speciifc mechanism about how to
> > share a page of information needs to be arch-specific, but the actual
> > information shared need not be.
>
> We already have some information sharing in the form of steal time
> accounting, and I believe this "vcpu preempted" falls in the same
> bucket. It looks like we could implement the feature as an extension
> of the steal-time accounting, as the two concepts are linked
> (one describes the accumulation of non-running time, the other is
> instantaneous).

Yeah I noticed the steal stuff. Will go look more into that.

> > This could open the door to sharing
> > more such information in an arch-independent way (for example, if the
> > scheduler needs to know other information such as the capacity of the
> > CPU that the vCPU is on).
>
> Quentin and I have discussed potential ways of improving guest
> scheduling
> on terminally broken systems (otherwise known as big-little), in the
> form of a capacity request from the guest to the host. I'm not really
> keen on the host exposing its own capacity, as that doesn't tell the
> host what the guest actually needs.

I am not sure how a capacity request could work well. It seems the
cost of a repeated hypercall could be prohibitive. In this case, a
lighter approach might be for KVM to restrict vCPU threads to run on
certain types of cores, and pass the capacity information to the guest
at guest's boot time. This would be a one-time cost to pay. And then,
then the guest scheduler can handle the scheduling appropriately
without any more hypercalls. Thoughts?

- Joel
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
  2020-12-10  1:39             ` Joel Fernandes
@ 2020-12-10  8:45               ` Marc Zyngier
  2020-12-11  9:34                 ` Quentin Perret
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2020-12-10  8:45 UTC (permalink / raw)
  To: Joel Fernandes, Quentin Perret
  Cc: LKML, Sergey Senozhatsky, Suleiman Souhlal, Will Deacon, kvmarm,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)

On 2020-12-10 01:39, Joel Fernandes wrote:

[...]

>> Quentin and I have discussed potential ways of improving guest
>> scheduling
>> on terminally broken systems (otherwise known as big-little), in the
>> form of a capacity request from the guest to the host. I'm not really
>> keen on the host exposing its own capacity, as that doesn't tell the
>> host what the guest actually needs.
> 
> I am not sure how a capacity request could work well. It seems the
> cost of a repeated hypercall could be prohibitive. In this case, a
> lighter approach might be for KVM to restrict vCPU threads to run on
> certain types of cores, and pass the capacity information to the guest
> at guest's boot time.

That seems like a very narrow use case. If you actually pin vcpus to
physical CPU classes, DT is the right place to put things, because
it is completely static. This is effectively creating a virtual
big-little, which is in my opinion a userspace job.

> This would be a one-time cost to pay. And then,
> then the guest scheduler can handle the scheduling appropriately
> without any more hypercalls. Thoughts?

Anything that is a one-off belongs to firmware configuration, IMO.

The case I'm concerned with is when vcpus are allowed to roam across
the system, and hit random physical CPUs because the host has no idea
of the workload the guest deals with (specially as the AMU counters
are either absent or unusable on any available core).

The cost of a hypercall really depends on where you terminate it.
If it is a shallow exit, that's only a few hundred cycles on any half
baked CPU. Go all the way to userspace, and the host scheduler is the
limit. But the frequency of that hypercall obviously matters too.

How often do you expect the capacity request to fire? Probably not
on each and every time slice, right?

Quentin, can you shed some light on this?

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
  2020-12-10  8:45               ` Marc Zyngier
@ 2020-12-11  9:34                 ` Quentin Perret
  2020-12-16  1:45                   ` Joel Fernandes
  0 siblings, 1 reply; 17+ messages in thread
From: Quentin Perret @ 2020-12-11  9:34 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Joel Fernandes, LKML, Sergey Senozhatsky, Suleiman Souhlal,
	Will Deacon, kvmarm,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)

On Thursday 10 Dec 2020 at 08:45:22 (+0000), Marc Zyngier wrote:
> On 2020-12-10 01:39, Joel Fernandes wrote:
> 
> [...]
> 
> > > Quentin and I have discussed potential ways of improving guest
> > > scheduling
> > > on terminally broken systems (otherwise known as big-little), in the
> > > form of a capacity request from the guest to the host. I'm not really
> > > keen on the host exposing its own capacity, as that doesn't tell the
> > > host what the guest actually needs.
> > 
> > I am not sure how a capacity request could work well. It seems the
> > cost of a repeated hypercall could be prohibitive. In this case, a
> > lighter approach might be for KVM to restrict vCPU threads to run on
> > certain types of cores, and pass the capacity information to the guest
> > at guest's boot time.
> 
> That seems like a very narrow use case. If you actually pin vcpus to
> physical CPU classes, DT is the right place to put things, because
> it is completely static. This is effectively creating a virtual
> big-little, which is in my opinion a userspace job.

+1, all you should need for this is to have the VMM pin the vCPUS and
set capacity-dmips-mhz in the guest DT accordingly. And if you're
worried about sharing the runqueue with host tasks, could you vacate the
host CPUs using cpusets or such?

The last difficult bit is how to drive DVFS. I suppose Marc's suggestion
to relay capacity requests from the guest would help with that.

> > This would be a one-time cost to pay. And then,
> > then the guest scheduler can handle the scheduling appropriately
> > without any more hypercalls. Thoughts?
> 
> Anything that is a one-off belongs to firmware configuration, IMO.
> 
> The case I'm concerned with is when vcpus are allowed to roam across
> the system, and hit random physical CPUs because the host has no idea
> of the workload the guest deals with (specially as the AMU counters
> are either absent or unusable on any available core).
> 
> The cost of a hypercall really depends on where you terminate it.
> If it is a shallow exit, that's only a few hundred cycles on any half
> baked CPU. Go all the way to userspace, and the host scheduler is the
> limit. But the frequency of that hypercall obviously matters too.
> 
> How often do you expect the capacity request to fire? Probably not
> on each and every time slice, right?
> 
> Quentin, can you shed some light on this?

Assuming that we change the 'capacity request' (aka uclamp.min of the
vCPU) every time the guest makes a frequency request, then the answer
very much is 'it depends on the workload'. Yes there is an overhead, but
I think it is hard to say how bad that would be before we give it a go.
It's unfortunately not uncommon to have painfully slow frequency changes
on real hardware, so this may be just fine. And there may be ways we
can mitigate this too (with rate limiting and such), so all in all it is
worth a try.

Also as per the above, this still would help even if the VMM pins vCPUs
and such, so these two things can live and complement each other I
think.

Now, for the patch originally under discussion here, no objection from
me in principle, it looks like a nice improvement to the stolen time
stuff and I can see how that could help some use-cases, so +1 from me.

Thanks,
Quentin
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted
  2020-12-11  9:34                 ` Quentin Perret
@ 2020-12-16  1:45                   ` Joel Fernandes
  0 siblings, 0 replies; 17+ messages in thread
From: Joel Fernandes @ 2020-12-16  1:45 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Sergey Senozhatsky, Marc Zyngier, LKML, Suleiman Souhlal,
	Will Deacon, kvmarm,
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)

Hi Marc, Quentin,

On Fri, Dec 11, 2020 at 4:34 AM Quentin Perret <qperret@google.com> wrote:
>
> On Thursday 10 Dec 2020 at 08:45:22 (+0000), Marc Zyngier wrote:
> > On 2020-12-10 01:39, Joel Fernandes wrote:
> >
> > [...]
> >
> > > > Quentin and I have discussed potential ways of improving guest
> > > > scheduling
> > > > on terminally broken systems (otherwise known as big-little), in the
> > > > form of a capacity request from the guest to the host. I'm not really
> > > > keen on the host exposing its own capacity, as that doesn't tell the
> > > > host what the guest actually needs.
> > >
> > > I am not sure how a capacity request could work well. It seems the
> > > cost of a repeated hypercall could be prohibitive. In this case, a
> > > lighter approach might be for KVM to restrict vCPU threads to run on
> > > certain types of cores, and pass the capacity information to the guest
> > > at guest's boot time.
> >
> > That seems like a very narrow use case. If you actually pin vcpus to
> > physical CPU classes, DT is the right place to put things, because
> > it is completely static. This is effectively creating a virtual
> > big-little, which is in my opinion a userspace job.
>
> +1, all you should need for this is to have the VMM pin the vCPUS and
> set capacity-dmips-mhz in the guest DT accordingly. And if you're
> worried about sharing the runqueue with host tasks, could you vacate the
> host CPUs using cpusets or such?

I agree, the VMM is the right place for it with appropriate DT
settings. I think this is similar to how CPUID is emulated on Intel as
well (for example to specify SMT topology for a vCPU) -- it is done by
the VMM.

On sharing vCPU with host tasks, that is indeed an issue because the
host does not know the priority of an app (For example, a "top app"
running in Android in a VM). The sharing with host tasks should be Ok
as long as the scheduler priorities of the vCPU threads on the host
are setup correctly?

> The last difficult bit is how to drive DVFS. I suppose Marc's suggestion
> to relay capacity requests from the guest would help with that.

Yeah I misunderstood Marc.  I think for DVFS, a hypercall for capacity
request should work and be infrequent enough. IIRC, there is some rate
limiting support in cpufreq governors as well that should reduce the
rate of hypercalls if needed.

> > > This would be a one-time cost to pay. And then,
> > > then the guest scheduler can handle the scheduling appropriately
> > > without any more hypercalls. Thoughts?
> >
> > Anything that is a one-off belongs to firmware configuration, IMO.
> >
> > The case I'm concerned with is when vcpus are allowed to roam across
> > the system, and hit random physical CPUs because the host has no idea
> > of the workload the guest deals with (specially as the AMU counters
> > are either absent or unusable on any available core).

It sounds like this might be a usecase for pinning the vCPU threads
appropriately (So designate a set of vCPU threads to only run on bigs
and another set to only run on LITTLEs).  The host can setup the DT to
describe this and the VM kernel's scheduler can do appropriate task
placement.  Did I miss anything?

> > The cost of a hypercall really depends on where you terminate it.
> > If it is a shallow exit, that's only a few hundred cycles on any half
> > baked CPU. Go all the way to userspace, and the host scheduler is the
> > limit. But the frequency of that hypercall obviously matters too.
> >
> > How often do you expect the capacity request to fire? Probably not
> > on each and every time slice, right?
> >
> > Quentin, can you shed some light on this?
>
> Assuming that we change the 'capacity request' (aka uclamp.min of the
> vCPU) every time the guest makes a frequency request, then the answer
> very much is 'it depends on the workload'. Yes there is an overhead, but
> I think it is hard to say how bad that would be before we give it a go.
> It's unfortunately not uncommon to have painfully slow frequency changes
> on real hardware, so this may be just fine. And there may be ways we
> can mitigate this too (with rate limiting and such), so all in all it is
> worth a try.

Agreed.

> Also as per the above, this still would help even if the VMM pins vCPUs
> and such, so these two things can live and complement each other I
> think.

Makes sense.

> Now, for the patch originally under discussion here, no objection from
> me in principle, it looks like a nice improvement to the stolen time
> stuff and I can see how that could help some use-cases, so +1 from me.

Sounds good!

thanks,

 - Joel
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-12-16  9:42 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-21  4:17 [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
2020-07-21  4:17 ` [RFC][PATCH 1/4] arm64:kvm: define pv_state SMCCC HV calls Sergey Senozhatsky
2020-07-21  4:17 ` [RFC][PATCH 2/4] arm64: add guest pvstate support Sergey Senozhatsky
2020-07-21  4:17 ` [RFC][PATCH 3/4] arm64: add host " Sergey Senozhatsky
2020-07-21  4:17 ` [RFC][PATCH 4/4] arm64: do not use dummy vcpu_is_preempted() anymore Sergey Senozhatsky
2020-08-17  2:03 ` [RFC][PATCH 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
2020-08-17 12:03   ` yezengruan
2020-08-17 12:25     ` Marc Zyngier
2020-08-17 14:15       ` yezengruan
2020-09-11  8:58       ` Sergey Senozhatsky
2020-12-08 20:02         ` Joel Fernandes
2020-12-09  9:43           ` Marc Zyngier
2020-12-10  1:39             ` Joel Fernandes
2020-12-10  8:45               ` Marc Zyngier
2020-12-11  9:34                 ` Quentin Perret
2020-12-16  1:45                   ` Joel Fernandes
2020-09-11  8:46     ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).