linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/4] kvm/arm: New VMID allocator based on asid
@ 2021-07-29 10:40 Shameer Kolothum
  2021-07-29 10:40 ` [PATCH v3 1/4] KVM: arm64: Introduce a new VMID allocator for KVM Shameer Kolothum
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Shameer Kolothum @ 2021-07-29 10:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, linux-kernel
  Cc: maz, will, catalin.marinas, james.morse, julien.thierry.kdev,
	suzuki.poulose, jean-philippe, Alexandru.Elisei, qperret,
	linuxarm

Hi,

Major changes since v2 (based on Will's feedback):
  -Dropped adding a new static key and cpufeature for retrieving
   the supported VMID bits. Instead, we now make use of the
   kvm_arm_vmid_bits variable (patch #2).

  -Since we expect less frequent rollovers in the case of VMIDs,
   the TLB invalidation is now broadcast on rollover instead
   of keeping per-CPU flush_pending info and issuing a local
   context flush.

  -Clear active_vmids on vCPU schedule out to avoid unnecessarily
   reserving the VMID space (patch #4).

  -I have kept struct kvm_vmid as it is for now (instead of a
   typedef as suggested), as we may soon add another member to
   it when we introduce pinned KVM VMID support.

Sanity tested on HiSilicon D06 board.

Thanks,
Shameer


RFCv1 --> v2
   - Dropped "pinned VMID" support for now.
   - Dropped RFC tag.

History (from RFC v1):
----------------------

Please find the RFC series here:
https://lore.kernel.org/kvmarm/20210506165232.1969-1-shameerali.kolothum.thodi@huawei.com/

This is based on a suggestion from Will [0] to try out the
ASID-based KVM VMID solution as a separate VMID allocator,
instead of the shared-lib approach attempted in v4 [1].

The idea is to compare both approaches and see whether the
shared-lib solution with callbacks makes sense or not.

Though we are not using pinned VMIDs yet, patch #2 has code
for pinned VMID support. This is just to help the comparison.

Test Setup/Results
------------------
The measurement was made with maxcpus set to 8 and with the
number of VMID bits limited to 4. The test involves running
40 guests concurrently, each with 2 vCPUs. Each guest will then
execute hackbench 5 times before exiting.

The performance differences between the current algorithm and
the new one are (avg. of 10 runs):
    - 1.9% less entry/exit from the guest
    - 0.5% faster

This is more or less comparable to v4 numbers.

For the complete series, please see:
https://github.com/hisilicon/kernel-dev/tree/private-v5.12-rc7-vmid-2nd-rfc

and for the shared ASID lib v4 solution:
https://github.com/hisilicon/kernel-dev/tree/private-v5.12-rc7-asid-v4

As you can see, there is of course some code duplication with
this approach, but maybe it is easier to maintain considering
the complexity involved.

Please take a look and let me know your feedback.

Thanks,
Shameer

Julien Grall (1):
  KVM: arm64: Align the VMID allocation with the arm64 ASID one

Shameer Kolothum (3):
  KVM: arm64: Introduce a new VMID allocator for KVM
  KVM: arm64: Make VMID bits accessible outside of allocator
  KVM: arm64: Clear active_vmids on vCPU schedule out

 arch/arm64/include/asm/kvm_host.h     |  10 +-
 arch/arm64/include/asm/kvm_mmu.h      |   4 +-
 arch/arm64/kernel/image-vars.h        |   3 +
 arch/arm64/kvm/Makefile               |   2 +-
 arch/arm64/kvm/arm.c                  | 122 +++++------------
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |   3 +-
 arch/arm64/kvm/mmu.c                  |   1 -
 arch/arm64/kvm/vmid.c                 | 182 ++++++++++++++++++++++++++
 8 files changed, 228 insertions(+), 99 deletions(-)
 create mode 100644 arch/arm64/kvm/vmid.c

-- 
2.17.1



* [PATCH v3 1/4] KVM: arm64: Introduce a new VMID allocator for KVM
  2021-07-29 10:40 [PATCH v3 0/4] kvm/arm: New VMID allocator based on asid Shameer Kolothum
@ 2021-07-29 10:40 ` Shameer Kolothum
  2021-08-03 11:38   ` Will Deacon
  2021-07-29 10:40 ` [PATCH v3 2/4] KVM: arm64: Make VMID bits accessible outside of allocator Shameer Kolothum
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Shameer Kolothum @ 2021-07-29 10:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, linux-kernel
  Cc: maz, will, catalin.marinas, james.morse, julien.thierry.kdev,
	suzuki.poulose, jean-philippe, Alexandru.Elisei, qperret,
	linuxarm

A new VMID allocator for arm64 KVM use. This is based on the
arm64 ASID allocator algorithm.

One major deviation from the ASID allocator is the way we
flush the context. Unlike the ASID allocator, we expect less
frequent rollovers in the case of VMIDs. Hence, instead of
marking the CPU as flush_pending and issuing a local context
invalidation on the next context switch, we broadcast a TLB
flush + I-cache invalidation over the inner shareable domain
on rollover.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h |   4 +
 arch/arm64/kvm/vmid.c             | 176 ++++++++++++++++++++++++++++++
 2 files changed, 180 insertions(+)
 create mode 100644 arch/arm64/kvm/vmid.c

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 41911585ae0c..9df15811e7df 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -685,6 +685,10 @@ int kvm_arm_pvtime_get_attr(struct kvm_vcpu *vcpu,
 int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
 			    struct kvm_device_attr *attr);
 
+int kvm_arm_vmid_alloc_init(void);
+void kvm_arm_vmid_alloc_free(void);
+void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
+
 static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
 {
 	vcpu_arch->steal.base = GPA_INVALID;
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
new file mode 100644
index 000000000000..b2dc12549405
--- /dev/null
+++ b/arch/arm64/kvm/vmid.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * VMID allocator.
+ *
+ * Based on the arm64 ASID allocator algorithm.
+ * Please refer to arch/arm64/mm/context.c for detailed
+ * comments on the algorithm.
+ *
+ * Copyright (C) 2002-2003 Deep Blue Solutions Ltd, all rights reserved.
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+
+#include <asm/kvm_asm.h>
+#include <asm/kvm_mmu.h>
+
+static unsigned int kvm_arm_vmid_bits;
+static DEFINE_RAW_SPINLOCK(cpu_vmid_lock);
+
+static atomic64_t vmid_generation;
+static unsigned long *vmid_map;
+
+static DEFINE_PER_CPU(atomic64_t, active_vmids);
+static DEFINE_PER_CPU(u64, reserved_vmids);
+
+#define VMID_MASK		(~GENMASK(kvm_arm_vmid_bits - 1, 0))
+#define VMID_FIRST_VERSION	(1UL << kvm_arm_vmid_bits)
+
+#define NUM_USER_VMIDS		VMID_FIRST_VERSION
+#define vmid2idx(vmid)		((vmid) & ~VMID_MASK)
+#define idx2vmid(idx)		vmid2idx(idx)
+
+#define vmid_gen_match(vmid) \
+	(!(((vmid) ^ atomic64_read(&vmid_generation)) >> kvm_arm_vmid_bits))
+
+static void flush_context(void)
+{
+	int cpu;
+	u64 vmid;
+
+	bitmap_clear(vmid_map, 0, NUM_USER_VMIDS);
+
+	for_each_possible_cpu(cpu) {
+		vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, cpu), 0);
+
+		/* Preserve reserved VMID */
+		if (vmid == 0)
+			vmid = per_cpu(reserved_vmids, cpu);
+		__set_bit(vmid2idx(vmid), vmid_map);
+		per_cpu(reserved_vmids, cpu) = vmid;
+	}
+
+	/*
+	 * Unlike the ASID allocator, we expect less frequent rollovers
+	 * in the case of VMIDs. Hence, instead of marking the CPU as
+	 * flush_pending and issuing a local context invalidation on
+	 * the next context switch, we broadcast a TLB flush + I-cache
+	 * invalidation over the inner shareable domain on rollover.
+	 */
+	kvm_call_hyp(__kvm_flush_vm_context);
+}
+
+static bool check_update_reserved_vmid(u64 vmid, u64 newvmid)
+{
+	int cpu;
+	bool hit = false;
+
+	/*
+	 * Iterate over the set of reserved VMIDs looking for a match
+	 * and update to use newvmid (i.e. the same VMID in the current
+	 * generation).
+	 */
+	for_each_possible_cpu(cpu) {
+		if (per_cpu(reserved_vmids, cpu) == vmid) {
+			hit = true;
+			per_cpu(reserved_vmids, cpu) = newvmid;
+		}
+	}
+
+	return hit;
+}
+
+static u64 new_vmid(struct kvm_vmid *kvm_vmid)
+{
+	static u32 cur_idx = 1;
+	u64 vmid = atomic64_read(&kvm_vmid->id);
+	u64 generation = atomic64_read(&vmid_generation);
+
+	if (vmid != 0) {
+		u64 newvmid = generation | (vmid & ~VMID_MASK);
+
+		if (check_update_reserved_vmid(vmid, newvmid))
+			return newvmid;
+
+		if (!__test_and_set_bit(vmid2idx(vmid), vmid_map))
+			return newvmid;
+	}
+
+	vmid = find_next_zero_bit(vmid_map, NUM_USER_VMIDS, cur_idx);
+	if (vmid != NUM_USER_VMIDS)
+		goto set_vmid;
+
+	/* We're out of VMIDs, so increment the global generation count */
+	generation = atomic64_add_return_relaxed(VMID_FIRST_VERSION,
+						 &vmid_generation);
+	flush_context();
+
+	/* We have more VMIDs than CPUs, so this will always succeed */
+	vmid = find_next_zero_bit(vmid_map, NUM_USER_VMIDS, 1);
+
+set_vmid:
+	__set_bit(vmid, vmid_map);
+	cur_idx = vmid;
+	return idx2vmid(vmid) | generation;
+}
+
+void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
+{
+	unsigned long flags;
+	unsigned int cpu;
+	u64 vmid, old_active_vmid;
+
+	vmid = atomic64_read(&kvm_vmid->id);
+
+	/*
+	 * Please refer to the comments in check_and_switch_context()
+	 * in arch/arm64/mm/context.c.
+	 */
+	old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
+	if (old_active_vmid && vmid_gen_match(vmid) &&
+	    atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
+				     old_active_vmid, vmid))
+		return;
+
+	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
+
+	/* Check that our VMID belongs to the current generation. */
+	vmid = atomic64_read(&kvm_vmid->id);
+	if (!vmid_gen_match(vmid)) {
+		vmid = new_vmid(kvm_vmid);
+		atomic64_set(&kvm_vmid->id, vmid);
+	}
+
+	cpu = smp_processor_id();
+
+	atomic64_set(this_cpu_ptr(&active_vmids), vmid);
+	raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
+}
+
+/*
+ * Initialize the VMID allocator
+ */
+int kvm_arm_vmid_alloc_init(void)
+{
+	kvm_arm_vmid_bits = kvm_get_vmid_bits();
+
+	/*
+	 * Expect allocation after rollover to fail if we don't have
+	 * at least one more VMID than CPUs. VMID #0 is always reserved.
+	 */
+	WARN_ON(NUM_USER_VMIDS - 1 <= num_possible_cpus());
+	atomic64_set(&vmid_generation, VMID_FIRST_VERSION);
+	vmid_map = kcalloc(BITS_TO_LONGS(NUM_USER_VMIDS),
+			   sizeof(*vmid_map), GFP_KERNEL);
+	if (!vmid_map)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void kvm_arm_vmid_alloc_free(void)
+{
+	kfree(vmid_map);
+}
-- 
2.17.1



* [PATCH v3 2/4] KVM: arm64: Make VMID bits accessible outside of allocator
  2021-07-29 10:40 [PATCH v3 0/4] kvm/arm: New VMID allocator based on asid Shameer Kolothum
  2021-07-29 10:40 ` [PATCH v3 1/4] KVM: arm64: Introduce a new VMID allocator for KVM Shameer Kolothum
@ 2021-07-29 10:40 ` Shameer Kolothum
  2021-07-29 10:40 ` [PATCH v3 3/4] KVM: arm64: Align the VMID allocation with the arm64 ASID one Shameer Kolothum
  2021-07-29 10:40 ` [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out Shameer Kolothum
  3 siblings, 0 replies; 17+ messages in thread
From: Shameer Kolothum @ 2021-07-29 10:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, linux-kernel
  Cc: maz, will, catalin.marinas, james.morse, julien.thierry.kdev,
	suzuki.poulose, jean-philippe, Alexandru.Elisei, qperret,
	linuxarm

Since we already set kvm_arm_vmid_bits in the VMID allocator
init function, make it accessible outside the allocator as well
so that it can be used in the subsequent patch.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h | 1 +
 arch/arm64/kernel/image-vars.h    | 3 +++
 arch/arm64/kvm/vmid.c             | 2 +-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 9df15811e7df..ee0de63396ec 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -685,6 +685,7 @@ int kvm_arm_pvtime_get_attr(struct kvm_vcpu *vcpu,
 int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
 			    struct kvm_device_attr *attr);
 
+extern unsigned int kvm_arm_vmid_bits;
 int kvm_arm_vmid_alloc_init(void);
 void kvm_arm_vmid_alloc_free(void);
 void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index c96a9a0043bf..c12963c3a055 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -79,6 +79,9 @@ KVM_NVHE_ALIAS(__hyp_stub_vectors);
 /* Kernel symbol used by icache_is_vpipt(). */
 KVM_NVHE_ALIAS(__icache_flags);
 
+/* VMID bits set by the KVM VMID allocator */
+KVM_NVHE_ALIAS(kvm_arm_vmid_bits);
+
 /* Kernel symbols needed for cpus_have_final/const_caps checks. */
 KVM_NVHE_ALIAS(arm64_const_caps_ready);
 KVM_NVHE_ALIAS(cpu_hwcap_keys);
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
index b2dc12549405..5584e84aed95 100644
--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -16,7 +16,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
 
-static unsigned int kvm_arm_vmid_bits;
+unsigned int kvm_arm_vmid_bits;
 static DEFINE_RAW_SPINLOCK(cpu_vmid_lock);
 
 static atomic64_t vmid_generation;
-- 
2.17.1



* [PATCH v3 3/4] KVM: arm64: Align the VMID allocation with the arm64 ASID one
  2021-07-29 10:40 [PATCH v3 0/4] kvm/arm: New VMID allocator based on asid Shameer Kolothum
  2021-07-29 10:40 ` [PATCH v3 1/4] KVM: arm64: Introduce a new VMID allocator for KVM Shameer Kolothum
  2021-07-29 10:40 ` [PATCH v3 2/4] KVM: arm64: Make VMID bits accessible outside of allocator Shameer Kolothum
@ 2021-07-29 10:40 ` Shameer Kolothum
  2021-07-29 14:59   ` kernel test robot
  2021-07-29 10:40 ` [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out Shameer Kolothum
  3 siblings, 1 reply; 17+ messages in thread
From: Shameer Kolothum @ 2021-07-29 10:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, linux-kernel
  Cc: maz, will, catalin.marinas, james.morse, julien.thierry.kdev,
	suzuki.poulose, jean-philippe, Alexandru.Elisei, qperret,
	linuxarm

From: Julien Grall <julien.grall@arm.com>

At the moment, the VMID algorithm will send an SGI to all the
CPUs to force an exit and then broadcast a full TLB flush and
I-cache invalidation.

This patch uses the new VMID allocator. The benefits are:
   - Aligns with the arm64 ASID allocator algorithm.
   - CPUs are not forced to exit at rollover. Instead,
     the VMID will be marked reserved and context invalidation
     is broadcast. This will reduce IPI traffic.
   - More flexibility to add support for pinned KVM VMIDs in
     the future.

With the new algorithm, the code is adapted as follows:
    - The call to update_vmid() is now done with preemption
      disabled, as the new algorithm requires storing
      information per CPU.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h     |   4 +-
 arch/arm64/include/asm/kvm_mmu.h      |   4 +-
 arch/arm64/kvm/Makefile               |   2 +-
 arch/arm64/kvm/arm.c                  | 121 +++++++-------------------
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |   3 +-
 arch/arm64/kvm/mmu.c                  |   1 -
 6 files changed, 36 insertions(+), 99 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ee0de63396ec..bb993bce1363 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -71,9 +71,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
 
 struct kvm_vmid {
-	/* The VMID generation used for the virt. memory system */
-	u64    vmid_gen;
-	u32    vmid;
+	atomic64_t id;
 };
 
 struct kvm_s2_mmu {
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index b52c5c4b9a3d..61edd05a000c 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -115,6 +115,7 @@ alternative_cb_end
 #include <asm/cache.h>
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
+#include <asm/kvm_host.h>
 
 void kvm_update_va_mask(struct alt_instr *alt,
 			__le32 *origptr, __le32 *updptr, int nr_inst);
@@ -259,7 +260,8 @@ static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
 	u64 cnp = system_supports_cnp() ? VTTBR_CNP_BIT : 0;
 
 	baddr = mmu->pgd_phys;
-	vmid_field = (u64)vmid->vmid << VTTBR_VMID_SHIFT;
+	vmid_field = atomic64_read(&vmid->id) << VTTBR_VMID_SHIFT;
+	vmid_field &= VTTBR_VMID_MASK(kvm_arm_vmid_bits);
 	return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
 }
 
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 989bb5dad2c8..4a607d52356c 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -16,7 +16,7 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o \
-	 arch_timer.o trng.o\
+	 arch_timer.o trng.o vmid.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e9a2b8f27792..077e55a511a9 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -55,11 +55,6 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
 DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
-/* The VMID used in the VTTBR */
-static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
-static u32 kvm_next_vmid;
-static DEFINE_SPINLOCK(kvm_vmid_lock);
-
 static bool vgic_present;
 
 static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
@@ -500,87 +495,6 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 	return vcpu_mode_priv(vcpu);
 }
 
-/* Just ensure a guest exit from a particular CPU */
-static void exit_vm_noop(void *info)
-{
-}
-
-void force_vm_exit(const cpumask_t *mask)
-{
-	preempt_disable();
-	smp_call_function_many(mask, exit_vm_noop, NULL, true);
-	preempt_enable();
-}
-
-/**
- * need_new_vmid_gen - check that the VMID is still valid
- * @vmid: The VMID to check
- *
- * return true if there is a new generation of VMIDs being used
- *
- * The hardware supports a limited set of values with the value zero reserved
- * for the host, so we check if an assigned value belongs to a previous
- * generation, which requires us to assign a new value. If we're the first to
- * use a VMID for the new generation, we must flush necessary caches and TLBs
- * on all CPUs.
- */
-static bool need_new_vmid_gen(struct kvm_vmid *vmid)
-{
-	u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
-	smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
-	return unlikely(READ_ONCE(vmid->vmid_gen) != current_vmid_gen);
-}
-
-/**
- * update_vmid - Update the vmid with a valid VMID for the current generation
- * @vmid: The stage-2 VMID information struct
- */
-static void update_vmid(struct kvm_vmid *vmid)
-{
-	if (!need_new_vmid_gen(vmid))
-		return;
-
-	spin_lock(&kvm_vmid_lock);
-
-	/*
-	 * We need to re-check the vmid_gen here to ensure that if another vcpu
-	 * already allocated a valid vmid for this vm, then this vcpu should
-	 * use the same vmid.
-	 */
-	if (!need_new_vmid_gen(vmid)) {
-		spin_unlock(&kvm_vmid_lock);
-		return;
-	}
-
-	/* First user of a new VMID generation? */
-	if (unlikely(kvm_next_vmid == 0)) {
-		atomic64_inc(&kvm_vmid_gen);
-		kvm_next_vmid = 1;
-
-		/*
-		 * On SMP we know no other CPUs can use this CPU's or each
-		 * other's VMID after force_vm_exit returns since the
-		 * kvm_vmid_lock blocks them from reentry to the guest.
-		 */
-		force_vm_exit(cpu_all_mask);
-		/*
-		 * Now broadcast TLB + ICACHE invalidation over the inner
-		 * shareable domain to make sure all data structures are
-		 * clean.
-		 */
-		kvm_call_hyp(__kvm_flush_vm_context);
-	}
-
-	vmid->vmid = kvm_next_vmid;
-	kvm_next_vmid++;
-	kvm_next_vmid &= (1 << kvm_get_vmid_bits()) - 1;
-
-	smp_wmb();
-	WRITE_ONCE(vmid->vmid_gen, atomic64_read(&kvm_vmid_gen));
-
-	spin_unlock(&kvm_vmid_lock);
-}
-
 static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 {
 	struct kvm *kvm = vcpu->kvm;
@@ -759,8 +673,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 */
 		cond_resched();
 
-		update_vmid(&vcpu->arch.hw_mmu->vmid);
-
 		check_vcpu_requests(vcpu);
 
 		/*
@@ -770,6 +682,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 */
 		preempt_disable();
 
+		/*
+		 * The VMID allocator only tracks active VMIDs per
+		 * physical CPU, and therefore the VMID allocated may not be
+		 * preserved on VMID roll-over if the task was preempted,
+		 * making a thread's VMID inactive. So we need to call
+		 * kvm_arm_vmid_update() in non-preemptible context.
+		 */
+		kvm_arm_vmid_update(&vcpu->arch.hw_mmu->vmid);
+
 		kvm_pmu_flush_hwstate(vcpu);
 
 		local_irq_disable();
@@ -808,8 +729,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 */
 		smp_store_mb(vcpu->mode, IN_GUEST_MODE);
 
-		if (ret <= 0 || need_new_vmid_gen(&vcpu->arch.hw_mmu->vmid) ||
-		    kvm_request_pending(vcpu)) {
+		if (ret <= 0 || kvm_request_pending(vcpu)) {
 			vcpu->mode = OUTSIDE_GUEST_MODE;
 			isb(); /* Ensure work in x_flush_hwstate is committed */
 			kvm_pmu_sync_hwstate(vcpu);
@@ -1519,6 +1439,7 @@ static void cpu_hyp_reset(void)
 {
 	if (!is_kernel_in_hyp_mode())
 		__hyp_reset_vectors();
+
 }
 
 /*
@@ -1698,9 +1619,26 @@ static bool init_psci_relay(void)
 
 static int init_common_resources(void)
 {
+	int err;
+
+	/*
+	 * Initialize the VMID allocator telling it to allocate a single
+	 * VMID per VM.
+	 */
+	err = kvm_arm_vmid_alloc_init();
+	if (err) {
+		kvm_err("Failed to initialize VMID allocator.\n");
+		return err;
+	}
+
 	return kvm_set_ipa_limit();
 }
 
+static void free_common_resources(void)
+{
+	kvm_arm_vmid_alloc_free();
+}
+
 static int init_subsystems(void)
 {
 	int err = 0;
@@ -2108,7 +2046,7 @@ int kvm_arch_init(void *opaque)
 
 	err = kvm_arm_init_sve();
 	if (err)
-		return err;
+		goto out_err;
 
 	if (!in_hyp_mode) {
 		err = init_hyp_mode();
@@ -2149,6 +2087,7 @@ int kvm_arch_init(void *opaque)
 	if (!in_hyp_mode)
 		teardown_hyp_mode();
 out_err:
+	free_common_resources();
 	return err;
 }
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index d938ce95d3bd..858de201247e 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -109,8 +109,7 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
 	mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
 	mmu->arch = &host_kvm.arch;
 	mmu->pgt = &host_kvm.pgt;
-	mmu->vmid.vmid_gen = 0;
-	mmu->vmid.vmid = 0;
+	atomic64_set(&mmu->vmid.id, 0);
 
 	return 0;
 }
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 3155c9e778f0..18346f56e036 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -485,7 +485,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	mmu->arch = &kvm->arch;
 	mmu->pgt = pgt;
 	mmu->pgd_phys = __pa(pgt->pgd);
-	mmu->vmid.vmid_gen = 0;
 	return 0;
 
 out_destroy_pgtable:
-- 
2.17.1



* [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out
  2021-07-29 10:40 [PATCH v3 0/4] kvm/arm: New VMID allocator based on asid Shameer Kolothum
                   ` (2 preceding siblings ...)
  2021-07-29 10:40 ` [PATCH v3 3/4] KVM: arm64: Align the VMID allocation with the arm64 ASID one Shameer Kolothum
@ 2021-07-29 10:40 ` Shameer Kolothum
  2021-08-03 11:40   ` Will Deacon
  3 siblings, 1 reply; 17+ messages in thread
From: Shameer Kolothum @ 2021-07-29 10:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, linux-kernel
  Cc: maz, will, catalin.marinas, james.morse, julien.thierry.kdev,
	suzuki.poulose, jean-philippe, Alexandru.Elisei, qperret,
	linuxarm

Like the ASID allocator, we copy the active_vmids into the
reserved_vmids on a rollover. But it's unlikely that every
CPU will have a vCPU as its current task, and we may end up
unnecessarily reserving the VMID space.

Hence, clear active_vmids when scheduling out a vCPU.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h | 1 +
 arch/arm64/kvm/arm.c              | 1 +
 arch/arm64/kvm/vmid.c             | 6 ++++++
 3 files changed, 8 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index bb993bce1363..d93141cb8d16 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -687,6 +687,7 @@ extern unsigned int kvm_arm_vmid_bits;
 int kvm_arm_vmid_alloc_init(void);
 void kvm_arm_vmid_alloc_free(void);
 void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
+void kvm_arm_vmid_clear_active(void);
 
 static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
 {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 077e55a511a9..b134a1b89c84 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -435,6 +435,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	kvm_timer_vcpu_put(vcpu);
 	kvm_vgic_put(vcpu);
 	kvm_vcpu_pmu_restore_host(vcpu);
+	kvm_arm_vmid_clear_active();
 
 	vcpu->cpu = -1;
 }
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
index 5584e84aed95..5fd51f5445c1 100644
--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid *kvm_vmid)
 	return idx2vmid(vmid) | generation;
 }
 
+/* Call with preemption disabled */
+void kvm_arm_vmid_clear_active(void)
+{
+	atomic64_set(this_cpu_ptr(&active_vmids), 0);
+}
+
 void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
 {
 	unsigned long flags;
-- 
2.17.1



* Re: [PATCH v3 3/4] KVM: arm64: Align the VMID allocation with the arm64 ASID one
  2021-07-29 10:40 ` [PATCH v3 3/4] KVM: arm64: Align the VMID allocation with the arm64 ASID one Shameer Kolothum
@ 2021-07-29 14:59   ` kernel test robot
  0 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-07-29 14:59 UTC (permalink / raw)
  To: Shameer Kolothum, linux-arm-kernel, kvmarm, linux-kernel
  Cc: kbuild-all, maz, will, catalin.marinas, james.morse,
	julien.thierry.kdev, suzuki.poulose, jean-philippe

[-- Attachment #1: Type: text/plain, Size: 4504 bytes --]

Hi Shameer,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kvmarm/next]
[also build test WARNING on arm64/for-next/core arm/for-next soc/for-next rockchip/for-next shawnguo/for-next clk/clk-next v5.14-rc3 next-20210728]
[cannot apply to xlnx/master keystone/next linux-rpi/for-rpi-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Shameer-Kolothum/kvm-arm-New-VMID-allocator-based-on-asid/20210729-184607
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 10.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/3ffbfe2e26ad324870b87a46b930a9a4c0a8faa1
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Shameer-Kolothum/kvm-arm-New-VMID-allocator-based-on-asid/20210729-184607
        git checkout 3ffbfe2e26ad324870b87a46b930a9a4c0a8faa1
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   arch/arm64/kvm/vmid.c: In function 'kvm_arm_vmid_update':
>> arch/arm64/kvm/vmid.c:122:15: warning: variable 'cpu' set but not used [-Wunused-but-set-variable]
     122 |  unsigned int cpu;
         |               ^~~


vim +/cpu +122 arch/arm64/kvm/vmid.c

0537d0e3bea2fa Shameer Kolothum 2021-07-29  118  
0537d0e3bea2fa Shameer Kolothum 2021-07-29  119  void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
0537d0e3bea2fa Shameer Kolothum 2021-07-29  120  {
0537d0e3bea2fa Shameer Kolothum 2021-07-29  121  	unsigned long flags;
0537d0e3bea2fa Shameer Kolothum 2021-07-29 @122  	unsigned int cpu;
0537d0e3bea2fa Shameer Kolothum 2021-07-29  123  	u64 vmid, old_active_vmid;
0537d0e3bea2fa Shameer Kolothum 2021-07-29  124  
0537d0e3bea2fa Shameer Kolothum 2021-07-29  125  	vmid = atomic64_read(&kvm_vmid->id);
0537d0e3bea2fa Shameer Kolothum 2021-07-29  126  
0537d0e3bea2fa Shameer Kolothum 2021-07-29  127  	/*
0537d0e3bea2fa Shameer Kolothum 2021-07-29  128  	 * Please refer comments in check_and_switch_context() in
0537d0e3bea2fa Shameer Kolothum 2021-07-29  129  	 * arch/arm64/mm/context.c.
0537d0e3bea2fa Shameer Kolothum 2021-07-29  130  	 */
0537d0e3bea2fa Shameer Kolothum 2021-07-29  131  	old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
0537d0e3bea2fa Shameer Kolothum 2021-07-29  132  	if (old_active_vmid && vmid_gen_match(vmid) &&
0537d0e3bea2fa Shameer Kolothum 2021-07-29  133  	    atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
0537d0e3bea2fa Shameer Kolothum 2021-07-29  134  				     old_active_vmid, vmid))
0537d0e3bea2fa Shameer Kolothum 2021-07-29  135  		return;
0537d0e3bea2fa Shameer Kolothum 2021-07-29  136  
0537d0e3bea2fa Shameer Kolothum 2021-07-29  137  	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
0537d0e3bea2fa Shameer Kolothum 2021-07-29  138  
0537d0e3bea2fa Shameer Kolothum 2021-07-29  139  	/* Check that our VMID belongs to the current generation. */
0537d0e3bea2fa Shameer Kolothum 2021-07-29  140  	vmid = atomic64_read(&kvm_vmid->id);
0537d0e3bea2fa Shameer Kolothum 2021-07-29  141  	if (!vmid_gen_match(vmid)) {
0537d0e3bea2fa Shameer Kolothum 2021-07-29  142  		vmid = new_vmid(kvm_vmid);
0537d0e3bea2fa Shameer Kolothum 2021-07-29  143  		atomic64_set(&kvm_vmid->id, vmid);
0537d0e3bea2fa Shameer Kolothum 2021-07-29  144  	}
0537d0e3bea2fa Shameer Kolothum 2021-07-29  145  
0537d0e3bea2fa Shameer Kolothum 2021-07-29  146  	cpu = smp_processor_id();
0537d0e3bea2fa Shameer Kolothum 2021-07-29  147  
0537d0e3bea2fa Shameer Kolothum 2021-07-29  148  	atomic64_set(this_cpu_ptr(&active_vmids), vmid);
0537d0e3bea2fa Shameer Kolothum 2021-07-29  149  	raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
0537d0e3bea2fa Shameer Kolothum 2021-07-29  150  }
0537d0e3bea2fa Shameer Kolothum 2021-07-29  151  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 77437 bytes --]


* Re: [PATCH v3 1/4] KVM: arm64: Introduce a new VMID allocator for KVM
  2021-07-29 10:40 ` [PATCH v3 1/4] KVM: arm64: Introduce a new VMID allocator for KVM Shameer Kolothum
@ 2021-08-03 11:38   ` Will Deacon
  2021-08-03 12:12     ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2021-08-03 11:38 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: linux-arm-kernel, kvmarm, linux-kernel, maz, catalin.marinas,
	james.morse, julien.thierry.kdev, suzuki.poulose, jean-philippe,
	Alexandru.Elisei, qperret, linuxarm

On Thu, Jul 29, 2021 at 11:40:06AM +0100, Shameer Kolothum wrote:
> A new VMID allocator for arm64 KVM use. This is based on
> arm64 ASID allocator algorithm.
> 
> One major deviation from the ASID allocator is the way we
> flush the context. Unlike ASID allocator, we expect less
> frequent rollover in the case of VMIDs. Hence, instead of
> marking the CPU as flush_pending and issuing a local context
> invalidation on the next context switch, we broadcast TLB
> flush + I-cache invalidation over the inner shareable domain
> on rollover.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---

[...]

> +void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
> +{
> +	unsigned long flags;
> +	unsigned int cpu;
> +	u64 vmid, old_active_vmid;
> +
> +	vmid = atomic64_read(&kvm_vmid->id);
> +
> +	/*
> +	 * Please refer comments in check_and_switch_context() in
> +	 * arch/arm64/mm/context.c.
> +	 */
> +	old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
> +	if (old_active_vmid && vmid_gen_match(vmid) &&
> +	    atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
> +				     old_active_vmid, vmid))
> +		return;
> +
> +	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> +
> +	/* Check that our VMID belongs to the current generation. */
> +	vmid = atomic64_read(&kvm_vmid->id);
> +	if (!vmid_gen_match(vmid)) {
> +		vmid = new_vmid(kvm_vmid);
> +		atomic64_set(&kvm_vmid->id, vmid);

new_vmid() can just set kvm_vmid->id directly
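
i.e. have new_vmid() do the atomic64_set() on kvm_vmid->id itself
before returning, so (entirely untested) the caller shrinks to:

	if (!vmid_gen_match(vmid))
		vmid = new_vmid(kvm_vmid);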

> +	}
> +
> +	cpu = smp_processor_id();

Why?

Will


* Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out
  2021-07-29 10:40 ` [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out Shameer Kolothum
@ 2021-08-03 11:40   ` Will Deacon
  2021-08-03 12:55     ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2021-08-03 11:40 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: linux-arm-kernel, kvmarm, linux-kernel, maz, catalin.marinas,
	james.morse, julien.thierry.kdev, suzuki.poulose, jean-philippe,
	Alexandru.Elisei, qperret, linuxarm

On Thu, Jul 29, 2021 at 11:40:09AM +0100, Shameer Kolothum wrote:
> Like ASID allocator, we copy the active_vmids into the
> reserved_vmids on a rollover. But it's unlikely that
> every CPU will have a vCPU as current task and we may
> end up unnecessarily reserving the VMID space.
> 
> Hence, clear active_vmids when scheduling out a vCPU.
> 
> Suggested-by: Will Deacon <will@kernel.org>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  arch/arm64/include/asm/kvm_host.h | 1 +
>  arch/arm64/kvm/arm.c              | 1 +
>  arch/arm64/kvm/vmid.c             | 6 ++++++
>  3 files changed, 8 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index bb993bce1363..d93141cb8d16 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -687,6 +687,7 @@ extern unsigned int kvm_arm_vmid_bits;
>  int kvm_arm_vmid_alloc_init(void);
>  void kvm_arm_vmid_alloc_free(void);
>  void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
> +void kvm_arm_vmid_clear_active(void);
>  
>  static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
>  {
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 077e55a511a9..b134a1b89c84 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -435,6 +435,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_timer_vcpu_put(vcpu);
>  	kvm_vgic_put(vcpu);
>  	kvm_vcpu_pmu_restore_host(vcpu);
> +	kvm_arm_vmid_clear_active();
>  
>  	vcpu->cpu = -1;
>  }
> diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
> index 5584e84aed95..5fd51f5445c1 100644
> --- a/arch/arm64/kvm/vmid.c
> +++ b/arch/arm64/kvm/vmid.c
> @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid *kvm_vmid)
>  	return idx2vmid(vmid) | generation;
>  }
>  
> +/* Call with preemption disabled */
> +void kvm_arm_vmid_clear_active(void)
> +{
> +	atomic64_set(this_cpu_ptr(&active_vmids), 0);
> +}

I think this is very broken, as it will force everybody to take the
slow-path when they see an active_vmid of 0.
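
The fast path you added is gated on a non-zero active_vmid:

	old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
	if (old_active_vmid && vmid_gen_match(vmid) && ...)

so once active_vmids is cleared, the next kvm_arm_vmid_update()
on that CPU unconditionally takes cpu_vmid_lock.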

It also doesn't solve the issue I mentioned before, as an active_vmid of 0
means that the reserved vmid is preserved.

Needs more thought...

Will


* RE: [PATCH v3 1/4] KVM: arm64: Introduce a new VMID allocator for KVM
  2021-08-03 11:38   ` Will Deacon
@ 2021-08-03 12:12     ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 17+ messages in thread
From: Shameerali Kolothum Thodi @ 2021-08-03 12:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, maz, catalin.marinas,
	james.morse, julien.thierry.kdev, suzuki.poulose, jean-philippe,
	Alexandru.Elisei, qperret, Linuxarm

> On Thu, Jul 29, 2021 at 11:40:06AM +0100, Shameer Kolothum wrote:
> > A new VMID allocator for arm64 KVM use. This is based on
> > arm64 ASID allocator algorithm.
> >
> > One major deviation from the ASID allocator is the way we
> > flush the context. Unlike ASID allocator, we expect less
> > frequent rollover in the case of VMIDs. Hence, instead of
> > marking the CPU as flush_pending and issuing a local context
> > invalidation on the next context switch, we broadcast TLB
> > flush + I-cache invalidation over the inner shareable domain
> > on rollover.
> >
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> 
> [...]
> 
> > +void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
> > +{
> > +	unsigned long flags;
> > +	unsigned int cpu;
> > +	u64 vmid, old_active_vmid;
> > +
> > +	vmid = atomic64_read(&kvm_vmid->id);
> > +
> > +	/*
> > +	 * Please refer comments in check_and_switch_context() in
> > +	 * arch/arm64/mm/context.c.
> > +	 */
> > +	old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
> > +	if (old_active_vmid && vmid_gen_match(vmid) &&
> > +	    atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
> > +				     old_active_vmid, vmid))
> > +		return;
> > +
> > +	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> > +
> > +	/* Check that our VMID belongs to the current generation. */
> > +	vmid = atomic64_read(&kvm_vmid->id);
> > +	if (!vmid_gen_match(vmid)) {
> > +		vmid = new_vmid(kvm_vmid);
> > +		atomic64_set(&kvm_vmid->id, vmid);
> 
> new_vmid() can just set kvm_vmid->id directly

Ok.
> 
> > +	}
> > +
> > +	cpu = smp_processor_id();
> 
> Why?

Left over from the previous version. Forgot to remove it,
as we don't have the tlb_flush_pending check anymore.

Thanks,
Shameer

> 
> Will


* RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out
  2021-08-03 11:40   ` Will Deacon
@ 2021-08-03 12:55     ` Shameerali Kolothum Thodi
  2021-08-03 15:30       ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Shameerali Kolothum Thodi @ 2021-08-03 12:55 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, maz, catalin.marinas,
	james.morse, julien.thierry.kdev, suzuki.poulose, jean-philippe,
	Alexandru.Elisei, qperret, Linuxarm

> On Thu, Jul 29, 2021 at 11:40:09AM +0100, Shameer Kolothum wrote:
> > Like ASID allocator, we copy the active_vmids into the
> > reserved_vmids on a rollover. But it's unlikely that
> > every CPU will have a vCPU as current task and we may
> > end up unnecessarily reserving the VMID space.
> >
> > Hence, clear active_vmids when scheduling out a vCPU.
> >
> > Suggested-by: Will Deacon <will@kernel.org>
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h | 1 +
> >  arch/arm64/kvm/arm.c              | 1 +
> >  arch/arm64/kvm/vmid.c             | 6 ++++++
> >  3 files changed, 8 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h
> b/arch/arm64/include/asm/kvm_host.h
> > index bb993bce1363..d93141cb8d16 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -687,6 +687,7 @@ extern unsigned int kvm_arm_vmid_bits;
> >  int kvm_arm_vmid_alloc_init(void);
> >  void kvm_arm_vmid_alloc_free(void);
> >  void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
> > +void kvm_arm_vmid_clear_active(void);
> >
> >  static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch
> *vcpu_arch)
> >  {
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 077e55a511a9..b134a1b89c84 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -435,6 +435,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >  	kvm_timer_vcpu_put(vcpu);
> >  	kvm_vgic_put(vcpu);
> >  	kvm_vcpu_pmu_restore_host(vcpu);
> > +	kvm_arm_vmid_clear_active();
> >
> >  	vcpu->cpu = -1;
> >  }
> > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
> > index 5584e84aed95..5fd51f5445c1 100644
> > --- a/arch/arm64/kvm/vmid.c
> > +++ b/arch/arm64/kvm/vmid.c
> > @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid
> *kvm_vmid)
> >  	return idx2vmid(vmid) | generation;
> >  }
> >
> > +/* Call with preemption disabled */
> > +void kvm_arm_vmid_clear_active(void)
> > +{
> > +	atomic64_set(this_cpu_ptr(&active_vmids), 0);
> > +}
> 
> I think this is very broken, as it will force everybody to take the
> slow-path when they see an active_vmid of 0.

Yes. I have seen that happening in my test setup.

> It also doesn't solve the issue I mentioned before, as an active_vmid of 0
> means that the reserved vmid is preserved.
> 
> Needs more thought...

How about we clear all the active_vmids in kvm_arch_free_vm()
if they match the kvm_vmid->id? But we may have to hold the
lock there.

Thanks,
Shameer
 



* Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out
  2021-08-03 12:55     ` Shameerali Kolothum Thodi
@ 2021-08-03 15:30       ` Will Deacon
  2021-08-03 15:56         ` Shameerali Kolothum Thodi
                           ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Will Deacon @ 2021-08-03 15:30 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: linux-arm-kernel, kvmarm, linux-kernel, maz, catalin.marinas,
	james.morse, julien.thierry.kdev, suzuki.poulose, jean-philippe,
	Alexandru.Elisei, qperret, Linuxarm

On Tue, Aug 03, 2021 at 12:55:25PM +0000, Shameerali Kolothum Thodi wrote:
> > > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
> > > index 5584e84aed95..5fd51f5445c1 100644
> > > --- a/arch/arm64/kvm/vmid.c
> > > +++ b/arch/arm64/kvm/vmid.c
> > > @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid
> > *kvm_vmid)
> > >  	return idx2vmid(vmid) | generation;
> > >  }
> > >
> > > +/* Call with preemption disabled */
> > > +void kvm_arm_vmid_clear_active(void)
> > > +{
> > > +	atomic64_set(this_cpu_ptr(&active_vmids), 0);
> > > +}
> > 
> > I think this is very broken, as it will force everybody to take the
> > slow-path when they see an active_vmid of 0.
> 
> Yes. I have seen that happening in my test setup.

Why didn't you say so?!

> > It also doesn't solve the issue I mentioned before, as an active_vmid of 0
> > means that the reserved vmid is preserved.
> > 
> > Needs more thought...
> 
> How about we clear all the active_vmids in kvm_arch_free_vm() if it
> matches the kvm_vmid->id ? But we may have to hold the lock 
> there

I think we have to be really careful not to run into the "suspended
animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID
allocator handle suspended animation") if we go down this road.
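
(Roughly: a CPU that sleeps through a rollover must not resume
with a stale, since-reallocated ID, which is why a zero active
entry is made to preserve that CPU's reserved entry -- the same
flush_context() behaviour we're tripping over here.)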

Maybe something along the lines of:

ROLLOVER

  * Take lock
  * Inc generation
    => This will force everybody down the slow path
  * Record active VMIDs
  * Broadcast TLBI
    => Only active VMIDs can be dirty
    => Reserve active VMIDs and mark as allocated

VCPU SCHED IN

  * Set active VMID
  * Check generation
  * If mismatch then:
        * Take lock
        * Try to match a reserved VMID
        * If no reserved VMID, allocate new

VCPU SCHED OUT

  * Clear active VMID

but I'm not daft enough to think I got it right first time. I think it
needs both implementing *and* modelling in TLA+ before we merge it!
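
As a very rough (and entirely untested, probably racy) sketch
against the existing helpers, where reserve_active_vmids() is a
made-up name for the record/reserve step:

	/* ROLLOVER (with cpu_vmid_lock held) */
	atomic64_add(VMID_FIRST_VERSION, &vmid_generation);	/* forces slow path */
	reserve_active_vmids();			/* record active VMIDs, mark allocated */
	kvm_call_hyp(__kvm_flush_vm_context);	/* only active VMIDs can be dirty */

	/* VCPU SCHED IN */
	atomic64_set(this_cpu_ptr(&active_vmids), vmid);
	if (!vmid_gen_match(vmid)) {
		/* take cpu_vmid_lock; try to match a reserved VMID,
		   else allocate a new one */
	}

	/* VCPU SCHED OUT */
	atomic64_set(this_cpu_ptr(&active_vmids), 0);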

Will


* RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out
  2021-08-03 15:30       ` Will Deacon
@ 2021-08-03 15:56         ` Shameerali Kolothum Thodi
  2021-08-06 12:24         ` Shameerali Kolothum Thodi
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Shameerali Kolothum Thodi @ 2021-08-03 15:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, maz, catalin.marinas,
	james.morse, julien.thierry.kdev, suzuki.poulose, jean-philippe,
	Alexandru.Elisei, qperret, Linuxarm

> On Tue, Aug 03, 2021 at 12:55:25PM +0000, Shameerali Kolothum Thodi
> wrote:
> > > > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
> > > > index 5584e84aed95..5fd51f5445c1 100644
> > > > --- a/arch/arm64/kvm/vmid.c
> > > > +++ b/arch/arm64/kvm/vmid.c
> > > > @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid
> > > *kvm_vmid)
> > > >  	return idx2vmid(vmid) | generation;
> > > >  }
> > > >
> > > > +/* Call with preemption disabled */
> > > > +void kvm_arm_vmid_clear_active(void)
> > > > +{
> > > > +	atomic64_set(this_cpu_ptr(&active_vmids), 0);
> > > > +}
> > >
> > > I think this is very broken, as it will force everybody to take the
> > > slow-path when they see an active_vmid of 0.
> >
> > Yes. I have seen that happening in my test setup.
> 
> Why didn't you say so?!

Sorry. I thought of getting some performance numbers with and
without this patch and measuring the impact, but didn't quite
get time to finish it yet.
 
> 
> > > It also doesn't solve the issue I mentioned before, as an active_vmid of 0
> > > means that the reserved vmid is preserved.
> > >
> > > Needs more thought...
> >
> > How about we clear all the active_vmids in kvm_arch_free_vm() if it
> > matches the kvm_vmid->id ? But we may have to hold the lock
> > there
> 
> I think we have to be really careful not to run into the "suspended
> animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID
> allocator handle suspended animation") if we go down this road.


Ok. I will go through that.
 
> Maybe something along the lines of:
> 
> ROLLOVER
> 
>   * Take lock
>   * Inc generation
>     => This will force everybody down the slow path
>   * Record active VMIDs
>   * Broadcast TLBI
>     => Only active VMIDs can be dirty
>     => Reserve active VMIDs and mark as allocated
> 
> VCPU SCHED IN
> 
>   * Set active VMID
>   * Check generation
>   * If mismatch then:
>         * Take lock
>         * Try to match a reserved VMID
>         * If no reserved VMID, allocate new
> 
> VCPU SCHED OUT
> 
>   * Clear active VMID
> 
> but I'm not daft enough to think I got it right first time. I think it
> needs both implementing *and* modelling in TLA+ before we merge it!
> 

Ok. I need some time to digest the above first :).

On another note, how serious do you think the problem of extra
reservation of the VMID space is? Just wondering if we can skip
this patch for now or not.

Thanks,
Shameer


* RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out
  2021-08-03 15:30       ` Will Deacon
  2021-08-03 15:56         ` Shameerali Kolothum Thodi
@ 2021-08-06 12:24         ` Shameerali Kolothum Thodi
  2021-08-09 13:09           ` Will Deacon
  2021-08-11  8:47         ` Shameerali Kolothum Thodi
  2021-10-11  6:06         ` Shameerali Kolothum Thodi
  3 siblings, 1 reply; 17+ messages in thread
From: Shameerali Kolothum Thodi @ 2021-08-06 12:24 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, maz, catalin.marinas,
	james.morse, julien.thierry.kdev, suzuki.poulose, jean-philippe,
	Alexandru.Elisei, qperret, Linuxarm

> > On Tue, Aug 03, 2021 at 12:55:25PM +0000, Shameerali Kolothum Thodi
> > wrote:
> > > > > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
> > > > > index 5584e84aed95..5fd51f5445c1 100644
> > > > > --- a/arch/arm64/kvm/vmid.c
> > > > > +++ b/arch/arm64/kvm/vmid.c
> > > > > @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid
> > > > *kvm_vmid)
> > > > >  	return idx2vmid(vmid) | generation;
> > > > >  }
> > > > >
> > > > > +/* Call with preemption disabled */
> > > > > +void kvm_arm_vmid_clear_active(void)
> > > > > +{
> > > > > +	atomic64_set(this_cpu_ptr(&active_vmids), 0);
> > > > > +}
> > > >
> > > > I think this is very broken, as it will force everybody to take the
> > > > slow-path when they see an active_vmid of 0.
> > >
> > > Yes. I have seen that happening in my test setup.
> >
> > Why didn't you say so?!
> 
> Sorry. I thought of getting some performance numbers with and
> without this patch and measure the impact. But didn't quite get time
> to finish it yet.

These are some test numbers with and without this patch, run on two
different test setups.


a)Test Setup -1
-----------------------

Platform: HiSilicon D06 with 128 CPUs, VMID bits = 16
Run 128 VMs concurrently each with 2 vCPUs. Each Guest will execute hackbench
5 times before exiting.

Measurements taken avg. of 10 Runs.

Image : 5.14-rc3
---------------------------
  Time(s)       44.43813888
  No. of exits    145,348,264

Image: 5.14-rc3 + vmid-v3
----------------------------------------
  Time(s)        46.59789034
  No. of exits     133,587,307

%diff against 5.14-rc3
  Time: 4.8% more
  Exits: 8% less 

Image: 5.14-rc3 + vmid-v3 + Without active_asid clear
---------------------------------------------------------------------------
  Time(s)         44.5031782
  No. of exits      144,443,188

%diff against 5.14-rc3
  Time: 0.15% more
  Exits: 2.42% less

b)Test Setup -2
-----------------------

Platform: HiSilicon D06 + Kernel with maxcpus set to 8 and VMID bits set to 4.
Run 40 VMs concurrently each with 2 vCPUs. Each Guest will execute hackbench
5 times before exiting.

Measurements taken avg. of 10 Runs.

Image : 5.14-rc3-vmid4bit
------------------------------------
  Time(s)        46.19963266
  No. of exits     23,699,546

Image: 5.14-rc3-vmid4bit + vmid-v3
---------------------------------------------------
  Time(s)          45.83307736
  No. of exits      23,260,203

%diff against 5.14-rc3-vmid4bit
  Time: 0.8% less
  Exits: 1.85% less 

Image: 5.14-rc3-vmid4bit + vmid-v3 + Without active_asid clear
-----------------------------------------------------------------------------------------
  Time(s)           44.5031782
  No. of exits        144,443,188

%diff against 5.14-rc3-vmid4bit
  Time: 1.05% less
  Exits: 2.06% less

As expected, the active_asid clear on schedule out is not helping.
But without this patch, the numbers seem to be better than the
vanilla kernel when we force the setup (cpus=8, vmid=4bits)
to perform rollover.

Please let me know your thoughts.

Thanks,
Shameer

> 
> >
> > > > It also doesn't solve the issue I mentioned before, as an active_vmid of 0
> > > > means that the reserved vmid is preserved.
> > > >
> > > > Needs more thought...
> > >
> > > How about we clear all the active_vmids in kvm_arch_free_vm() if it
> > > matches the kvm_vmid->id ? But we may have to hold the lock
> > > there
> >
> > I think we have to be really careful not to run into the "suspended
> > animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID
> > allocator handle suspended animation") if we go down this road.
> 
> 
> Ok. I will go through that.
> 
> > Maybe something along the lines of:
> >
> > ROLLOVER
> >
> >   * Take lock
> >   * Inc generation
> >     => This will force everybody down the slow path
> >   * Record active VMIDs
> >   * Broadcast TLBI
> >     => Only active VMIDs can be dirty
> >     => Reserve active VMIDs and mark as allocated
> >
> > VCPU SCHED IN
> >
> >   * Set active VMID
> >   * Check generation
> >   * If mismatch then:
> >         * Take lock
> >         * Try to match a reserved VMID
> >         * If no reserved VMID, allocate new
> >
> > VCPU SCHED OUT
> >
> >   * Clear active VMID
> >
> > but I'm not daft enough to think I got it right first time. I think it
> > needs both implementing *and* modelling in TLA+ before we merge it!
> >
> 
> Ok. I need some time to digest the above first :).
> 
> On another note, how serious do you think is the problem of extra
> reservation of the VMID space? Just wondering if we can skip this
> patch for now or not..
> 
> Thanks,
> Shameer


* Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out
  2021-08-06 12:24         ` Shameerali Kolothum Thodi
@ 2021-08-09 13:09           ` Will Deacon
  2021-08-09 13:48             ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2021-08-09 13:09 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: linux-arm-kernel, kvmarm, linux-kernel, maz, catalin.marinas,
	james.morse, julien.thierry.kdev, suzuki.poulose, jean-philippe,
	Alexandru.Elisei, qperret, Linuxarm

On Fri, Aug 06, 2021 at 12:24:36PM +0000, Shameerali Kolothum Thodi wrote:
> These are some test numbers with and without this patch, run on two
> different test setups.
> 
> 
> a)Test Setup -1
> -----------------------
> 
> Platform: HiSilicon D06 with 128 CPUs, VMID bits = 16
> Run 128 VMs concurrently each with 2 vCPUs. Each Guest will execute hackbench
> 5 times before exiting.
> 
> Measurements taken avg. of 10 Runs.
> 
> Image : 5.14-rc3
> ---------------------------
>   Time(s)       44.43813888
>   No. of exits    145,348,264
> 
> Image: 5.14-rc3 + vmid-v3
> ----------------------------------------
>   Time(s)        46.59789034
>   No. of exits     133,587,307
> 
> %diff against 5.14-rc3
>   Time: 4.8% more
>   Exits: 8% less 
> 
> Image: 5.14-rc3 + vmid-v3 + Without active_asid clear
> ---------------------------------------------------------------------------
>   Time(s)         44.5031782
>   No. of exits      144,443,188
> 
> %diff against 5.14-rc3
>   Time: 0.15% more
>   Exits: 2.42% less
> 
> b)Test Setup -2
> -----------------------
> 
> Platform: HiSilicon D06 + Kernel with maxcpus set to 8 and VMID bits set to 4.
> Run 40 VMs concurrently each with 2 vCPUs. Each Guest will execute hackbench
> 5 times before exiting.
> 
> Measurements taken avg. of 10 Runs.
> 
> Image : 5.14-rc3-vmid4bit
> ------------------------------------
>   Time(s)        46.19963266
>   No. of exits     23,699,546
> 
> Image: 5.14-rc3-vmid4bit + vmid-v3
> ---------------------------------------------------
>   Time(s)          45.83307736
>   No. of exits      23,260,203
> 
> %diff against 5.14-rc3-vmid4bit
>   Time: 0.8% less
>   Exits: 1.85% less 
> 
> Image: 5.14-rc3-vmid4bit + vmid-v3 + Without active_asid clear
> -----------------------------------------------------------------------------------------
>   Time(s)           44.5031782
>   No. of exits        144,443,188

Really? The *exact* same numbers as the "Image: 5.14-rc3 + vmid-v3 + Without
active_asid clear" configuration? Guessing a copy-paste error here.

> %diff against 5.14-rc3-vmid4bit
>   Time: 1.05% less
>   Exits: 2.06% less
> 
> As expected, the active_asid clear on schedule out is not helping.
> But without this patch, the numbers seem to be better than the
> vanilla kernel when we force the setup (cpus=8, vmid=4bits)
> to perform rollover.

I'm struggling a bit to understand these numbers. Are you saying that
clearing the active_asid helps in the 16-bit VMID case but not in the
4-bit case?

Why would the active_asid clear have any impact on the number of exits?

The problem I see with not having the active_asid clear is that we will
roll over more frequently as the number of reserved VMIDs increases.

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out
  2021-08-09 13:09           ` Will Deacon
@ 2021-08-09 13:48             ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 17+ messages in thread
From: Shameerali Kolothum Thodi @ 2021-08-09 13:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, maz, catalin.marinas,
	james.morse, julien.thierry.kdev, suzuki.poulose, jean-philippe,
	Alexandru.Elisei, qperret, Linuxarm



> -----Original Message-----
> From: Will Deacon [mailto:will@kernel.org]
> Sent: 09 August 2021 14:09
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com;
> james.morse@arm.com; julien.thierry.kdev@gmail.com;
> suzuki.poulose@arm.com; jean-philippe@linaro.org;
> Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm
> <linuxarm@huawei.com>
> Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU
> schedule out
> 
> On Fri, Aug 06, 2021 at 12:24:36PM +0000, Shameerali Kolothum Thodi
> wrote:
> > These are some test numbers with and without this patch, run on two
> > different test setups.
> >
> >
> > a)Test Setup -1
> > -----------------------
> >
> > Platform: HiSilicon D06 with 128 CPUs, VMID bits = 16
> > Run 128 VMs concurrently each with 2 vCPUs. Each Guest will execute
> > hackbench 5 times before exiting.
> >
> > Measurements taken avg. of 10 Runs.
> >
> > Image : 5.14-rc3
> > ---------------------------
> >   Time(s)       44.43813888
> >   No. of exits    145,348,264
> >
> > Image: 5.14-rc3 + vmid-v3
> > ----------------------------------------
> >   Time(s)        46.59789034
> >   No. of exits     133,587,307
> >
> > %diff against 5.14-rc3
> >   Time: 4.8% more
> >   Exits: 8% less
> >
> > Image: 5.14-rc3 + vmid-v3 + Without active_asid clear
> > ---------------------------------------------------------------------------
> >   Time(s)         44.5031782
> >   No. of exits      144,443,188
> >
> > %diff against 5.14-rc3
> >   Time: 0.15% more
> >   Exits: 2.42% less
> >
> > b)Test Setup -2
> > -----------------------
> >
> > Platform: HiSilicon D06 + Kernel with maxcpus set to 8 and VMID bits
> > set to 4.
> > Run 40 VMs concurrently each with 2 vCPUs. Each Guest will execute
> > hackbench 5 times before exiting.
> >
> > Measurements taken avg. of 10 Runs.
> >
> > Image : 5.14-rc3-vmid4bit
> > ------------------------------------
> >   Time(s)        46.19963266
> >   No. of exits     23,699,546
> >
> > Image: 5.14-rc3-vmid4bit + vmid-v3
> > ---------------------------------------------------
> >   Time(s)          45.83307736
> >   No. of exits      23,260,203
> >
> > %diff against 5.14-rc3-vmid4bit
> >   Time: 0.8% less
> >   Exits: 1.85% less
> >
> > Image: 5.14-rc3-vmid4bit + vmid-v3 + Without active_asid clear
> > -----------------------------------------------------------------------------------------
> >   Time(s)           44.5031782
> >   No. of exits        144,443,188
> 
> Really? The *exact* same numbers as the "Image: 5.14-rc3 + vmid-v3 +
> Without
> active_asid clear" configuration? Guessing a copy-paste error here.
> 
> > %diff against 5.14-rc3-vmid4bit
> >   Time: 1.05% less
> >   Exits: 2.06% less
> >
> > As expected, the active_asid clear on schedule out is not helping.
> > But without this patch, the numbers seem to be better than the
> > vanilla kernel when we force the setup (cpus=8, vmid=4bits)
> > to perform rollover.
> 
> I'm struggling a bit to understand these numbers. Are you saying that
> clearing the active_asid helps in the 16-bit VMID case but not in the
> 4-bit case?

Nope, the other way around. The point I was trying to make is that
clearing the active_vmids definitely has an impact in the 16-bit vmid
case, where rollover is not happening, as it ends up taking the slow
path more frequently.

Test setup-1, case 2 (with active_vmids clear): around 4.8% more time
to finish the test compared to the vanilla kernel.

Test setup-1, case 3 (without clear): 0.15% more time compared to the
vanilla kernel.

For the 4-bit vmid case, the impact of clearing vmids is not that obvious,
probably because we have more rollovers.

Test setup-2, case 2 (with active_vmids clear): 0.8% less time compared
to vanilla.
Test setup-2, case 3 (without clear): 1.05% less time compared to the
vanilla kernel.
 
So between the two (with and without clearing the active_vmids), the
"without" one has better numbers for both test setups.

> Why would the active_asid clear have any impact on the number of exits?

In the 16-bit vmid case, it looks like the no. of exits is considerably
lower if we clear active_vmids. Not sure whether that is because of the
more frequent slow path or not, but either way the time to finish the
test is higher.
 
> The problem I see with not having the active_asid clear is that we will
> roll over more frequently as the number of reserved VMIDs increases.

Ok. The idea of running the 4-bit test setup was to capture that. It doesn't
look like it has a major impact when compared to the original kernel. Maybe
I should take an average of more test runs. Please let me know if there is a
better way to measure that impact.
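
(For instance, with 4-bit VMIDs there are only 16 values and VMID 0 is
preserved as reserved, so at most 15 are allocatable per generation; every
additional VMID held in reserve across a rollover shrinks that pool and
brings the next rollover forward.)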

Hope I am clear.

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out
  2021-08-03 15:30       ` Will Deacon
  2021-08-03 15:56         ` Shameerali Kolothum Thodi
  2021-08-06 12:24         ` Shameerali Kolothum Thodi
@ 2021-08-11  8:47         ` Shameerali Kolothum Thodi
  2021-10-11  6:06         ` Shameerali Kolothum Thodi
  3 siblings, 0 replies; 17+ messages in thread
From: Shameerali Kolothum Thodi @ 2021-08-11  8:47 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, maz, catalin.marinas,
	james.morse, julien.thierry.kdev, suzuki.poulose, jean-philippe,
	Alexandru.Elisei, qperret, Linuxarm

Hi Will,

> -----Original Message-----
> From: Will Deacon [mailto:will@kernel.org]
> Sent: 03 August 2021 16:31
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com;
> james.morse@arm.com; julien.thierry.kdev@gmail.com;
> suzuki.poulose@arm.com; jean-philippe@linaro.org;
> Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm
> <linuxarm@huawei.com>
> Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU
> schedule out

[...]
 
> I think we have to be really careful not to run into the "suspended
> animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID
> allocator handle suspended animation") if we go down this road.
> 
> Maybe something along the lines of:
> 
> ROLLOVER
> 
>   * Take lock
>   * Inc generation
>     => This will force everybody down the slow path
>   * Record active VMIDs
>   * Broadcast TLBI
>     => Only active VMIDs can be dirty
>     => Reserve active VMIDs and mark as allocated
> 
> VCPU SCHED IN
> 
>   * Set active VMID
>   * Check generation
>   * If mismatch then:
>         * Take lock
>         * Try to match a reserved VMID
>         * If no reserved VMID, allocate new
> 
> VCPU SCHED OUT
> 
>   * Clear active VMID
> 
> but I'm not daft enough to think I got it right first time. I think it
> needs both implementing *and* modelling in TLA+ before we merge it!

I attempted to implement the above algo as below. It seems to be
working in both the 16-bit and 4-bit vmid test setups. Though I am
not quite sure this is exactly what you had in mind above, or that it
covers all corner cases.

Please take a look and let me know.
(The diff below is against this v3 series)

Thanks,
Shameer

--->8<----

--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -43,7 +43,7 @@ static void flush_context(void)
        bitmap_clear(vmid_map, 0, NUM_USER_VMIDS);

        for_each_possible_cpu(cpu) {
-               vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, cpu), 0);
+               vmid = atomic64_read(&per_cpu(active_vmids, cpu));

                /* Preserve reserved VMID */
                if (vmid == 0)
@@ -125,32 +125,17 @@ void kvm_arm_vmid_clear_active(void)
 void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
 {
        unsigned long flags;
-       u64 vmid, old_active_vmid;
+       u64 vmid;

        vmid = atomic64_read(&kvm_vmid->id);
-
-       /*
-        * Please refer comments in check_and_switch_context() in
-        * arch/arm64/mm/context.c.
-        */
-       old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
-       if (old_active_vmid && vmid_gen_match(vmid) &&
-           atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
-                                    old_active_vmid, vmid))
+       if (vmid_gen_match(vmid)) {
+               atomic64_set(this_cpu_ptr(&active_vmids), vmid);
                return;
-
-       raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
-
-       /* Check that our VMID belongs to the current generation. */
-       vmid = atomic64_read(&kvm_vmid->id);
-       if (!vmid_gen_match(vmid)) {
-               vmid = new_vmid(kvm_vmid);
-               atomic64_set(&kvm_vmid->id, vmid);
        }

-
+       raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
+       vmid = new_vmid(kvm_vmid);
+       atomic64_set(&kvm_vmid->id, vmid);
        atomic64_set(this_cpu_ptr(&active_vmids), vmid);
        raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
 }
--->8<----
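
For readability, kvm_arm_vmid_update() after the above diff reads as
below (reconstructed from the hunks; the comments are added here and
are not part of the posted diff):

void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
{
	unsigned long flags;
	u64 vmid;

	vmid = atomic64_read(&kvm_vmid->id);

	/* Fast path: the generation is still current, so just publish
	 * this VMID as active on this CPU. Note there is no lock here. */
	if (vmid_gen_match(vmid)) {
		atomic64_set(this_cpu_ptr(&active_vmids), vmid);
		return;
	}

	/* Slow path: allocate under the lock and publish both the
	 * per-guest id and the per-CPU active VMID. */
	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
	vmid = new_vmid(kvm_vmid);
	atomic64_set(&kvm_vmid->id, vmid);
	atomic64_set(this_cpu_ptr(&active_vmids), vmid);
	raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
}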





^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out
  2021-08-03 15:30       ` Will Deacon
                           ` (2 preceding siblings ...)
  2021-08-11  8:47         ` Shameerali Kolothum Thodi
@ 2021-10-11  6:06         ` Shameerali Kolothum Thodi
  3 siblings, 0 replies; 17+ messages in thread
From: Shameerali Kolothum Thodi @ 2021-10-11  6:06 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, maz, catalin.marinas,
	james.morse, julien.thierry.kdev, suzuki.poulose, jean-philippe,
	Alexandru.Elisei, qperret, Linuxarm, Jonathan Cameron



> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: 11 August 2021 09:48
> To: 'Will Deacon' <will@kernel.org>
> Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com;
> james.morse@arm.com; julien.thierry.kdev@gmail.com;
> suzuki.poulose@arm.com; jean-philippe@linaro.org;
> Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm
> <linuxarm@huawei.com>
> Subject: RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU
> schedule out
> 
> Hi Will,
> 
> > -----Original Message-----
> > From: Will Deacon [mailto:will@kernel.org]
> > Sent: 03 August 2021 16:31
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> > linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com;
> > james.morse@arm.com; julien.thierry.kdev@gmail.com;
> > suzuki.poulose@arm.com; jean-philippe@linaro.org;
> > Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm
> > <linuxarm@huawei.com>
> > Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU
> > schedule out
> 
> [...]
> 
> > I think we have to be really careful not to run into the "suspended
> > animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID
> > allocator handle suspended animation") if we go down this road.
> >
> > Maybe something along the lines of:
> >
> > ROLLOVER
> >
> >   * Take lock
> >   * Inc generation
> >     => This will force everybody down the slow path
> >   * Record active VMIDs
> >   * Broadcast TLBI
> >     => Only active VMIDs can be dirty
> >     => Reserve active VMIDs and mark as allocated
> >
> > VCPU SCHED IN
> >
> >   * Set active VMID
> >   * Check generation
> >   * If mismatch then:
> >         * Take lock
> >         * Try to match a reserved VMID
> >         * If no reserved VMID, allocate new
> >
> > VCPU SCHED OUT
> >
> >   * Clear active VMID
> >
> > but I'm not daft enough to think I got it right first time. I think it
> > needs both implementing *and* modelling in TLA+ before we merge it!
> 
> I attempted to implement the above algo as below. It seems to be
> working in both the 16-bit and 4-bit vmid test setups.

It is not :(. I did an extended, overnight test run and it fails.
It looks to me that in my implementation below there is no synchronization
between setting the active VMID and a concurrent rollover. I will have
another go.
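
E.g., the window I believe exists (an illustrative interleaving based on
my reading of the diff below, not from an actual trace):

  CPU0: vcpu sched in (fast path)        CPU1: rollover
  -------------------------------        --------------
  vmid_gen_match(vmid) -> true
                                         take lock, generation++
                                         read/reserve active_vmids
                                           (CPU0's slot not yet set)
                                         broadcast TLBI
  atomic64_set(active_vmids, vmid)
  enter guest with a stale VMID whose
  value was never reserved and can be
  handed to another guest in the new
  generation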

Thanks,
Shameer

> Though I am
> not quite sure this is exactly what you had in mind above, or that it
> covers all corner cases.
> 
> Please take a look and let me know.
> (The diff below is against this v3 series)
> 
> Thanks,
> Shameer
> 
> --->8<----
> 
> --- a/arch/arm64/kvm/vmid.c
> +++ b/arch/arm64/kvm/vmid.c
> @@ -43,7 +43,7 @@ static void flush_context(void)
>         bitmap_clear(vmid_map, 0, NUM_USER_VMIDS);
> 
>         for_each_possible_cpu(cpu) {
> -               vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids,
> cpu), 0);
> +               vmid = atomic64_read(&per_cpu(active_vmids, cpu));
> 
>                 /* Preserve reserved VMID */
>                 if (vmid == 0)
> @@ -125,32 +125,17 @@ void kvm_arm_vmid_clear_active(void)
>  void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
>  {
>         unsigned long flags;
> -       u64 vmid, old_active_vmid;
> +       u64 vmid;
> 
>         vmid = atomic64_read(&kvm_vmid->id);
> -
> -       /*
> -        * Please refer comments in check_and_switch_context() in
> -        * arch/arm64/mm/context.c.
> -        */
> -       old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
> -       if (old_active_vmid && vmid_gen_match(vmid) &&
> -           atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
> -                                    old_active_vmid, vmid))
> +       if (vmid_gen_match(vmid)) {
> +               atomic64_set(this_cpu_ptr(&active_vmids), vmid);
>                 return;
> -
> -       raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> -
> -       /* Check that our VMID belongs to the current generation. */
> -       vmid = atomic64_read(&kvm_vmid->id);
> -       if (!vmid_gen_match(vmid)) {
> -               vmid = new_vmid(kvm_vmid);
> -               atomic64_set(&kvm_vmid->id, vmid);
>         }
> 
> -
> +       raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> +       vmid = new_vmid(kvm_vmid);
> +       atomic64_set(&kvm_vmid->id, vmid);
>         atomic64_set(this_cpu_ptr(&active_vmids), vmid);
>         raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
>  }
> --->8<----
> 
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-10-11  6:06 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-29 10:40 [PATCH v3 0/4] kvm/arm: New VMID allocator based on asid Shameer Kolothum
2021-07-29 10:40 ` [PATCH v3 1/4] KVM: arm64: Introduce a new VMID allocator for KVM Shameer Kolothum
2021-08-03 11:38   ` Will Deacon
2021-08-03 12:12     ` Shameerali Kolothum Thodi
2021-07-29 10:40 ` [PATCH v3 2/4] KVM: arm64: Make VMID bits accessible outside of allocator Shameer Kolothum
2021-07-29 10:40 ` [PATCH v3 3/4] KVM: arm64: Align the VMID allocation with the arm64 ASID one Shameer Kolothum
2021-07-29 14:59   ` kernel test robot
2021-07-29 10:40 ` [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out Shameer Kolothum
2021-08-03 11:40   ` Will Deacon
2021-08-03 12:55     ` Shameerali Kolothum Thodi
2021-08-03 15:30       ` Will Deacon
2021-08-03 15:56         ` Shameerali Kolothum Thodi
2021-08-06 12:24         ` Shameerali Kolothum Thodi
2021-08-09 13:09           ` Will Deacon
2021-08-09 13:48             ` Shameerali Kolothum Thodi
2021-08-11  8:47         ` Shameerali Kolothum Thodi
2021-10-11  6:06         ` Shameerali Kolothum Thodi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).