All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: james.morse@arm.com, suzuki.poulose@arm.com,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, will@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v4 04/39] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run
Date: Sun, 17 Oct 2021 12:43:29 +0100	[thread overview]
Message-ID: <871r4jq3ku.wl-maz@kernel.org> (raw)
In-Reply-To: <20210825161815.266051-5-alexandru.elisei@arm.com>

On Wed, 25 Aug 2021 17:17:40 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> KVM relies on doing dcache maintenance on stage 2 faults to present to a
> gueste running with the MMU off the same view of memory as userspace. For
> locked memslots, KVM so far has done the dcache maintenance when a memslot
> is locked, but that leaves KVM in a rather awkward position: what userspace
> writes to guest memory after the memslot is locked, but before a VCPU is
> run, might not be visible to the guest.
> 
> Fix this by deferring the dcache maintenance until the first VCPU is run.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  7 ++++
>  arch/arm64/include/asm/kvm_mmu.h  |  5 +++
>  arch/arm64/kvm/arm.c              |  3 ++
>  arch/arm64/kvm/mmu.c              | 56 ++++++++++++++++++++++++++++---
>  4 files changed, 67 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 97ff3ed5d4b7..ed67f914d169 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -112,6 +112,10 @@ struct kvm_arch_memory_slot {
>  	u32 flags;
>  };
>  
> +/* kvm->arch.mmu_pending_ops flags */
> +#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
> +#define KVM_MAX_MMU_PENDING_OPS		1
> +
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> @@ -135,6 +139,9 @@ struct kvm_arch {
>  	 */
>  	bool return_nisv_io_abort_to_user;
>  
> +	/* Defer MMU operations until a VCPU is run. */
> +	unsigned long mmu_pending_ops;

This has a funny taste of VM-wide requests...

> +
>  	/*
>  	 * VM-wide PMU filter, implemented as a bitmap and big enough for
>  	 * up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+).
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index ef079b5eb475..525c223e769f 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -219,6 +219,11 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>  int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
>  int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
>  
> +#define kvm_mmu_has_pending_ops(kvm)	\
> +	(!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
> +
> +void kvm_mmu_perform_pending_ops(struct kvm *kvm);
> +
>  static inline unsigned int kvm_get_vmid_bits(void)
>  {
>  	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index efb3501c6016..144c982912d8 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -829,6 +829,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  	if (unlikely(!kvm_vcpu_initialized(vcpu)))
>  		return -ENOEXEC;
>  
> +	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
> +		kvm_mmu_perform_pending_ops(vcpu->kvm);
> +
>  	ret = kvm_vcpu_first_run_init(vcpu);

Is there any reason why this isn't done as part of the 'first run'
handling? I am refactoring that part to remove as many things as
possible from the fast path, and would love not to go back to piling
more stuff here.

Or do you expect this to happen more than once per VM (despite what
the comment says)?

>  	if (ret)
>  		return ret;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 59c2bfef2fd1..94fa08f3d9d3 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1253,6 +1253,41 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  	return ret;
>  }
>  
> +/*
> + * It's safe to do the CMOs when the first VCPU is run because:
> + * - VCPUs cannot run until mmu_cmo_needed is cleared.
> + * - Memslots cannot be modified because we hold the kvm->slots_lock.

It would be good to document the expected locking order for this kind
of stuff.

> + *
> + * It's safe to periodically release the mmu_lock because:
> + * - VCPUs cannot run.
> + * - Any changes to the stage 2 tables triggered by the MMU notifiers also take
> + *   the mmu_lock, which means accesses will be serialized.
> + * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
> + *   is live, which means that the VM will be live.
> + */
> +void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> +{
> +	struct kvm_memory_slot *memslot;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	if (!kvm_mmu_has_pending_ops(kvm))
> +		goto out_unlock;
> +
> +	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
> +		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> +			if (!memslot_is_locked(memslot))
> +				continue;
> +			stage2_flush_memslot(kvm, memslot);
> +		}
> +	}
> +
> +	bitmap_zero(&kvm->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS);

clear_bit() instead? I understand that you want to support multiple
ops, but this looks odd.

> +
> +out_unlock:
> +	mutex_unlock(&kvm->slots_lock);
> +	return;
> +}
> +
>  static int try_rlimit_memlock(unsigned long npages)
>  {
>  	unsigned long lock_limit;
> @@ -1293,7 +1328,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	struct kvm_memory_slot_page *page_entry;
>  	bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> -	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> +	struct kvm_pgtable pgt;
> +	struct kvm_pgtable_mm_ops mm_ops;
>  	struct vm_area_struct *vma;
>  	unsigned long npages = memslot->npages;
>  	unsigned int pin_flags = FOLL_LONGTERM;
> @@ -1311,6 +1347,16 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  		pin_flags |= FOLL_WRITE;
>  	}
>  
> +	/*
> +	 * Make a copy of the stage 2 translation table struct to remove the
> +	 * dcache callback so we can postpone the cache maintenance operations
> +	 * until the first VCPU is run.
> +	 */
> +	mm_ops = *kvm->arch.mmu.pgt->mm_ops;
> +	mm_ops.dcache_clean_inval_poc = NULL;
> +	pgt = *kvm->arch.mmu.pgt;
> +	pgt.mm_ops = &mm_ops;

Huhuh... Can't really say I'm in love with this. Are you trying to
avoid a double dcache clean to PoC? Is this a performance or a
correctness issue?

> +
>  	hva = memslot->userspace_addr;
>  	ipa = memslot->base_gfn << PAGE_SHIFT;
>  
> @@ -1362,13 +1408,13 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  			goto out_err;
>  		}
>  
> -		ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
> +		ret = kvm_pgtable_stage2_map(&pgt, ipa, PAGE_SIZE,
>  					     page_to_phys(page_entry->page),
>  					     prot, &cache);
>  		spin_unlock(&kvm->mmu_lock);
>  
>  		if (ret) {
> -			kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> +			kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
>  						 i << PAGE_SHIFT);
>  			unpin_memslot_pages(memslot, writable);
>  			goto out_err;
> @@ -1387,7 +1433,7 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	 */
>  	ret = account_locked_vm(current->mm, npages, true);
>  	if (ret) {
> -		kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> +		kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
>  					 npages << PAGE_SHIFT);
>  		unpin_memslot_pages(memslot, writable);
>  		goto out_err;
> @@ -1397,6 +1443,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	if (writable)
>  		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
>  
> +	set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> +
>  	kvm_mmu_free_memory_cache(&cache);
>  
>  	return 0;

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: linux-kernel@vger.kernel.org, will@kernel.org,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC PATCH v4 04/39] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run
Date: Sun, 17 Oct 2021 12:43:29 +0100	[thread overview]
Message-ID: <871r4jq3ku.wl-maz@kernel.org> (raw)
In-Reply-To: <20210825161815.266051-5-alexandru.elisei@arm.com>

On Wed, 25 Aug 2021 17:17:40 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> KVM relies on doing dcache maintenance on stage 2 faults to present to a
> gueste running with the MMU off the same view of memory as userspace. For
> locked memslots, KVM so far has done the dcache maintenance when a memslot
> is locked, but that leaves KVM in a rather awkward position: what userspace
> writes to guest memory after the memslot is locked, but before a VCPU is
> run, might not be visible to the guest.
> 
> Fix this by deferring the dcache maintenance until the first VCPU is run.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  7 ++++
>  arch/arm64/include/asm/kvm_mmu.h  |  5 +++
>  arch/arm64/kvm/arm.c              |  3 ++
>  arch/arm64/kvm/mmu.c              | 56 ++++++++++++++++++++++++++++---
>  4 files changed, 67 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 97ff3ed5d4b7..ed67f914d169 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -112,6 +112,10 @@ struct kvm_arch_memory_slot {
>  	u32 flags;
>  };
>  
> +/* kvm->arch.mmu_pending_ops flags */
> +#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
> +#define KVM_MAX_MMU_PENDING_OPS		1
> +
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> @@ -135,6 +139,9 @@ struct kvm_arch {
>  	 */
>  	bool return_nisv_io_abort_to_user;
>  
> +	/* Defer MMU operations until a VCPU is run. */
> +	unsigned long mmu_pending_ops;

This has a funny taste of VM-wide requests...

> +
>  	/*
>  	 * VM-wide PMU filter, implemented as a bitmap and big enough for
>  	 * up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+).
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index ef079b5eb475..525c223e769f 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -219,6 +219,11 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>  int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
>  int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
>  
> +#define kvm_mmu_has_pending_ops(kvm)	\
> +	(!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
> +
> +void kvm_mmu_perform_pending_ops(struct kvm *kvm);
> +
>  static inline unsigned int kvm_get_vmid_bits(void)
>  {
>  	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index efb3501c6016..144c982912d8 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -829,6 +829,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  	if (unlikely(!kvm_vcpu_initialized(vcpu)))
>  		return -ENOEXEC;
>  
> +	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
> +		kvm_mmu_perform_pending_ops(vcpu->kvm);
> +
>  	ret = kvm_vcpu_first_run_init(vcpu);

Is there any reason why this isn't done as part of the 'first run'
handling? I am refactoring that part to remove as many things as
possible from the fast path, and would love not to go back to piling
more stuff here.

Or do you expect this to happen more than once per VM (despite what
the comment says)?

>  	if (ret)
>  		return ret;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 59c2bfef2fd1..94fa08f3d9d3 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1253,6 +1253,41 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  	return ret;
>  }
>  
> +/*
> + * It's safe to do the CMOs when the first VCPU is run because:
> + * - VCPUs cannot run until mmu_cmo_needed is cleared.
> + * - Memslots cannot be modified because we hold the kvm->slots_lock.

It would be good to document the expected locking order for this kind
of stuff.

> + *
> + * It's safe to periodically release the mmu_lock because:
> + * - VCPUs cannot run.
> + * - Any changes to the stage 2 tables triggered by the MMU notifiers also take
> + *   the mmu_lock, which means accesses will be serialized.
> + * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
> + *   is live, which means that the VM will be live.
> + */
> +void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> +{
> +	struct kvm_memory_slot *memslot;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	if (!kvm_mmu_has_pending_ops(kvm))
> +		goto out_unlock;
> +
> +	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
> +		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> +			if (!memslot_is_locked(memslot))
> +				continue;
> +			stage2_flush_memslot(kvm, memslot);
> +		}
> +	}
> +
> +	bitmap_zero(&kvm->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS);

clear_bit() instead? I understand that you want to support multiple
ops, but this looks odd.

> +
> +out_unlock:
> +	mutex_unlock(&kvm->slots_lock);
> +	return;
> +}
> +
>  static int try_rlimit_memlock(unsigned long npages)
>  {
>  	unsigned long lock_limit;
> @@ -1293,7 +1328,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	struct kvm_memory_slot_page *page_entry;
>  	bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> -	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> +	struct kvm_pgtable pgt;
> +	struct kvm_pgtable_mm_ops mm_ops;
>  	struct vm_area_struct *vma;
>  	unsigned long npages = memslot->npages;
>  	unsigned int pin_flags = FOLL_LONGTERM;
> @@ -1311,6 +1347,16 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  		pin_flags |= FOLL_WRITE;
>  	}
>  
> +	/*
> +	 * Make a copy of the stage 2 translation table struct to remove the
> +	 * dcache callback so we can postpone the cache maintenance operations
> +	 * until the first VCPU is run.
> +	 */
> +	mm_ops = *kvm->arch.mmu.pgt->mm_ops;
> +	mm_ops.dcache_clean_inval_poc = NULL;
> +	pgt = *kvm->arch.mmu.pgt;
> +	pgt.mm_ops = &mm_ops;

Huhuh... Can't really say I'm in love with this. Are you trying to
avoid a double dcache clean to PoC? Is this a performance or a
correctness issue?

> +
>  	hva = memslot->userspace_addr;
>  	ipa = memslot->base_gfn << PAGE_SHIFT;
>  
> @@ -1362,13 +1408,13 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  			goto out_err;
>  		}
>  
> -		ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
> +		ret = kvm_pgtable_stage2_map(&pgt, ipa, PAGE_SIZE,
>  					     page_to_phys(page_entry->page),
>  					     prot, &cache);
>  		spin_unlock(&kvm->mmu_lock);
>  
>  		if (ret) {
> -			kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> +			kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
>  						 i << PAGE_SHIFT);
>  			unpin_memslot_pages(memslot, writable);
>  			goto out_err;
> @@ -1387,7 +1433,7 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	 */
>  	ret = account_locked_vm(current->mm, npages, true);
>  	if (ret) {
> -		kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> +		kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
>  					 npages << PAGE_SHIFT);
>  		unpin_memslot_pages(memslot, writable);
>  		goto out_err;
> @@ -1397,6 +1443,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	if (writable)
>  		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
>  
> +	set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> +
>  	kvm_mmu_free_memory_cache(&cache);
>  
>  	return 0;

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: james.morse@arm.com, suzuki.poulose@arm.com,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, will@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v4 04/39] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run
Date: Sun, 17 Oct 2021 12:43:29 +0100	[thread overview]
Message-ID: <871r4jq3ku.wl-maz@kernel.org> (raw)
In-Reply-To: <20210825161815.266051-5-alexandru.elisei@arm.com>

On Wed, 25 Aug 2021 17:17:40 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> KVM relies on doing dcache maintenance on stage 2 faults to present to a
> gueste running with the MMU off the same view of memory as userspace. For
> locked memslots, KVM so far has done the dcache maintenance when a memslot
> is locked, but that leaves KVM in a rather awkward position: what userspace
> writes to guest memory after the memslot is locked, but before a VCPU is
> run, might not be visible to the guest.
> 
> Fix this by deferring the dcache maintenance until the first VCPU is run.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  7 ++++
>  arch/arm64/include/asm/kvm_mmu.h  |  5 +++
>  arch/arm64/kvm/arm.c              |  3 ++
>  arch/arm64/kvm/mmu.c              | 56 ++++++++++++++++++++++++++++---
>  4 files changed, 67 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 97ff3ed5d4b7..ed67f914d169 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -112,6 +112,10 @@ struct kvm_arch_memory_slot {
>  	u32 flags;
>  };
>  
> +/* kvm->arch.mmu_pending_ops flags */
> +#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
> +#define KVM_MAX_MMU_PENDING_OPS		1
> +
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> @@ -135,6 +139,9 @@ struct kvm_arch {
>  	 */
>  	bool return_nisv_io_abort_to_user;
>  
> +	/* Defer MMU operations until a VCPU is run. */
> +	unsigned long mmu_pending_ops;

This has a funny taste of VM-wide requests...

> +
>  	/*
>  	 * VM-wide PMU filter, implemented as a bitmap and big enough for
>  	 * up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+).
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index ef079b5eb475..525c223e769f 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -219,6 +219,11 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>  int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
>  int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
>  
> +#define kvm_mmu_has_pending_ops(kvm)	\
> +	(!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
> +
> +void kvm_mmu_perform_pending_ops(struct kvm *kvm);
> +
>  static inline unsigned int kvm_get_vmid_bits(void)
>  {
>  	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index efb3501c6016..144c982912d8 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -829,6 +829,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  	if (unlikely(!kvm_vcpu_initialized(vcpu)))
>  		return -ENOEXEC;
>  
> +	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
> +		kvm_mmu_perform_pending_ops(vcpu->kvm);
> +
>  	ret = kvm_vcpu_first_run_init(vcpu);

Is there any reason why this isn't done as part of the 'first run'
handling? I am refactoring that part to remove as many things as
possible from the fast path, and would love not to go back to piling
more stuff here.

Or do you expect this to happen more than once per VM (despite what
the comment says)?

>  	if (ret)
>  		return ret;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 59c2bfef2fd1..94fa08f3d9d3 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1253,6 +1253,41 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  	return ret;
>  }
>  
> +/*
> + * It's safe to do the CMOs when the first VCPU is run because:
> + * - VCPUs cannot run until mmu_cmo_needed is cleared.
> + * - Memslots cannot be modified because we hold the kvm->slots_lock.

It would be good to document the expected locking order for this kind
of stuff.

> + *
> + * It's safe to periodically release the mmu_lock because:
> + * - VCPUs cannot run.
> + * - Any changes to the stage 2 tables triggered by the MMU notifiers also take
> + *   the mmu_lock, which means accesses will be serialized.
> + * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
> + *   is live, which means that the VM will be live.
> + */
> +void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> +{
> +	struct kvm_memory_slot *memslot;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	if (!kvm_mmu_has_pending_ops(kvm))
> +		goto out_unlock;
> +
> +	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
> +		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> +			if (!memslot_is_locked(memslot))
> +				continue;
> +			stage2_flush_memslot(kvm, memslot);
> +		}
> +	}
> +
> +	bitmap_zero(&kvm->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS);

clear_bit() instead? I understand that you want to support multiple
ops, but this looks odd.

> +
> +out_unlock:
> +	mutex_unlock(&kvm->slots_lock);
> +	return;
> +}
> +
>  static int try_rlimit_memlock(unsigned long npages)
>  {
>  	unsigned long lock_limit;
> @@ -1293,7 +1328,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	struct kvm_memory_slot_page *page_entry;
>  	bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> -	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> +	struct kvm_pgtable pgt;
> +	struct kvm_pgtable_mm_ops mm_ops;
>  	struct vm_area_struct *vma;
>  	unsigned long npages = memslot->npages;
>  	unsigned int pin_flags = FOLL_LONGTERM;
> @@ -1311,6 +1347,16 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  		pin_flags |= FOLL_WRITE;
>  	}
>  
> +	/*
> +	 * Make a copy of the stage 2 translation table struct to remove the
> +	 * dcache callback so we can postpone the cache maintenance operations
> +	 * until the first VCPU is run.
> +	 */
> +	mm_ops = *kvm->arch.mmu.pgt->mm_ops;
> +	mm_ops.dcache_clean_inval_poc = NULL;
> +	pgt = *kvm->arch.mmu.pgt;
> +	pgt.mm_ops = &mm_ops;

Huhuh... Can't really say I'm in love with this. Are you trying to
avoid a double dcache clean to PoC? Is this a performance or a
correctness issue?

> +
>  	hva = memslot->userspace_addr;
>  	ipa = memslot->base_gfn << PAGE_SHIFT;
>  
> @@ -1362,13 +1408,13 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  			goto out_err;
>  		}
>  
> -		ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
> +		ret = kvm_pgtable_stage2_map(&pgt, ipa, PAGE_SIZE,
>  					     page_to_phys(page_entry->page),
>  					     prot, &cache);
>  		spin_unlock(&kvm->mmu_lock);
>  
>  		if (ret) {
> -			kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> +			kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
>  						 i << PAGE_SHIFT);
>  			unpin_memslot_pages(memslot, writable);
>  			goto out_err;
> @@ -1387,7 +1433,7 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	 */
>  	ret = account_locked_vm(current->mm, npages, true);
>  	if (ret) {
> -		kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> +		kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
>  					 npages << PAGE_SHIFT);
>  		unpin_memslot_pages(memslot, writable);
>  		goto out_err;
> @@ -1397,6 +1443,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	if (writable)
>  		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
>  
> +	set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> +
>  	kvm_mmu_free_memory_cache(&cache);
>  
>  	return 0;

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-10-17 11:43 UTC|newest]

Thread overview: 141+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
2021-08-25 16:17 ` Alexandru Elisei
2021-08-25 16:17 ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 01/39] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-09-22 10:39   ` Suzuki K Poulose
2021-09-22 10:39     ` Suzuki K Poulose
2021-09-22 10:39     ` Suzuki K Poulose
2021-08-25 16:17 ` [RFC PATCH v4 02/39] KVM: arm64: Add lock/unlock memslot user API Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-10-18  9:04   ` Suzuki K Poulose
2021-10-18  9:04     ` Suzuki K Poulose
2021-10-18  9:04     ` Suzuki K Poulose
2021-10-18 14:09     ` Alexandru Elisei
2021-10-18 14:09       ` Alexandru Elisei
2021-10-18 14:09       ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 03/39] KVM: arm64: Implement the memslot lock/unlock functionality Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 04/39] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-10-17 11:43   ` Marc Zyngier [this message]
2021-10-17 11:43     ` Marc Zyngier
2021-10-17 11:43     ` Marc Zyngier
2021-10-18 12:56     ` Alexandru Elisei
2021-10-18 12:56       ` Alexandru Elisei
2021-10-18 12:56       ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 05/39] KVM: arm64: Perform CMOs on locked memslots when userspace resets VCPUs Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 06/39] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 07/39] KVM: arm64: Unlock memslots after stage 2 tables are freed Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 08/39] KVM: arm64: Deny changes to locked memslots Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 09/39] KVM: Add kvm_warn{,_ratelimited} macros Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 10/39] KVM: arm64: Print a warning for unexpected faults on locked memslots Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 11/39] KVM: arm64: Allow userspace to lock and unlock memslots Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 12/39] KVM: arm64: Add the KVM_ARM_VCPU_SUPPORTED_CPUS VCPU ioctl Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 13/39] KVM: arm64: Add CONFIG_KVM_ARM_SPE Kconfig option Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 14/39] KVM: arm64: Add SPE capability and VCPU feature Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 15/39] drivers/perf: Expose the cpumask of CPUs that support SPE Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 16/39] KVM: arm64: Make SPE available when at least one CPU supports it Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 17/39] KVM: arm64: Set the VCPU SPE feature bit when SPE is available Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 18/39] KVM: arm64: Expose SPE version to guests Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 19/39] KVM: arm64: Do not emulate SPE on CPUs which don't have SPE Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 20/39] KVM: arm64: Add a new VCPU device control group for SPE Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 21/39] KVM: arm64: Add SPE VCPU device attribute to set the interrupt number Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 22/39] KVM: arm64: Add SPE VCPU device attribute to initialize SPE Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 23/39] KVM: arm64: VHE: Clear MDCR_EL2.E2PB in vcpu_put() Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:17   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 24/39] KVM: arm64: debug: Configure MDCR_EL2 when a VCPU has SPE Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 25/39] KVM: arm64: Move the write to MDCR_EL2 out of __activate_traps_common() Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 26/39] KVM: arm64: VHE: Change MDCR_EL2 at world switch if VCPU has SPE Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 27/39] KVM: arm64: Add SPE system registers to VCPU context Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 28/39] KVM: arm64: nVHE: Save PMSCR_EL1 to the host context Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 29/39] KVM: arm64: Rename DEBUG_STATE_SAVE_SPE -> DEBUG_SAVE_SPE_BUFFER flags Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 30/39] KVM: arm64: nVHE: Context switch SPE state if VCPU has SPE Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 31/39] KVM: arm64: VHE: " Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 32/39] KVM: arm64: Save/restore PMSNEVFR_EL1 on VCPU put/load Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 33/39] KVM: arm64: Allow guest to use physical timestamps if perfmon_capable() Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 34/39] KVM: arm64: Emulate SPE buffer management interrupt Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 35/39] KVM: arm64: Add an userspace API to stop a VCPU profiling Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 36/39] KVM: arm64: Implement " Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 37/39] KVM: arm64: Add PMSIDR_EL1 to the SPE register context Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 38/39] KVM: arm64: Make CONFIG_KVM_ARM_SPE depend on !CONFIG_NUMA_BALANCING Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 39/39] KVM: arm64: Allow userspace to enable SPE for guests Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-08-25 16:18   ` Alexandru Elisei
2021-09-22 10:11 ` [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Suzuki K Poulose
2021-09-22 10:11   ` Suzuki K Poulose
2021-09-22 10:11   ` Suzuki K Poulose
2021-09-23 15:12   ` Alexandru Elisei
2021-09-23 15:12     ` Alexandru Elisei
2021-09-23 15:12     ` Alexandru Elisei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871r4jq3ku.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=alexandru.elisei@arm.com \
    --cc=james.morse@arm.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.