From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Will Deacon <will@kernel.org>, kvmarm@lists.cs.columbia.edu
Cc: Gavin Shan <gshan@redhat.com>,
	Suzuki Poulose <suzuki.poulose@arm.com>,
	Marc Zyngier <maz@kernel.org>,
	Quentin Perret <qperret@google.com>,
	James Morse <james.morse@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	kernel-team@android.com, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v4 17/21] KVM: arm64: Convert user_mem_abort() to generic page-table API
Date: Wed, 9 Sep 2020 15:20:21 +0100
Message-ID: <2ae77a66-9cc4-f4e1-9e98-a50d5891cf20@arm.com>
In-Reply-To: <20200907152344.12978-18-will@kernel.org>

Hi Will,

On 9/7/20 4:23 PM, Will Deacon wrote:
> Convert user_mem_abort() to call kvm_pgtable_stage2_relax_perms() when
> handling a stage-2 permission fault and kvm_pgtable_stage2_map() when
> handling a stage-2 translation fault, rather than walking the page-table
> manually.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kvm/mmu.c | 124 +++++++++++++++----------------------------
>  1 file changed, 44 insertions(+), 80 deletions(-)
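Before diving into the diff: the two pgtable calls this patch switches to were
introduced earlier in the series. As I read them (simplified, from patches
06/21 and 16/21 - treat these prototypes as my reading, not the authoritative
headers), they have roughly these shapes:

	/* Relax permissions on an existing leaf entry (patch 16/21): */
	int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
					   enum kvm_pgtable_prot prot);

	/* Install a new stage-2 mapping (patch 06/21): */
	int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
				   u64 phys, enum kvm_pgtable_prot prot,
				   struct kvm_mmu_memory_cache *mc);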
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 0af48f35c8dd..dc923e873dad 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1496,18 +1496,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  {
>  	int ret;
>  	bool write_fault, writable, force_pte = false;
> -	bool exec_fault, needs_exec;
> +	bool exec_fault;
> +	bool device = false;
>  	unsigned long mmu_seq;
> -	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>  	struct vm_area_struct *vma;
>  	short vma_shift;
> +	gfn_t gfn;
>  	kvm_pfn_t pfn;
> -	pgprot_t mem_type = PAGE_S2;
>  	bool logging_active = memslot_is_logging(memslot);
> -	unsigned long vma_pagesize, flags = 0;
> -	struct kvm_s2_mmu *mmu = vcpu->arch.hw_mmu;
> +	unsigned long vma_pagesize;
> +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> +	struct kvm_pgtable *pgt;
>  
>  	write_fault = kvm_is_write_fault(vcpu);
>  	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> @@ -1540,22 +1541,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		vma_pagesize = PAGE_SIZE;
>  	}
>  
> -	/*
> -	 * The stage2 has a minimum of 2 level table (For arm64 see
> -	 * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can
> -	 * use PMD_SIZE huge mappings (even when the PMD is folded into PGD).
> -	 * As for PUD huge maps, we must make sure that we have at least
> -	 * 3 levels, i.e, PMD is not folded.
> -	 */
> -	if (vma_pagesize == PMD_SIZE ||
> -	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
> -		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
> +	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
> +		fault_ipa &= huge_page_mask(hstate_vma(vma));

This looks correct to me - if !kvm_stage2_has_pmd(), then the PMD is folded into
the PUD and PGD, and PMD_SIZE == PUD_SIZE. I also like that we now update both
gfn *and* fault_ipa; the previous version updated only gfn, which left
gfn != (fault_ipa >> PAGE_SHIFT).
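
To make the arithmetic concrete, here's a tiny standalone example (made-up
constants - 4K pages, 2M PMD blocks - not kernel code):

	#include <stdio.h>

	#define PAGE_SHIFT	12
	#define PMD_SIZE	(1UL << 21)
	#define PMD_MASK	(~(PMD_SIZE - 1))

	int main(void)
	{
		unsigned long fault_ipa = 0x40123456UL;	/* unaligned IPA */
		unsigned long gfn;

		/* Align the IPA down to the block boundary, as the new code does. */
		fault_ipa &= PMD_MASK;

		/* Deriving gfn from the aligned IPA keeps the two consistent. */
		gfn = fault_ipa >> PAGE_SHIFT;

		/* Prints: fault_ipa=0x40000000 gfn=0x40000 */
		printf("fault_ipa=%#lx gfn=%#lx\n", fault_ipa, gfn);
		return 0;
	}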

> +
> +	gfn = fault_ipa >> PAGE_SHIFT;
>  	mmap_read_unlock(current->mm);
>  
> -	/* We need minimum second+third level pages */
> -	ret = kvm_mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm));
> -	if (ret)
> -		return ret;
> +	/*
> +	 * Permission faults just need to update the existing leaf entry,
> +	 * and so normally don't require allocations from the memcache. The
> +	 * only exception to this is when dirty logging is enabled at runtime
> +	 * and a write fault needs to collapse a block entry into a table.
> +	 */
> +	if (fault_status != FSC_PERM || (logging_active && write_fault)) {
> +		ret = kvm_mmu_topup_memory_cache(memcache,
> +						 kvm_mmu_cache_min_pages(kvm));
> +		if (ret)
> +			return ret;
> +	}

I'm not 100% sure about this.

I don't think we gain much over the previous code - if we had allocated cache
objects that went unused, they would have been used the next time
user_mem_abort() was called (kvm_mmu_topup_memory_cache() checks whether the
cache already holds the required number of objects and returns early if so).

I'm also not sure the condition is entirely correct - if stage 2 already has a
mapping for the IPA and we only need to set write permissions, according to the
condition above we still try to top up the cache, even though we don't strictly
need to.
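
For reference, the early-return behaviour I'm relying on above is roughly the
following - a simplified, standalone sketch of the generic helper in
virt/kvm/kvm_main.c, where malloc() stands in for the real page allocation and
the capacity constant is what I believe the 5.9-era code uses:

	#include <stdlib.h>

	#define NR_OBJS 40	/* assumed cache capacity */

	struct kvm_mmu_memory_cache {
		int nobjs;
		void *objects[NR_OBJS];
	};

	static int topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
	{
		void *obj;

		if (mc->nobjs >= min)	/* already topped up: cheap no-op */
			return 0;

		while (mc->nobjs < NR_OBJS) {
			obj = malloc(4096);	/* stand-in for a page */
			if (!obj)
				return mc->nobjs >= min ? 0 : -1; /* -ENOMEM */
			mc->objects[mc->nobjs++] = obj;
		}
		return 0;
	}

So a redundant top-up only costs a comparison once the cache is full.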

>  
>  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	/*
> @@ -1578,28 +1581,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		return -EFAULT;
>  
>  	if (kvm_is_device_pfn(pfn)) {
> -		mem_type = PAGE_S2_DEVICE;
> -		flags |= KVM_S2PTE_FLAG_IS_IOMAP;
> -	} else if (logging_active) {
> -		/*
> -		 * Faults on pages in a memslot with logging enabled
> -		 * should not be mapped with huge pages (it introduces churn
> -		 * and performance degradation), so force a pte mapping.
> -		 */
> -		flags |= KVM_S2_FLAG_LOGGING_ACTIVE;
> -
> +		device = true;
> +	} else if (logging_active && !write_fault) {
>  		/*
>  		 * Only actually map the page as writable if this was a write
>  		 * fault.
>  		 */
> -		if (!write_fault)
> -			writable = false;
> +		writable = false;
>  	}
>  
> -	if (exec_fault && is_iomap(flags))
> +	if (exec_fault && device)
>  		return -ENOEXEC;
>  
>  	spin_lock(&kvm->mmu_lock);
> +	pgt = vcpu->arch.hw_mmu->pgt;
>  	if (mmu_notifier_retry(kvm, mmu_seq))
>  		goto out_unlock;
>  
> @@ -1610,62 +1605,31 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (vma_pagesize == PAGE_SIZE && !force_pte)
>  		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>  							   &pfn, &fault_ipa);
> -	if (writable)
> +	if (writable) {
> +		prot |= KVM_PGTABLE_PROT_W;
>  		kvm_set_pfn_dirty(pfn);
> +		mark_page_dirty(kvm, gfn);

The previous code called mark_page_dirty() only when vma_pagesize == PAGE_SIZE
(and writable was true, obviously). Is this change supposed to fix a bug?
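
(For context, my understanding is that mark_page_dirty() boils down to setting
the gfn's bit in the memslot's dirty bitmap - paraphrased below, not the
verbatim implementation:

	void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
	{
		struct kvm_memory_slot *memslot = gfn_to_memslot(kvm, gfn);

		if (memslot && memslot->dirty_bitmap) {
			unsigned long rel_gfn = gfn - memslot->base_gfn;

			set_bit_le(rel_gfn, memslot->dirty_bitmap);
		}
	}

so the visible change is that writable block mappings now mark the gfn dirty
as well.)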

> +	}
>  
> -	if (fault_status != FSC_PERM && !is_iomap(flags))
> +	if (fault_status != FSC_PERM && !device)
>  		clean_dcache_guest_page(pfn, vma_pagesize);
>  
> -	if (exec_fault)
> +	if (exec_fault) {
> +		prot |= KVM_PGTABLE_PROT_X;
>  		invalidate_icache_guest_page(pfn, vma_pagesize);
> +	}
>  
> -	/*
> -	 * If we took an execution fault we have made the
> -	 * icache/dcache coherent above and should now let the s2
> -	 * mapping be executable.
> -	 *
> -	 * Write faults (!exec_fault && FSC_PERM) are orthogonal to
> -	 * execute permissions, and we preserve whatever we have.
> -	 */
> -	needs_exec = exec_fault ||
> -		(fault_status == FSC_PERM &&
> -		 stage2_is_exec(mmu, fault_ipa, vma_pagesize));
> -
> -	if (vma_pagesize == PUD_SIZE) {
> -		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
> -
> -		new_pud = kvm_pud_mkhuge(new_pud);
> -		if (writable)
> -			new_pud = kvm_s2pud_mkwrite(new_pud);
> -
> -		if (needs_exec)
> -			new_pud = kvm_s2pud_mkexec(new_pud);
> -
> -		ret = stage2_set_pud_huge(mmu, memcache, fault_ipa, &new_pud);
> -	} else if (vma_pagesize == PMD_SIZE) {
> -		pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
> -
> -		new_pmd = kvm_pmd_mkhuge(new_pmd);
> -
> -		if (writable)
> -			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
> -
> -		if (needs_exec)
> -			new_pmd = kvm_s2pmd_mkexec(new_pmd);
> +	if (device)
> +		prot |= KVM_PGTABLE_PROT_DEVICE;
> +	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
> +		prot |= KVM_PGTABLE_PROT_X;
>  
> -		ret = stage2_set_pmd_huge(mmu, memcache, fault_ipa, &new_pmd);
> +	if (fault_status == FSC_PERM && !(logging_active && writable)) {

I don't understand the second part of the condition (!(logging_active &&
writable)). With logging active, when we get a fault because of a missing stage 2
entry, we map the IPA as read-only at stage 2. If I understand this code
correctly, when the guest then tries to write to the same IPA, writable == true
and we map the IPA again instead of relaxing the permissions. Why is that?
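
To spell out how I'm reading the dispatch (the table is mine, the code is from
the patch):

	/*
	 *   fault_status   logging_active   writable   ->  path taken
	 *   ------------   --------------   --------       ----------
	 *   FSC_PERM       false            any            relax_perms
	 *   FSC_PERM       true             false          relax_perms
	 *   FSC_PERM       true             true           stage2_map  <-- why?
	 *   translation    any              any            stage2_map
	 */
	if (fault_status == FSC_PERM && !(logging_active && writable))
		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
	else
		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
					     __pfn_to_phys(pfn), prot,
					     memcache);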

Thanks,
Alex
> +		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
>  	} else {
> -		pte_t new_pte = kvm_pfn_pte(pfn, mem_type);
> -
> -		if (writable) {
> -			new_pte = kvm_s2pte_mkwrite(new_pte);
> -			mark_page_dirty(kvm, gfn);
> -		}
> -
> -		if (needs_exec)
> -			new_pte = kvm_s2pte_mkexec(new_pte);
> -
> -		ret = stage2_set_pte(mmu, memcache, fault_ipa, &new_pte, flags);
> +		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
> +					     __pfn_to_phys(pfn), prot,
> +					     memcache);
>  	}
>  
>  out_unlock:
