kvmarm.lists.cs.columbia.edu archive mirror
* [PATCH 0/3] KVM: arm/arm64: user_mem_abort() assorted fixes
@ 2019-12-11 16:56 Marc Zyngier
  2019-12-11 16:56 ` [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings Marc Zyngier
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Marc Zyngier @ 2019-12-11 16:56 UTC (permalink / raw)
  To: kvm, kvmarm, linux-arm-kernel

Alexandru recently reported an interesting issue with our handling
of device mappings in user_mem_abort(), which is slightly less than
correct. The first patch of the series addresses this issue, and
is a stable candidate.

While I was looking at this code, I spotted what I think is a potential
issue when handling a poisoned page, where we can race with a VMA
being removed. This second patch is mostly an RFC, as this is not
my area of expertise.

Finally, the last patch is a cleanup removing an unnecessary console
output.

Marc Zyngier (3):
  KVM: arm/arm64: Properly handle faulting of device mappings
  KVM: arm/arm64: Re-check VMA on detecting a poisoned page
  KVM: arm/arm64: Drop spurious message when faulting on a non-existent
    mapping

 virt/kvm/arm/mmu.c | 47 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 7 deletions(-)

-- 
2.20.1


* [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings
  2019-12-11 16:56 [PATCH 0/3] KVM: arm/arm64: user_mem_abort() assorted fixes Marc Zyngier
@ 2019-12-11 16:56 ` Marc Zyngier
  2019-12-11 17:53   ` Alexandru Elisei
                     ` (2 more replies)
  2019-12-11 16:56 ` [PATCH 2/3] KVM: arm/arm64: Re-check VMA on detecting a poisoned page Marc Zyngier
  2019-12-11 16:56 ` [PATCH 3/3] KVM: arm/arm64: Drop spurious message when faulting on a non-existent mapping Marc Zyngier
  2 siblings, 3 replies; 19+ messages in thread
From: Marc Zyngier @ 2019-12-11 16:56 UTC (permalink / raw)
  To: kvm, kvmarm, linux-arm-kernel; +Cc: stable

A device mapping is normally always mapped at Stage-2, since there
is very little gain in having it faulted in.

Nonetheless, it is possible to end up in a situation where the device
mapping has been removed from Stage-2 (userspace munmapped the VFIO
region, and the MMU notifier did its job), but is still present in a
userspace mapping (userspace has mapped it back at the same address).
In such a situation, the device mapping will be demand-paged as the
guest performs memory accesses.

This requires being careful when dealing with mapping size and cache
management, and handling potential execution of a device mapping.
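
For illustration only, the userspace side of this scenario looks
roughly like the sketch below (the helper name, file descriptor and
region offset are made up; only the mmap()/munmap() ordering matters):

	#include <sys/mman.h>
	#include <sys/types.h>

	/* vfio_fd, region_offset and size are placeholders for the example */
	static void *remap_vfio_region(int vfio_fd, off_t region_offset, size_t size)
	{
		void *va = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
				vfio_fd, region_offset);
		/* memslot registered, guest runs, Stage-2 gets populated... */

		munmap(va, size);	/* MMU notifier removes the Stage-2 mapping */

		/* Stage-1 mapping restored at the same address, Stage-2 left empty */
		return mmap(va, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
			    vfio_fd, region_offset);
	}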

Cc: stable@vger.kernel.org
Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 virt/kvm/arm/mmu.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index a48994af70b8..0b32a904a1bb 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -38,6 +38,11 @@ static unsigned long io_map_base;
 #define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
 #define KVM_S2_FLAG_LOGGING_ACTIVE	(1UL << 1)
 
+static bool is_iomap(unsigned long flags)
+{
+	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
+}
+
 static bool memslot_is_logging(struct kvm_memory_slot *memslot)
 {
 	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
@@ -1698,6 +1703,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	vma_pagesize = vma_kernel_pagesize(vma);
 	if (logging_active ||
+	    (vma->vm_flags & VM_PFNMAP) ||
 	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
 		force_pte = true;
 		vma_pagesize = PAGE_SIZE;
@@ -1760,6 +1766,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			writable = false;
 	}
 
+	if (exec_fault && is_iomap(flags))
+		return -ENOEXEC;
+
 	spin_lock(&kvm->mmu_lock);
 	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
@@ -1781,7 +1790,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (writable)
 		kvm_set_pfn_dirty(pfn);
 
-	if (fault_status != FSC_PERM)
+	if (fault_status != FSC_PERM && !is_iomap(flags))
 		clean_dcache_guest_page(pfn, vma_pagesize);
 
 	if (exec_fault)
@@ -1948,9 +1957,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
 		if (is_iabt) {
 			/* Prefetch Abort on I/O address */
-			kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
-			ret = 1;
-			goto out_unlock;
+			ret = -ENOEXEC;
+			goto out;
 		}
 
 		/*
@@ -1992,6 +2000,11 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
 	if (ret == 0)
 		ret = 1;
+out:
+	if (ret == -ENOEXEC) {
+		kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+		ret = 1;
+	}
 out_unlock:
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	return ret;
-- 
2.20.1


* [PATCH 2/3] KVM: arm/arm64: Re-check VMA on detecting a poisoned page
  2019-12-11 16:56 [PATCH 0/3] KVM: arm/arm64: user_mem_abort() assorted fixes Marc Zyngier
  2019-12-11 16:56 ` [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings Marc Zyngier
@ 2019-12-11 16:56 ` Marc Zyngier
  2019-12-12 11:33   ` Marc Zyngier
  2019-12-13  9:22   ` Christoffer Dall
  2019-12-11 16:56 ` [PATCH 3/3] KVM: arm/arm64: Drop spurious message when faulting on a non-existent mapping Marc Zyngier
  2 siblings, 2 replies; 19+ messages in thread
From: Marc Zyngier @ 2019-12-11 16:56 UTC (permalink / raw)
  To: kvm, kvmarm, linux-arm-kernel

When we check for a poisoned page, we use the VMA to tell userspace
about the looming disaster. But we pass a pointer to this VMA
after having released the mmap_sem, which isn't a good idea.

Instead, re-check that we still have a VMA, and that this
VMA still points to a poisoned page. If the VMA isn't there,
userspace is playing with our nerves, so let's give it a -EFAULT
(it deserves it). If the PFN isn't poisoned anymore, let's restart
from the top and handle the fault again.
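
For reference, the problematic flow in user_mem_abort() is roughly the
following (heavily simplified excerpt, see the actual code for the
details):

	down_read(&current->mm->mmap_sem);
	vma = find_vma_intersection(current->mm, hva, hva + 1);
	...
	up_read(&current->mm->mmap_sem);
	...
	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
	if (pfn == KVM_PFN_ERR_HWPOISON) {
		kvm_send_hwpoison_signal(hva, vma);	/* vma may be stale here */
		return 0;
	}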

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 virt/kvm/arm/mmu.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 0b32a904a1bb..f73393f5ddb7 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1741,9 +1741,30 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
-		kvm_send_hwpoison_signal(hva, vma);
-		return 0;
+		/*
+		 * Search for the VMA again, as it may have been
+		 * removed in the interval...
+		 */
+		down_read(&current->mm->mmap_sem);
+		vma = find_vma_intersection(current->mm, hva, hva + 1);
+		if (vma) {
+			/*
+			 * Recheck for a poisoned page. If something changed
+			 * behind our back, don't do a thing and take the
+			 * fault again.
+			 */
+			pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
+			if (pfn == KVM_PFN_ERR_HWPOISON)
+				kvm_send_hwpoison_signal(hva, vma);
+
+			ret = 0;
+		} else {
+			ret = -EFAULT;
+		}
+		up_read(&current->mm->mmap_sem);
+		return ret;
 	}
+
 	if (is_error_noslot_pfn(pfn))
 		return -EFAULT;
 
-- 
2.20.1


* [PATCH 3/3] KVM: arm/arm64: Drop spurious message when faulting on a non-existent mapping
  2019-12-11 16:56 [PATCH 0/3] KVM: arm/arm64: user_mem_abort() assorted fixes Marc Zyngier
  2019-12-11 16:56 ` [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings Marc Zyngier
  2019-12-11 16:56 ` [PATCH 2/3] KVM: arm/arm64: Re-check VMA on detecting a poisoned page Marc Zyngier
@ 2019-12-11 16:56 ` Marc Zyngier
  2019-12-13  9:26   ` Christoffer Dall
  2 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2019-12-11 16:56 UTC (permalink / raw)
  To: kvm, kvmarm, linux-arm-kernel

Should userspace unmap memory whilst the guest is running, we exit
with -EFAULT, but not before spitting out a useless message on the
console.

Get rid of it.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 virt/kvm/arm/mmu.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index f73393f5ddb7..fbfdffb8fe8e 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1696,7 +1696,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	down_read(&current->mm->mmap_sem);
 	vma = find_vma_intersection(current->mm, hva, hva + 1);
 	if (unlikely(!vma)) {
-		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
 		up_read(&current->mm->mmap_sem);
 		return -EFAULT;
 	}
-- 
2.20.1


* Re: [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings
  2019-12-11 16:56 ` [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings Marc Zyngier
@ 2019-12-11 17:53   ` Alexandru Elisei
  2019-12-11 18:01     ` Marc Zyngier
  2019-12-12 15:35   ` James Morse
  2019-12-13  8:29   ` Christoffer Dall
  2 siblings, 1 reply; 19+ messages in thread
From: Alexandru Elisei @ 2019-12-11 17:53 UTC (permalink / raw)
  To: Marc Zyngier, kvm, kvmarm, linux-arm-kernel; +Cc: stable

Hi,

On 12/11/19 4:56 PM, Marc Zyngier wrote:
> A device mapping is normally always mapped at Stage-2, since there
> is very little gain in having it faulted in.
>
> Nonetheless, it is possible to end-up in a situation where the device
> mapping has been removed from Stage-2 (userspace munmaped the VFIO
> region, and the MMU notifier did its job), but present in a userspace
> mapping (userpace has mapped it back at the same address). In such
> a situation, the device mapping will be demand-paged as the guest
> performs memory accesses.
>
> This requires to be careful when dealing with mapping size, cache
> management, and to handle potential execution of a device mapping.
>
> Cc: stable@vger.kernel.org
> Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  virt/kvm/arm/mmu.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index a48994af70b8..0b32a904a1bb 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -38,6 +38,11 @@ static unsigned long io_map_base;
>  #define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
>  #define KVM_S2_FLAG_LOGGING_ACTIVE	(1UL << 1)
>  
> +static bool is_iomap(unsigned long flags)
> +{
> +	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
> +}
> +
>  static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>  {
>  	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
> @@ -1698,6 +1703,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	vma_pagesize = vma_kernel_pagesize(vma);
>  	if (logging_active ||
> +	    (vma->vm_flags & VM_PFNMAP) ||
>  	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
>  		force_pte = true;
>  		vma_pagesize = PAGE_SIZE;
> @@ -1760,6 +1766,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			writable = false;
>  	}
>  
> +	if (exec_fault && is_iomap(flags))
> +		return -ENOEXEC;
> +
>  	spin_lock(&kvm->mmu_lock);
>  	if (mmu_notifier_retry(kvm, mmu_seq))
>  		goto out_unlock;
> @@ -1781,7 +1790,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (writable)
>  		kvm_set_pfn_dirty(pfn);
>  
> -	if (fault_status != FSC_PERM)
> +	if (fault_status != FSC_PERM && !is_iomap(flags))
>  		clean_dcache_guest_page(pfn, vma_pagesize);
>  
>  	if (exec_fault)
> @@ -1948,9 +1957,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
>  		if (is_iabt) {
>  			/* Prefetch Abort on I/O address */
> -			kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> -			ret = 1;
> -			goto out_unlock;
> +			ret = -ENOEXEC;
> +			goto out;
>  		}
>  
>  		/*
> @@ -1992,6 +2000,11 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
>  	if (ret == 0)
>  		ret = 1;
> +out:
> +	if (ret == -ENOEXEC) {
> +		kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> +		ret = 1;
> +	}
>  out_unlock:
>  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  	return ret;

I've tested this patch using the scenario that you described in the commit
message. I've also tested it with the following scenarios:

- The region is mmap'ed as backed by a VFIO device fd and the memslot is created,
it is munmap'ed, then mmap'ed as anonymous.
- The region is mmap'ed as anonymous and the memslot is created, it is munmap'ed,
then mmap'ed as backed by a VFIO device fd.

Everything worked as expected, so:

Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>

Just a nitpick, but stage2_set_pte has a local variable iomap which can be
replaced with a call to is_iomap.
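
Something like the sketch below is what I had in mind (untested, and
quoting stage2_set_pte() from memory, so the exact use site may
differ):

	-	bool iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP;
	 	...
	-	if (iomap && pte_present(*pte))
	+	if (is_iomap(flags) && pte_present(*pte))
	 		return -EFAULT;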

Thanks,
Alex

* Re: [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings
  2019-12-11 17:53   ` Alexandru Elisei
@ 2019-12-11 18:01     ` Marc Zyngier
  0 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2019-12-11 18:01 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, stable, kvmarm, linux-arm-kernel

Hi Alex,

On 2019-12-11 17:53, Alexandru Elisei wrote:
> Hi,
>
> On 12/11/19 4:56 PM, Marc Zyngier wrote:
>> A device mapping is normally always mapped at Stage-2, since there
>> is very little gain in having it faulted in.
>>
>> Nonetheless, it is possible to end-up in a situation where the 
>> device
>> mapping has been removed from Stage-2 (userspace munmaped the VFIO
>> region, and the MMU notifier did its job), but present in a 
>> userspace
>> mapping (userpace has mapped it back at the same address). In such
>> a situation, the device mapping will be demand-paged as the guest
>> performs memory accesses.
>>
>> This requires to be careful when dealing with mapping size, cache
>> management, and to handle potential execution of a device mapping.
>>
>> Cc: stable@vger.kernel.org
>> Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>> ---
>>  virt/kvm/arm/mmu.c | 21 +++++++++++++++++----
>>  1 file changed, 17 insertions(+), 4 deletions(-)

[...]

> I've tested this patch using the scenario that you described in the 
> commit
> message. I've also tested it with the following scenarios:
>
> - The region is mmap'ed as backed by a VFIO device fd and the memslot
> is created,
> it is munmap'ed, then mmap'ed as anonymous.
> - The region is mmap'ed as anonymous and the memslot is created, it
> is munmap'ed,
> then mmap'ed as backed by a VFIO device fd.
>
> Everything worked as expected, so:
>
> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks for that!

> Just a nitpick, but stage2_set_pte has a local variable iomap which 
> can be
> replaced with a call to is_iomap.

Yeah, I noticed. I'm just trying to keep the patch relatively small so
that it can be easily backported. The cleanup can come later! ;-)

Cheers,

         M.
-- 
Jazz is not dead. It just smells funny...

* Re: [PATCH 2/3] KVM: arm/arm64: Re-check VMA on detecting a poisoned page
  2019-12-11 16:56 ` [PATCH 2/3] KVM: arm/arm64: Re-check VMA on detecting a poisoned page Marc Zyngier
@ 2019-12-12 11:33   ` Marc Zyngier
  2019-12-12 15:34     ` James Morse
  2019-12-13  9:22   ` Christoffer Dall
  1 sibling, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2019-12-12 11:33 UTC (permalink / raw)
  To: kvm, kvmarm, linux-arm-kernel, james.morse

On 2019-12-11 16:56, Marc Zyngier wrote:
> When we check for a poisoned page, we use the VMA to tell userspace
> about the looming disaster. But we pass a pointer to this VMA
> after having released the mmap_sem, which isn't a good idea.
>
> Instead, re-check that we have still have a VMA, and that this
> VMA still points to a poisoned page. If the VMA isn't there,
> userspace is playing with our nerves, so lety's give it a -EFAULT
> (it deserves it). If the PFN isn't poisoned anymore, let's restart
> from the top and handle the fault again.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  virt/kvm/arm/mmu.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 0b32a904a1bb..f73393f5ddb7 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1741,9 +1741,30 @@ static int user_mem_abort(struct kvm_vcpu
> *vcpu, phys_addr_t fault_ipa,
>
>  	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
>  	if (pfn == KVM_PFN_ERR_HWPOISON) {
> -		kvm_send_hwpoison_signal(hva, vma);
> -		return 0;
> +		/*
> +		 * Search for the VMA again, as it may have been
> +		 * removed in the interval...
> +		 */
> +		down_read(&current->mm->mmap_sem);
> +		vma = find_vma_intersection(current->mm, hva, hva + 1);
> +		if (vma) {
> +			/*
> +			 * Recheck for a poisoned page. If something changed
> +			 * behind our back, don't do a thing and take the
> +			 * fault again.
> +			 */
> +			pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
> +			if (pfn == KVM_PFN_ERR_HWPOISON)
> +				kvm_send_hwpoison_signal(hva, vma);
> +
> +			ret = 0;
> +		} else {
> +			ret = -EFAULT;
> +		}
> +		up_read(&current->mm->mmap_sem);
> +		return ret;
>  	}
> +
>  	if (is_error_noslot_pfn(pfn))
>  		return -EFAULT;

Revisiting this, I wonder if we're not better off just holding the
mmap_sem for a bit longer. Something like:

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 0b32a904a1bb..87d416d000c6 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1719,13 +1719,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  	if (vma_pagesize == PMD_SIZE ||
  	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
  		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
-	up_read(&current->mm->mmap_sem);
-
  	/* We need minimum second+third level pages */
  	ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm),
  				     KVM_NR_MEM_OBJS);
-	if (ret)
+	if (ret) {
+		up_read(&current->mm->mmap_sem);
  		return ret;
+	}

  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
  	/*
@@ -1742,8 +1742,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
  	if (pfn == KVM_PFN_ERR_HWPOISON) {
  		kvm_send_hwpoison_signal(hva, vma);
+		up_read(&current->mm->mmap_sem);
  		return 0;
  	}
+
+	up_read(&current->mm->mmap_sem);
+
  	if (is_error_noslot_pfn(pfn))
  		return -EFAULT;


James, what do you think?

         M.
-- 
Jazz is not dead. It just smells funny...

* Re: [PATCH 2/3] KVM: arm/arm64: Re-check VMA on detecting a poisoned page
  2019-12-12 11:33   ` Marc Zyngier
@ 2019-12-12 15:34     ` James Morse
  2019-12-12 15:40       ` Marc Zyngier
  2019-12-13  9:25       ` Christoffer Dall
  0 siblings, 2 replies; 19+ messages in thread
From: James Morse @ 2019-12-12 15:34 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-arm-kernel, kvmarm, kvm

Hi Marc,

On 12/12/2019 11:33, Marc Zyngier wrote:
> On 2019-12-11 16:56, Marc Zyngier wrote:
>> When we check for a poisoned page, we use the VMA to tell userspace
>> about the looming disaster. But we pass a pointer to this VMA
>> after having released the mmap_sem, which isn't a good idea.

Sounds like a bug! The vma-size might not match the poisoned pfn.


>> Instead, re-check that we have still have a VMA, and that this
>> VMA still points to a poisoned page. If the VMA isn't there,
>> userspace is playing with our nerves, so lety's give it a -EFAULT
>> (it deserves it). If the PFN isn't poisoned anymore, let's restart
>> from the top and handle the fault again.


>>  virt/kvm/arm/mmu.c | 25 +++++++++++++++++++++++--
>>  1 file changed, 23 insertions(+), 2 deletions(-)

... yeah ...

>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 0b32a904a1bb..f73393f5ddb7 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1741,9 +1741,30 @@ static int user_mem_abort(struct kvm_vcpu
>> *vcpu, phys_addr_t fault_ipa,
>>
>>      pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
>>      if (pfn == KVM_PFN_ERR_HWPOISON) {
>> -        kvm_send_hwpoison_signal(hva, vma);
>> -        return 0;
>> +        /*
>> +         * Search for the VMA again, as it may have been
>> +         * removed in the interval...
>> +         */
>> +        down_read(&current->mm->mmap_sem);
>> +        vma = find_vma_intersection(current->mm, hva, hva + 1);
>> +        if (vma) {
>> +            /*
>> +             * Recheck for a poisoned page. If something changed
>> +             * behind our back, don't do a thing and take the
>> +             * fault again.
>> +             */
>> +            pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
>> +            if (pfn == KVM_PFN_ERR_HWPOISON)
>> +                kvm_send_hwpoison_signal(hva, vma);
>> +
>> +            ret = 0;
>> +        } else {
>> +            ret = -EFAULT;
>> +        }
>> +        up_read(&current->mm->mmap_sem);
>> +        return ret;
>>      }
>> +
>>      if (is_error_noslot_pfn(pfn))
>>          return -EFAULT;


> Revisiting this, I wonder if we're not better off just holding the mmap_sem
> for a bit longer. Something like:
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 0b32a904a1bb..87d416d000c6 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1719,13 +1719,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t
> fault_ipa,
>      if (vma_pagesize == PMD_SIZE ||
>          (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
>          gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
> -    up_read(&current->mm->mmap_sem);
> -
>      /* We need minimum second+third level pages */
>      ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm),
>                       KVM_NR_MEM_OBJS);
> -    if (ret)
> +    if (ret) {
> +        up_read(&current->mm->mmap_sem);
>          return ret;
> +    }
> 
>      mmu_seq = vcpu->kvm->mmu_notifier_seq;
>      /*
> @@ -1742,8 +1742,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t
> fault_ipa,
>      pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
>      if (pfn == KVM_PFN_ERR_HWPOISON) {
>          kvm_send_hwpoison_signal(hva, vma);
> +        up_read(&current->mm->mmap_sem);
>          return 0;
>      }
> +
> +    up_read(&current->mm->mmap_sem);
> +
>      if (is_error_noslot_pfn(pfn))
>          return -EFAULT;
> 
> 
> James, what do you think?

(allocating from a kmemcache while holding current's mmap_sem. I don't want to think about
it!)

Can we be lazier? We want the VMA to get the size of the poisoned mapping correct in the
signal. The bug is that this could change when we drop the lock, before queuing the
signal, so we report hwpoison on old-vmas:pfn with new-vmas:size.

Can't it equally change when we drop the lock after queuing the signal? Any time before
the thread returns to user-space to take the signal gives us a stale value.

I think all that matters is the size goes with the pfn that was poisoned. If we look the
vma up by hva again, we have to check if the pfn has changed too... (which you are doing)

Can we stash the size in the existing mmap_sem region, and use that in
kvm_send_hwpoison_signal()? We know it matches the pfn we saw as poisoned.

The vma could be changed before/after we send the signal, but user-space can't know which.
This is user-space's problem for messing with the memslots while a vcpu is running.


How about (untested):
-------------------------%<-------------------------
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 38b4c910b6c3..80212d4935bd 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1591,16 +1591,8 @@ static void kvm_send_hwpoison_signal(unsigned long address, unsigned long size)
        __invalidate_icache_guest_page(pfn, size);
 }

-static void kvm_send_hwpoison_signal(unsigned long address,
-                                    struct vm_area_struct *vma)
+static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
 {
-       short lsb;
-
-       if (is_vm_hugetlb_page(vma))
-               lsb = huge_page_shift(hstate_vma(vma));
-       else
-               lsb = PAGE_SHIFT;
-
        send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
 }

@@ -1673,6 +1665,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
        struct kvm *kvm = vcpu->kvm;
        struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
        struct vm_area_struct *vma;
+       short stage1_vma_size;
        kvm_pfn_t pfn;
        pgprot_t mem_type = PAGE_S2;
        bool logging_active = memslot_is_logging(memslot);

@@ -1703,6 +1696,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
                vma_pagesize = PAGE_SIZE;
        }

+       /* For signals due to hwpoison, we need to use the stage1 size */
+       if (is_vm_hugetlb_page(vma))
+               stage1_vma_size = huge_page_shift(hstate_vma(vma));
+       else
+               stage1_vma_size = PAGE_SHIFT;
+
        /*
         * The stage2 has a minimum of 2 level table (For arm64 see
         * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can
@@ -1735,7 +1734,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,

        pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
        if (pfn == KVM_PFN_ERR_HWPOISON) {
-               kvm_send_hwpoison_signal(hva, vma);
+               kvm_send_hwpoison_signal(hva, stage1_vma_size);
                return 0;
        }
        if (is_error_noslot_pfn(pfn))
-------------------------%<-------------------------

It's possible this could even be the original output of vma_kernel_pagesize()... (Punit
supplied the original huge_page_shift(hstate_vma()) runes...)
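
(i.e. something like the untested two-liner below; vma_kernel_pagesize()
returns a power of two, so __ffs() gives us the shift directly:)

	short lsb = __ffs(vma_kernel_pagesize(vma));
	kvm_send_hwpoison_signal(hva, lsb);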



Thanks,

James

* Re: [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings
  2019-12-11 16:56 ` [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings Marc Zyngier
  2019-12-11 17:53   ` Alexandru Elisei
@ 2019-12-12 15:35   ` James Morse
  2019-12-13  8:29   ` Christoffer Dall
  2 siblings, 0 replies; 19+ messages in thread
From: James Morse @ 2019-12-12 15:35 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, stable, kvmarm, linux-arm-kernel

Hi Marc,

On 11/12/2019 16:56, Marc Zyngier wrote:
> A device mapping is normally always mapped at Stage-2, since there
> is very little gain in having it faulted in.
> 
> Nonetheless, it is possible to end-up in a situation where the device
> mapping has been removed from Stage-2 (userspace munmaped the VFIO
> region, and the MMU notifier did its job), but present in a userspace
> mapping (userpace has mapped it back at the same address). In such
> a situation, the device mapping will be demand-paged as the guest
> performs memory accesses.
> 
> This requires to be careful when dealing with mapping size, cache
> management, and to handle potential execution of a device mapping.

Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James

* Re: [PATCH 2/3] KVM: arm/arm64: Re-check VMA on detecting a poisoned page
  2019-12-12 15:34     ` James Morse
@ 2019-12-12 15:40       ` Marc Zyngier
  2019-12-13  9:25       ` Christoffer Dall
  1 sibling, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2019-12-12 15:40 UTC (permalink / raw)
  To: James Morse; +Cc: linux-arm-kernel, kvmarm, kvm

Hi James,

On 2019-12-12 15:34, James Morse wrote:
> Hi Marc,
>
> On 12/12/2019 11:33, Marc Zyngier wrote:
>> On 2019-12-11 16:56, Marc Zyngier wrote:
>>> When we check for a poisoned page, we use the VMA to tell userspace
>>> about the looming disaster. But we pass a pointer to this VMA
>>> after having released the mmap_sem, which isn't a good idea.
>
> Sounds like a bug! The vma-size might not match the poisoned pfn.
>
>
>>> Instead, re-check that we have still have a VMA, and that this
>>> VMA still points to a poisoned page. If the VMA isn't there,
>>> userspace is playing with our nerves, so lety's give it a -EFAULT
>>> (it deserves it). If the PFN isn't poisoned anymore, let's restart
>>> from the top and handle the fault again.
>
>
>>>  virt/kvm/arm/mmu.c | 25 +++++++++++++++++++++++--
>>>  1 file changed, 23 insertions(+), 2 deletions(-)
>
> ... yeah ...
>

[...]

> How about (untested):
> -------------------------%<-------------------------
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 38b4c910b6c3..80212d4935bd 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1591,16 +1591,8 @@ static void
> invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned
> long size)
>         __invalidate_icache_guest_page(pfn, size);
>  }
>
> -static void kvm_send_hwpoison_signal(unsigned long address,
> -                                    struct vm_area_struct *vma)
> +static void kvm_send_hwpoison_signal(unsigned long address, short 
> lsb)
>  {
> -       short lsb;
> -
> -       if (is_vm_hugetlb_page(vma))
> -               lsb = huge_page_shift(hstate_vma(vma));
> -       else
> -               lsb = PAGE_SHIFT;
> -
>         send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, 
> current);
>  }
>
> @@ -1673,6 +1665,7 @@ static int user_mem_abort(struct kvm_vcpu
> *vcpu, phys_addr_t fault_ipa,
>         struct kvm *kvm = vcpu->kvm;
>         struct kvm_mmu_memory_cache *memcache = 
> &vcpu->arch.mmu_page_cache;
>         struct vm_area_struct *vma;
> +       short stage1_vma_size;
>         kvm_pfn_t pfn;
>         pgprot_t mem_type = PAGE_S2;
>         bool logging_active = memslot_is_logging(memslot);
>
> @@ -1703,6 +1696,12 @@ static int user_mem_abort(struct kvm_vcpu
> *vcpu, phys_addr_t fault_ipa,
>                 vma_pagesize = PAGE_SIZE;
>         }
>
> +       /* For signals due to hwpoison, we need to use the stage1 
> size */
> +       if (is_vm_hugetlb_page(vma))
> +               stage1_vma_size = huge_page_shift(hstate_vma(vma));
> +       else
> +               stage1_vma_size = PAGE_SHIFT;
> +
>         /*
>          * The stage2 has a minimum of 2 level table (For arm64 see
>          * kvm_arm_setup_stage2()). Hence, we are guaranteed that we 
> can
> @@ -1735,7 +1734,7 @@ static int user_mem_abort(struct kvm_vcpu
> *vcpu, phys_addr_t fault_ipa,
>
>         pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
>         if (pfn == KVM_PFN_ERR_HWPOISON) {
> -               kvm_send_hwpoison_signal(hva, vma);
> +               kvm_send_hwpoison_signal(hva, stage1_vma_size);
>                 return 0;
>         }
>         if (is_error_noslot_pfn(pfn))
> -------------------------%<-------------------------
>
> Its possible this could even be the original output of
> vma_kernel_pagesize()... (Punit supplied the original
> huge_page_shift(hstate_vma()) runes...)

I'd be happy with something along these lines. Any chance you could
send a proper patch?

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

* Re: [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings
  2019-12-11 16:56 ` [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings Marc Zyngier
  2019-12-11 17:53   ` Alexandru Elisei
  2019-12-12 15:35   ` James Morse
@ 2019-12-13  8:29   ` Christoffer Dall
  2019-12-13  9:28     ` Marc Zyngier
  2 siblings, 1 reply; 19+ messages in thread
From: Christoffer Dall @ 2019-12-13  8:29 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, stable, kvmarm, linux-arm-kernel

Hi Marc,

On Wed, Dec 11, 2019 at 04:56:48PM +0000, Marc Zyngier wrote:
> A device mapping is normally always mapped at Stage-2, since there
> is very little gain in having it faulted in.

It is actually becoming less clear to me what the real benefits of
pre-populating the stage 2 page table are, especially given that we can
provoke a situation where they're faulted in anyhow.  Do you recall if
we had any specific case that motivated us to pre-fault in the pages?

> 
> Nonetheless, it is possible to end-up in a situation where the device
> mapping has been removed from Stage-2 (userspace munmaped the VFIO
> region, and the MMU notifier did its job), but present in a userspace
> mapping (userpace has mapped it back at the same address). In such
> a situation, the device mapping will be demand-paged as the guest
> performs memory accesses.
> 
> This requires to be careful when dealing with mapping size, cache
> management, and to handle potential execution of a device mapping.
> 
> Cc: stable@vger.kernel.org
> Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  virt/kvm/arm/mmu.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index a48994af70b8..0b32a904a1bb 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -38,6 +38,11 @@ static unsigned long io_map_base;
>  #define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
>  #define KVM_S2_FLAG_LOGGING_ACTIVE	(1UL << 1)
>  
> +static bool is_iomap(unsigned long flags)
> +{
> +	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
> +}
> +

nit: I'm not really sure this indirection makes the code more readable,
but I guess that's a matter of taste.

>  static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>  {
>  	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
> @@ -1698,6 +1703,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	vma_pagesize = vma_kernel_pagesize(vma);
>  	if (logging_active ||
> +	    (vma->vm_flags & VM_PFNMAP) ||

What is actually the rationale for this?

Why is a huge mapping not permitted to device memory?

Are we guaranteed that VM_PFNMAP on the vma results in device mappings?
I'm not convinced this is the case, and it would be better if we can
stick to a single primitive (either kvm_is_device_pfn, or VM_PFNMAP) to
detect device mappings.
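
For reference, kvm_is_device_pfn() currently boils down to a
pfn_valid() check (quoting from memory, so approximate), which is not
the same test as VM_PFNMAP on the vma:

	static bool kvm_is_device_pfn(unsigned long pfn)
	{
		return !pfn_valid(pfn);
	}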

As a subsequent patch, I'd like to make sure that at the very least our
memslot prepare function follows the exact same logic for mapping device
memory as a fault-in approach does, or that we simply always fault pages
in.

>  	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
>  		force_pte = true;
>  		vma_pagesize = PAGE_SIZE;
> @@ -1760,6 +1766,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			writable = false;
>  	}
>  
> +	if (exec_fault && is_iomap(flags))
> +		return -ENOEXEC;
> +

nit: why don't you just do this when checking kvm_is_device_pfn() and
avoid having logic in two places to deal with this case?
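
e.g. something like this untested sketch:

	if (kvm_is_device_pfn(pfn)) {
		if (exec_fault)
			return -ENOEXEC;
		mem_type = PAGE_S2_DEVICE;
		flags |= KVM_S2PTE_FLAG_IS_IOMAP;
	} else if (logging_active) {
		...
	}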

>  	spin_lock(&kvm->mmu_lock);
>  	if (mmu_notifier_retry(kvm, mmu_seq))
>  		goto out_unlock;
> @@ -1781,7 +1790,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (writable)
>  		kvm_set_pfn_dirty(pfn);
>  
> -	if (fault_status != FSC_PERM)
> +	if (fault_status != FSC_PERM && !is_iomap(flags))
>  		clean_dcache_guest_page(pfn, vma_pagesize);
>  
>  	if (exec_fault)
> @@ -1948,9 +1957,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
>  		if (is_iabt) {
>  			/* Prefetch Abort on I/O address */
> -			kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> -			ret = 1;
> -			goto out_unlock;
> +			ret = -ENOEXEC;
> +			goto out;
>  		}
>  
>  		/*
> @@ -1992,6 +2000,11 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
>  	if (ret == 0)
>  		ret = 1;
> +out:
> +	if (ret == -ENOEXEC) {
> +		kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> +		ret = 1;
> +	}
>  out_unlock:
>  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  	return ret;
> -- 
> 2.20.1
> 

I can't seem to decide for myself if I think there's a semantic
difference between trying to execute from somewhere the VMM has
explicitly told us is device memory and from somewhere which we happen
to have mapped with VM_PFNMAP from user space.  But I also can't seem to
really fault it (pun intended).  Thoughts?


Thanks,

    Christoffer

* Re: [PATCH 2/3] KVM: arm/arm64: Re-check VMA on detecting a poisoned page
  2019-12-11 16:56 ` [PATCH 2/3] KVM: arm/arm64: Re-check VMA on detecting a poisoned page Marc Zyngier
  2019-12-12 11:33   ` Marc Zyngier
@ 2019-12-13  9:22   ` Christoffer Dall
  2019-12-16 18:29     ` James Morse
  1 sibling, 1 reply; 19+ messages in thread
From: Christoffer Dall @ 2019-12-13  9:22 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, kvmarm, linux-arm-kernel

On Wed, Dec 11, 2019 at 04:56:49PM +0000, Marc Zyngier wrote:
> When we check for a poisoned page, we use the VMA to tell userspace
> about the looming disaster. But we pass a pointer to this VMA
> after having released the mmap_sem, which isn't a good idea.
> 
> Instead, re-check that we have still have a VMA, and that this
> VMA still points to a poisoned page. If the VMA isn't there,
> userspace is playing with our nerves, so lety's give it a -EFAULT
> (it deserves it). If the PFN isn't poisoned anymore, let's restart
> from the top and handle the fault again.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  virt/kvm/arm/mmu.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 0b32a904a1bb..f73393f5ddb7 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1741,9 +1741,30 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
>  	if (pfn == KVM_PFN_ERR_HWPOISON) {
> -		kvm_send_hwpoison_signal(hva, vma);
> -		return 0;
> +		/*
> +		 * Search for the VMA again, as it may have been
> +		 * removed in the interval...
> +		 */
> +		down_read(&current->mm->mmap_sem);
> +		vma = find_vma_intersection(current->mm, hva, hva + 1);
> +		if (vma) {
> +			/*
> +			 * Recheck for a poisoned page. If something changed
> +			 * behind our back, don't do a thing and take the
> +			 * fault again.
> +			 */
> +			pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
> +			if (pfn == KVM_PFN_ERR_HWPOISON)
> +				kvm_send_hwpoison_signal(hva, vma);
> +
> +			ret = 0;
> +		} else {
> +			ret = -EFAULT;
> +		}
> +		up_read(&current->mm->mmap_sem);
> +		return ret;
>  	}
> +
>  	if (is_error_noslot_pfn(pfn))
>  		return -EFAULT;
>  
> -- 
> 2.20.1
> 

If I read this code correctly, then all we use the VMA for is to find
the page size used by the MMU to back the VMA, which we've already
established in the vma_pagesize (and possibly adjusted to something more
accurate based on our constraints in stage 2 which generated the error),
so all we need is the size and a way to convert that into a shift.

Not being 100% confident about the semantics of the lsb bit we pass to
user space (is it indicating the size of the mapping which caused the
error or the size of the mapping where user space could potentially
trigger an error?), or whether we care enough at that level, could we
consider something like the following instead?

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 38b4c910b6c3..2509d9dec42d 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1592,15 +1592,9 @@ static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
 }
 
 static void kvm_send_hwpoison_signal(unsigned long address,
-				     struct vm_area_struct *vma)
+				     unsigned long vma_pagesize)
 {
-	short lsb;
-
-	if (is_vm_hugetlb_page(vma))
-		lsb = huge_page_shift(hstate_vma(vma));
-	else
-		lsb = PAGE_SHIFT;
-
+	short lsb = __ffs(vma_pagesize);
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
 }
 
@@ -1735,7 +1729,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
-		kvm_send_hwpoison_signal(hva, vma);
+		kvm_send_hwpoison_signal(hva, vma_pagesize);
 		return 0;
 	}
 	if (is_error_noslot_pfn(pfn))


Thanks,

    Christoffer

* Re: [PATCH 2/3] KVM: arm/arm64: Re-check VMA on detecting a poisoned page
  2019-12-12 15:34     ` James Morse
  2019-12-12 15:40       ` Marc Zyngier
@ 2019-12-13  9:25       ` Christoffer Dall
  1 sibling, 0 replies; 19+ messages in thread
From: Christoffer Dall @ 2019-12-13  9:25 UTC (permalink / raw)
  To: James Morse; +Cc: Marc Zyngier, kvmarm, linux-arm-kernel, kvm

Hi James,

On Thu, Dec 12, 2019 at 03:34:31PM +0000, James Morse wrote:
> Hi Marc,
> 
> On 12/12/2019 11:33, Marc Zyngier wrote:
> > On 2019-12-11 16:56, Marc Zyngier wrote:

[...]

> 
> (allocating from a kmemcache while holding current's mmap_sem. I don't want to think about
> it!)
> 
> Can we be lazier? We want the VMA to get the size of the poisoned mapping correct in the
> signal. The bug is that this could change when we drop the lock, before queuing the
> signal, so we report hwpoison on old-vmas:pfn with new-vmas:size.
> 
> Can't it equally change when we drop the lock after queuing the signal? Any time before
> the thread returns to user-space to take the signal gives us a stale value.
> 
> I think all that matters is the size goes with the pfn that was poisoned. If we look the
> vma up by hva again, we have to check if the pfn has changed too... (which you are doing)
> 
> Can we stash the size in the existing mmap_sem region, and use that in
> kvm_send_hwpoison_signal()? We know it matches the pfn we saw as poisoned.
> 
> The vma could be changed before/after we send the signal, but user-space can't know which.
> This is user-spaces' problem for messing with the memslots while a vpcu is running.
> 

(I should clearly have expanded this thread before I replied to the
original patch...)

> 
> How about (untested):
> -------------------------%<-------------------------
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 38b4c910b6c3..80212d4935bd 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1591,16 +1591,8 @@ static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned
> long size)
>         __invalidate_icache_guest_page(pfn, size);
>  }
> 
> -static void kvm_send_hwpoison_signal(unsigned long address,
> -                                    struct vm_area_struct *vma)
> +static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
>  {
> -       short lsb;
> -
> -       if (is_vm_hugetlb_page(vma))
> -               lsb = huge_page_shift(hstate_vma(vma));
> -       else
> -               lsb = PAGE_SHIFT;
> -
>         send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
>  }
> 
> @@ -1673,6 +1665,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>         struct kvm *kvm = vcpu->kvm;
>         struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>         struct vm_area_struct *vma;
> +       short stage1_vma_size;
>         kvm_pfn_t pfn;
>         pgprot_t mem_type = PAGE_S2;
>         bool logging_active = memslot_is_logging(memslot);
> 
> @@ -1703,6 +1696,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                 vma_pagesize = PAGE_SIZE;
>         }
> 
> +       /* For signals due to hwpoison, we need to use the stage1 size */
> +       if (is_vm_hugetlb_page(vma))
> +               stage1_vma_size = huge_page_shift(hstate_vma(vma));
> +       else
> +               stage1_vma_size = PAGE_SHIFT;
> +

But (see my patch) as far as I can tell, this is already what we have in
vma_pagesize, and do we really have to provide the stage 1 size to user
space if the fault happened within a smaller boundary?  Isn't that just
providing more precise information to the user?


Thanks,

    Christoffer

* Re: [PATCH 3/3] KVM: arm/arm64: Drop spurious message when faulting on a non-existent mapping
  2019-12-11 16:56 ` [PATCH 3/3] KVM: arm/arm64: Drop spurious message when faulting on a non-existent mapping Marc Zyngier
@ 2019-12-13  9:26   ` Christoffer Dall
  0 siblings, 0 replies; 19+ messages in thread
From: Christoffer Dall @ 2019-12-13  9:26 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, kvmarm, linux-arm-kernel

On Wed, Dec 11, 2019 at 04:56:50PM +0000, Marc Zyngier wrote:
> Should userspace unmap memory whilst the guest is running, we exit
> with a -EFAULT, but also having spat a useless message on the console.
> 
> Get rid of it.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>

> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  virt/kvm/arm/mmu.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index f73393f5ddb7..fbfdffb8fe8e 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1696,7 +1696,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	down_read(&current->mm->mmap_sem);
>  	vma = find_vma_intersection(current->mm, hva, hva + 1);
>  	if (unlikely(!vma)) {
> -		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
>  		up_read(&current->mm->mmap_sem);
>  		return -EFAULT;
>  	}
> -- 
> 2.20.1
> 

* Re: [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings
  2019-12-13  8:29   ` Christoffer Dall
@ 2019-12-13  9:28     ` Marc Zyngier
  2019-12-13 11:14       ` Christoffer Dall
  0 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2019-12-13  9:28 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, stable, kvmarm, linux-arm-kernel

Hi Christoffer,

On 2019-12-13 08:29, Christoffer Dall wrote:
> Hi Marc,
>
> On Wed, Dec 11, 2019 at 04:56:48PM +0000, Marc Zyngier wrote:
>> A device mapping is normally always mapped at Stage-2, since there
>> is very little gain in having it faulted in.
>
> It is actually becoming less clear to me what the real benefits of
> pre-populating the stage 2 page table are, especially given that we 
> can
> provoke a situation where they're faulted in anyhow.  Do you recall 
> if
> we had any specific case that motivated us to pre-fault in the pages?

It's only a minor performance optimization that was introduced by Ard
in 8eef91239e57d. Which makes sense for platform devices that have a
single fixed location in memory. It makes slightly less sense for PCI,
where you can move things around.

>> Nonetheless, it is possible to end-up in a situation where the 
>> device
>> mapping has been removed from Stage-2 (userspace munmaped the VFIO
>> region, and the MMU notifier did its job), but present in a 
>> userspace
>> mapping (userpace has mapped it back at the same address). In such
>> a situation, the device mapping will be demand-paged as the guest
>> performs memory accesses.
>>
>> This requires to be careful when dealing with mapping size, cache
>> management, and to handle potential execution of a device mapping.
>>
>> Cc: stable@vger.kernel.org
>> Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>> ---
>>  virt/kvm/arm/mmu.c | 21 +++++++++++++++++----
>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index a48994af70b8..0b32a904a1bb 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -38,6 +38,11 @@ static unsigned long io_map_base;
>>  #define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
>>  #define KVM_S2_FLAG_LOGGING_ACTIVE	(1UL << 1)
>>
>> +static bool is_iomap(unsigned long flags)
>> +{
>> +	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
>> +}
>> +
>
> nit: I'm not really sure this indirection makes the code more 
> readable,
> but I guess that's a matter of taste.
>
>>  static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>>  {
>>  	return memslot->dirty_bitmap && !(memslot->flags & 
>> KVM_MEM_READONLY);
>> @@ -1698,6 +1703,7 @@ static int user_mem_abort(struct kvm_vcpu 
>> *vcpu, phys_addr_t fault_ipa,
>>
>>  	vma_pagesize = vma_kernel_pagesize(vma);
>>  	if (logging_active ||
>> +	    (vma->vm_flags & VM_PFNMAP) ||
>
> WHat is actually the rationale for this?
>
> Why is a huge mapping not permitted to device memory?
>
> Are we guaranteed that VM_PFNMAP on the vma results in device 
> mappings?
> I'm not convinced this is the case, and it would be better if we can
> stick to a single primitive (either kvm_is_device_pfn, or VM_PFNMAP) 
> to
> detect device mappings.

For now, I've tried to keep the two paths that deal with mapping
devices (or rather, things that we interpret as devices) as close as
possible.
If we drop the "eager" mapping, then we're at liberty to restructure
this in creative ways.

This includes potential huge mappings, but I'm not sure the rest of the
kernel uses them for devices anyway (I need to find out).

> As a subsequent patch, I'd like to make sure that at the very least 
> our
> memslot prepare function follows the exact same logic for mapping 
> device
> memory as a fault-in approach does, or that we simply always fault 
> pages
> in.

As far as I can see, the two approaches are now identical. Am I missing
something? And yes, getting rid of the eager mapping works for me.

>
>>  	    !fault_supports_stage2_huge_mapping(memslot, hva, 
>> vma_pagesize)) {
>>  		force_pte = true;
>>  		vma_pagesize = PAGE_SIZE;
>> @@ -1760,6 +1766,9 @@ static int user_mem_abort(struct kvm_vcpu 
>> *vcpu, phys_addr_t fault_ipa,
>>  			writable = false;
>>  	}
>>
>> +	if (exec_fault && is_iomap(flags))
>> +		return -ENOEXEC;
>> +
>
> nit: why don't you just do this when checking kvm_is_device_pfn() and
> avoid having logic in two places to deal with this case?

Good point. I've already sent the PR, but that could be a further cleanup.

>
>>  	spin_lock(&kvm->mmu_lock);
>>  	if (mmu_notifier_retry(kvm, mmu_seq))
>>  		goto out_unlock;
>> @@ -1781,7 +1790,7 @@ static int user_mem_abort(struct kvm_vcpu 
>> *vcpu, phys_addr_t fault_ipa,
>>  	if (writable)
>>  		kvm_set_pfn_dirty(pfn);
>>
>> -	if (fault_status != FSC_PERM)
>> +	if (fault_status != FSC_PERM && !is_iomap(flags))
>>  		clean_dcache_guest_page(pfn, vma_pagesize);
>>
>>  	if (exec_fault)
>> @@ -1948,9 +1957,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu 
>> *vcpu, struct kvm_run *run)
>>  	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
>>  		if (is_iabt) {
>>  			/* Prefetch Abort on I/O address */
>> -			kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>> -			ret = 1;
>> -			goto out_unlock;
>> +			ret = -ENOEXEC;
>> +			goto out;
>>  		}
>>
>>  		/*
>> @@ -1992,6 +2000,11 @@ int kvm_handle_guest_abort(struct kvm_vcpu 
>> *vcpu, struct kvm_run *run)
>>  	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
>>  	if (ret == 0)
>>  		ret = 1;
>> +out:
>> +	if (ret == -ENOEXEC) {
>> +		kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>> +		ret = 1;
>> +	}
>>  out_unlock:
>>  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>>  	return ret;
>> --
>> 2.20.1
>>
>
> I can't seem to decide for myself if I think there's a sematic
> difference between trying to execute from somewhere the VMM has
> explicitly told us is device memory and from somewhere which we 
> happen
> to have mapped with VM_PFNMAP from user space.  But I also can't seem 
> to
> really fault it (pun intended).  Thoughts?

The issue is that the VMM never really tells us whether something is a
device mapping or not (the only exception being the GICv2 cpuif). Even
with PFNMAP, we guess it (it could well be memory that lives outside
of the linear mapping). I don't see a way to lift this ambiguity.

Ideally, faulting on executing a non-mapping should be offloaded to
userspace for emulation, in line with your patches that offload
non-emulated data accesses. That'd be a new ABI, and I can't imagine
anyone willing to deal with it.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

* Re: [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings
  2019-12-13  9:28     ` Marc Zyngier
@ 2019-12-13 11:14       ` Christoffer Dall
  2019-12-16 10:31         ` Marc Zyngier
  0 siblings, 1 reply; 19+ messages in thread
From: Christoffer Dall @ 2019-12-13 11:14 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, stable, kvmarm, linux-arm-kernel

On Fri, Dec 13, 2019 at 09:28:59AM +0000, Marc Zyngier wrote:
> Hi Christoffer,
> 
> On 2019-12-13 08:29, Christoffer Dall wrote:
> > Hi Marc,
> > 
> > On Wed, Dec 11, 2019 at 04:56:48PM +0000, Marc Zyngier wrote:
> > > A device mapping is normally always mapped at Stage-2, since there
> > > is very little gain in having it faulted in.
> > 
> > It is actually becoming less clear to me what the real benefits of
> > pre-populating the stage 2 page table are, especially given that we can
> > provoke a situation where they're faulted in anyhow.  Do you recall if
> > we had any specific case that motivated us to pre-fault in the pages?
> 
> It's only a minor performance optimization that was introduced by Ard in
> 8eef91239e57d. Which makes sense for platform devices that have a single
> fixed location in memory. It makes slightly less sense for PCI, where
> you can move things around.

User space could still decide to move things around in its VA map even
if the device is fixed.

Anyway, I was thinking more about whether there was some sort of device,
like a framebuffer, which for example crosses page boundaries and where
it would be visible to the user that there's a sudden performance drop
while operating the device over page boundaries.  Anything like that?

> 
> > > Nonetheless, it is possible to end-up in a situation where the
> > > device
> > > mapping has been removed from Stage-2 (userspace munmaped the VFIO
> > > region, and the MMU notifier did its job), but present in a
> > > userspace
> > > mapping (userspace has mapped it back at the same address). In such
> > > a situation, the device mapping will be demand-paged as the guest
> > > performs memory accesses.
> > > 
> > > This requires to be careful when dealing with mapping size, cache
> > > management, and to handle potential execution of a device mapping.
> > > 
> > > Cc: stable@vger.kernel.org
> > > Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > ---
> > >  virt/kvm/arm/mmu.c | 21 +++++++++++++++++----
> > >  1 file changed, 17 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > > index a48994af70b8..0b32a904a1bb 100644
> > > --- a/virt/kvm/arm/mmu.c
> > > +++ b/virt/kvm/arm/mmu.c
> > > @@ -38,6 +38,11 @@ static unsigned long io_map_base;
> > >  #define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
> > >  #define KVM_S2_FLAG_LOGGING_ACTIVE	(1UL << 1)
> > > 
> > > +static bool is_iomap(unsigned long flags)
> > > +{
> > > +	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
> > > +}
> > > +
> > 
> > nit: I'm not really sure this indirection makes the code more readable,
> > but I guess that's a matter of taste.
> > 
> > >  static bool memslot_is_logging(struct kvm_memory_slot *memslot)
> > >  {
> > >  	return memslot->dirty_bitmap && !(memslot->flags &
> > > KVM_MEM_READONLY);
> > > @@ -1698,6 +1703,7 @@ static int user_mem_abort(struct kvm_vcpu
> > > *vcpu, phys_addr_t fault_ipa,
> > > 
> > >  	vma_pagesize = vma_kernel_pagesize(vma);
> > >  	if (logging_active ||
> > > +	    (vma->vm_flags & VM_PFNMAP) ||
> > 
> > What is actually the rationale for this?
> > 
> > Why is a huge mapping not permitted to device memory?
> > 
> > Are we guaranteed that VM_PFNMAP on the vma results in device mappings?
> > I'm not convinced this is the case, and it would be better if we can
> > stick to a single primitive (either kvm_is_device_pfn, or VM_PFNMAP) to
> > detect device mappings.
> 
> For now, I've tried to keep the two paths that deal with mapping devices
> (or rather, things that we interpret as devices) as close as possible.
> If we drop the "eager" mapping, then we're at liberty to restructure
> this in creative ways.
> 
> This includes potential huge mappings, but I'm not sure the rest of the
> kernel uses them for devices anyway (I need to find out).
> 
> > As a subsequent patch, I'd like to make sure that at the very least our
> > memslot prepare function follows the exact same logic for mapping device
> > memory as a fault-in approach does, or that we simply always fault pages
> > in.
> 
> As far as I can see, the two approaches are now identical. Am I missing
> something?
> And yes, getting rid of the eager mapping works for me.
> 

As far as I can tell, our user_mem_abort() uses gfn_to_pfn_prot() which
goes down a long trail which ends up at hva_to_pfn_remapped(), which
might result in doing the same offset calculation that we do in
kvm_arch_prepare_memory_region(), but it also considers other scenarios.
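
(Roughly, the eager path derives the target PA straight from the vma --
a simplified sketch, the exact code may differ:

	/* kvm_arch_prepare_memory_region(), VM_PFNMAP case */
	pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
	pa += vm_start - vma->vm_start;	/* slice of the vma in the memslot */

while the fault path resolves the very same hva through
gfn_to_pfn_prot() -> hva_to_pfn() -> hva_to_pfn_remapped(), which walks
the vma again, follow_pfn() style, and additionally deals with
write-fault fixups and friends.)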

Even if we analyze all that and convince ourselves it's always all the
same on arm64, the two code paths could change, leading to really
hard-to-debug differing behavior, and nobody will actively keep the two paths
in sync.  I'd be fine with keeping the performance optimization if we
have good grounds for that though, and using the same translation
mechanism for VM_PFNMAP as user_mem_abort.

Am I missing something?

> > 
> > >  	    !fault_supports_stage2_huge_mapping(memslot, hva,
> > > vma_pagesize)) {
> > >  		force_pte = true;
> > >  		vma_pagesize = PAGE_SIZE;
> > > @@ -1760,6 +1766,9 @@ static int user_mem_abort(struct kvm_vcpu
> > > *vcpu, phys_addr_t fault_ipa,
> > >  			writable = false;
> > >  	}
> > > 
> > > +	if (exec_fault && is_iomap(flags))
> > > +		return -ENOEXEC;
> > > +
> > 
> > nit: why don't you just do this when checking kvm_is_device_pfn() and
> > avoid having logic in two places to deal with this case?
> 
> Good point. I've already sent the PR, but that could be a further cleanup.
> 

Sure, I can have a look when we agree on the above.

> > 
> > >  	spin_lock(&kvm->mmu_lock);
> > >  	if (mmu_notifier_retry(kvm, mmu_seq))
> > >  		goto out_unlock;
> > > @@ -1781,7 +1790,7 @@ static int user_mem_abort(struct kvm_vcpu
> > > *vcpu, phys_addr_t fault_ipa,
> > >  	if (writable)
> > >  		kvm_set_pfn_dirty(pfn);
> > > 
> > > -	if (fault_status != FSC_PERM)
> > > +	if (fault_status != FSC_PERM && !is_iomap(flags))
> > >  		clean_dcache_guest_page(pfn, vma_pagesize);
> > > 
> > >  	if (exec_fault)
> > > @@ -1948,9 +1957,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu
> > > *vcpu, struct kvm_run *run)
> > >  	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
> > >  		if (is_iabt) {
> > >  			/* Prefetch Abort on I/O address */
> > > -			kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> > > -			ret = 1;
> > > -			goto out_unlock;
> > > +			ret = -ENOEXEC;
> > > +			goto out;
> > >  		}
> > > 
> > >  		/*
> > > @@ -1992,6 +2000,11 @@ int kvm_handle_guest_abort(struct kvm_vcpu
> > > *vcpu, struct kvm_run *run)
> > >  	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
> > >  	if (ret == 0)
> > >  		ret = 1;
> > > +out:
> > > +	if (ret == -ENOEXEC) {
> > > +		kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> > > +		ret = 1;
> > > +	}
> > >  out_unlock:
> > >  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> > >  	return ret;
> > > --
> > > 2.20.1
> > > 
> > 
> > I can't seem to decide for myself if I think there's a semantic
> > difference between trying to execute from somewhere the VMM has
> > explicitly told us is device memory and from somewhere which we happen
> > to have mapped with VM_PFNMAP from user space.  But I also can't seem to
> > really fault it (pun intended).  Thoughts?
> 
> The issue is that the VMM never really tells us whether something is a
> device mapping or not (the only exception being the GICv2 cpuif). Even
> with PFNMAP, we guess it (it could well be memory that lives outside
> of the linear mapping). I don't see a way to lift this ambiguity.
> 
> Ideally, faulting on executing a non-mapping should be offloaded to
> userspace for emulation, in line with your patches that offload
> non-emulated data accesses. That'd be a new ABI, and I can't imagine
> anyone willing to deal with it.

So what I was asking was if it makes sense to report the Prefetch Abort
in the case where the VMM has already told us that it doesn't want to
register anything backing the IPA (no memslot), and instead return an
error to user space, so that it can make a decision (for example inject
an external abort, which may have been the right thing to do in the
former case as well, but that could be considered ABI now, so let's not
kick that hornet's nest).

In any case, no strong feelings here, I just have a vague feeling that
injecting more prefetch aborts on execute-from-some-device is not
necessarily the right thing to do.


Thanks,

    Christoffer
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings
  2019-12-13 11:14       ` Christoffer Dall
@ 2019-12-16 10:31         ` Marc Zyngier
  2019-12-18 15:13           ` Christoffer Dall
  0 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2019-12-16 10:31 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, stable, kvmarm, linux-arm-kernel

On 2019-12-13 11:14, Christoffer Dall wrote:
> On Fri, Dec 13, 2019 at 09:28:59AM +0000, Marc Zyngier wrote:
>> Hi Christoffer,
>>
>> On 2019-12-13 08:29, Christoffer Dall wrote:
>> > Hi Marc,
>> >
>> > On Wed, Dec 11, 2019 at 04:56:48PM +0000, Marc Zyngier wrote:
>> > > A device mapping is normally always mapped at Stage-2, since 
>> there
>> > > is very little gain in having it faulted in.
>> >
>> > It is actually becoming less clear to me what the real benefits of
>> > pre-populating the stage 2 page table are, especially given that 
>> we can
>> > provoke a situation where they're faulted in anyhow.  Do you 
>> recall if
>> > we had any specific case that motivated us to pre-fault in the 
>> pages?
>>
>> It's only a minor performance optimization that was introduced by 
>> Ard in
>> 8eef91239e57d. Which makes sense for platform devices that have a 
>> single
>> fixed location in memory. It makes slightly less sense for PCI, 
>> where
>> you can move things around.
>
> User space could still decide to move things around in its VA map 
> even
> if the device is fixed.
>
> Anyway, I was thinking more if there was some sort of device, like a
> framebuffer, which for example crosses page boundaries and where it 
> would
> be visible to the user that there's a sudden performance drop while
> operating the device over page boundaries.  Anything like that?
>
>>
>> > > Nonetheless, it is possible to end-up in a situation where the
>> > > device
>> > > mapping has been removed from Stage-2 (userspace munmaped the 
>> VFIO
>> > > region, and the MMU notifier did its job), but present in a
>> > > userspace
>> > > mapping (userspace has mapped it back at the same address). In 
>> such
>> > > a situation, the device mapping will be demand-paged as the 
>> guest
>> > > performs memory accesses.
>> > >
>> > > This requires to be careful when dealing with mapping size, 
>> cache
>> > > management, and to handle potential execution of a device 
>> mapping.
>> > >
>> > > Cc: stable@vger.kernel.org
>> > > Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
>> > > ---
>> > >  virt/kvm/arm/mmu.c | 21 +++++++++++++++++----
>> > >  1 file changed, 17 insertions(+), 4 deletions(-)
>> > >
>> > > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> > > index a48994af70b8..0b32a904a1bb 100644
>> > > --- a/virt/kvm/arm/mmu.c
>> > > +++ b/virt/kvm/arm/mmu.c
>> > > @@ -38,6 +38,11 @@ static unsigned long io_map_base;
>> > >  #define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
>> > >  #define KVM_S2_FLAG_LOGGING_ACTIVE	(1UL << 1)
>> > >
>> > > +static bool is_iomap(unsigned long flags)
>> > > +{
>> > > +	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
>> > > +}
>> > > +
>> >
>> > nit: I'm not really sure this indirection makes the code more 
>> readable,
>> > but I guess that's a matter of taste.
>> >
>> > >  static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>> > >  {
>> > >  	return memslot->dirty_bitmap && !(memslot->flags &
>> > > KVM_MEM_READONLY);
>> > > @@ -1698,6 +1703,7 @@ static int user_mem_abort(struct kvm_vcpu
>> > > *vcpu, phys_addr_t fault_ipa,
>> > >
>> > >  	vma_pagesize = vma_kernel_pagesize(vma);
>> > >  	if (logging_active ||
>> > > +	    (vma->vm_flags & VM_PFNMAP) ||
>> >
>> > What is actually the rationale for this?
>> >
>> > Why is a huge mapping not permitted to device memory?
>> >
>> > Are we guaranteed that VM_PFNMAP on the vma results in device 
>> mappings?
>> > I'm not convinced this is the case, and it would be better if we 
>> can
>> > stick to a single primitive (either kvm_is_device_pfn, or 
>> VM_PFNMAP) to
>> > detect device mappings.
>>
>> For now, I've tried to keep the two paths that deal with mapping 
>> devices
>> (or rather, things that we interpret as devices) as close as 
>> possible.
>> If we drop the "eager" mapping, then we're at liberty to restructure
>> this in creative ways.
>>
>> This includes potential huge mappings, but I'm not sure the rest of 
>> the
>> kernel uses them for devices anyway (I need to find out).
>>
>> > As a subsequent patch, I'd like to make sure that at the very 
>> least our
>> > memslot prepare function follows the exact same logic for mapping 
>> device
>> > memory as a fault-in approach does, or that we simply always fault 
>> pages
>> > in.
>>
>> As far as I can see, the two approaches are now identical. Am I 
>> missing
>> something?
>> And yes, getting rid of the eager mapping works for me.
>>
>
> As far as I can tell, our user_mem_abort() uses gfn_to_pfn_prot() 
> which
> goes down a long trail which ends up at hva_to_pfn_remapped(), which
> might result in doing the same offset calculation that we do in
> kvm_arch_prepare_memory_region(), but it also considers other 
> scenarios.
>
> Even if we analyze all that and convince ourselves it's always all 
> the
> same on arm64, the two code paths could change, leading to really 
> hard
> to debug differing behavior, and nobody will actively keep the two 
> paths
> in sync.  I'd be fine with keeping the performance optimization if we
> have good grounds for that though, and using the same translation
> mechanism for VM_PFNMAP as user_mem_abort.
>
> Am I missing something?

I'm not disputing any of the above. I'm only trying to keep this patch
minimal so that we can easily backport it (although it is arguable that
deleting code isn't that big a deal).

[...]

>> > I can't seem to decide for myself if I think there's a semantic
>> > difference between trying to execute from somewhere the VMM has
>> > explicitly told us is device memory and from somewhere which we 
>> happen
>> > to have mapped with VM_PFNMAP from user space.  But I also can't 
>> seem to
>> > really fault it (pun intended).  Thoughts?
>>
>> The issue is that the VMM never really tells us whether something is 
>> a
>> device mapping or not (the only exception being the GICv2 cpuif). 
>> Even
>> with PFNMAP, we guess it (it could well be memory that lives outside
>> of the linear mapping). I don't see a way to lift this ambiguity.
>>
>> Ideally, faulting on executing a non-mapping should be offloaded to
>> userspace for emulation, in line with your patches that offload
>> non-emulated data accesses. That'd be a new ABI, and I can't imagine
>> anyone willing to deal with it.
>
> So what I was asking was if it makes sense to report the Prefetch 
> Abort
> in the case where the VMM has already told us that it doesn't want to
> register anything backing the IPA (no memslot), and instead return an
> error to user space, so that it can make a decision (for example 
> inject
> an external abort, which may have been the right thing to do in the
> former case as well, but that could be considered ABI now, so let's 
> not
> kick that hornet's nest).
>
> In any case, no strong feelings here, I just have a vague feeling 
> that
> injecting more prefetch aborts on execute-from-some-device is not
> necessarily the right thing to do.

The ARMv8 ARM has the following stuff in B2.7.2 (Device Memory):

<quote>
Hardware does not prevent speculative instruction fetches from a memory 
location with any of the Device
memory attributes unless the memory location is also marked as 
Execute-never for all Exception levels.

Note

This means that to prevent speculative instruction fetches from memory 
locations with Device memory
attributes, any location that is assigned any Device memory type must 
also be marked as Execute-never for
all Exception levels. Failure to mark a memory location with any Device 
memory attribute as Execute-never
for all Exception levels is a programming error.
</quote>

and

<quote>
For instruction fetches, if branches cause the program counter to point 
to an area of memory with the Device
attribute which is not marked as Execute-never for the current 
Exception level, an implementation can either:

- Treat the instruction fetch as if it were to a memory location with 
the Normal Non-cacheable attribute.

- Take a Permission fault.
</quote>

My reading here is that a prefetch abort is the right thing to do.
What we don't do correctly is that we qualify it as an external abort
instead of a permission fault (which is annoying as it requires us
to find out about the S1 translation level).
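
(In ESR terms -- a hypothetical sketch, not what the code does today:
the injected IABT currently carries ESR_ELx_FSC_EXTABT as its fault
status code, whereas the permission-fault option from B2.7.2 would need
something like

	esr |= ESR_ELx_FSC_PERM | s1_level;	/* level of the guest's S1 walk */

and s1_level is precisely what we cannot know without a software S1
page table walker.)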

Happy to revisit this once we get a S1 PTW.

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] KVM: arm/arm64: Re-check VMA on detecting a poisoned page
  2019-12-13  9:22   ` Christoffer Dall
@ 2019-12-16 18:29     ` James Morse
  0 siblings, 0 replies; 19+ messages in thread
From: James Morse @ 2019-12-16 18:29 UTC (permalink / raw)
  To: Christoffer Dall, Marc Zyngier; +Cc: kvm, kvmarm, linux-arm-kernel

Hi Christoffer,

On 13/12/2019 09:22, Christoffer Dall wrote:
> On Wed, Dec 11, 2019 at 04:56:49PM +0000, Marc Zyngier wrote:
>> When we check for a poisoned page, we use the VMA to tell userspace
>> about the looming disaster. But we pass a pointer to this VMA
>> after having released the mmap_sem, which isn't a good idea.
>>
>> Instead, re-check that we still have a VMA, and that this
>> VMA still points to a poisoned page. If the VMA isn't there,
>> userspace is playing with our nerves, so let's give it a -EFAULT
>> (it deserves it). If the PFN isn't poisoned anymore, let's restart
>> from the top and handle the fault again.

> If I read this code correctly, then all we use the VMA for is to find
> the page size used by the MMU to back the VMA, which we've already
> established in the vma_pagesize (and possibly adjusted to something more
> accurate based on our constraints in stage 2 which generated the error),
> so all we need is the size and a way to convert that into a shift.
> 
> Not being 100% confident about the semantics of the lsb bit we pass to
> user space (is it indicating the size of the mapping which caused the
> error or the size of the mapping where user space could potentially

It's the size of the hole that has opened up in the address map. The error was very likely
to be much smaller, but all we can do is unmap pages.
Transparent huge-pages are split up to minimise the impact. This code is for hugetlbfs
(?), whose pages are dedicated for that use, so they don't get split.

arch/arm64/mm/fault.c::do_page_fault() has:
|	lsb = PAGE_SHIFT;
|	if (fault & VM_FAULT_HWPOISON_LARGE)
|		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
|
|	arm64_force_sig_mceerr(BUS_MCEERR_AR, (void __user *)addr, lsb,

(I assume it's a shift because bytes in the signal union are precious)
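
(Concretely -- illustrative values only, not kernel code:

	short lsb_4k = ilog2(SZ_4K);	/* 12 for a 4K base page      */
	short lsb_2m = ilog2(SZ_2M);	/* 21 for a 2M hugetlbfs page */

i.e. the signal always carries log2 of the granule that was unmapped,
never the granule size itself.)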


> trigger an error?), or whether we care enough at that level, could we
> consider something like the following instead?

> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 38b4c910b6c3..2509d9dec42d 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1592,15 +1592,9 @@ static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
>  }
>  
>  static void kvm_send_hwpoison_signal(unsigned long address,
> -				     struct vm_area_struct *vma)
> +				     unsigned long vma_pagesize)
>  {
> -	short lsb;
> -
> -	if (is_vm_hugetlb_page(vma))
> -		lsb = huge_page_shift(hstate_vma(vma));
> -	else
> -		lsb = PAGE_SHIFT;
> -
> +	short lsb = __ffs(vma_pagesize);
>  	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
>  }
>  
> @@ -1735,7 +1729,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
>  	if (pfn == KVM_PFN_ERR_HWPOISON) {
> -		kvm_send_hwpoison_signal(hva, vma);
> +		kvm_send_hwpoison_signal(hva, vma_pagesize);
>  		return 0;
>  	}
>  	if (is_error_noslot_pfn(pfn))

This changes the meaning: vma_pagesize is a value like 4K, not a shift like 12.

But yes, caching the shift value under the mmap_sem and passing it in is the
right-thing-to-do(tm). I have a patch which I'll post, once I remember how to test this thing!
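
Something along the lines of the sketch below -- hypothetical, the
actual patch may end up looking different: grab the shift while
mmap_sem is still held, then hand it to kvm_send_hwpoison_signal()
instead of the vma:

	short vma_shift;

	down_read(&current->mm->mmap_sem);
	vma = find_vma_intersection(current->mm, hva, hva + 1);
	vma_shift = is_vm_hugetlb_page(vma) ? huge_page_shift(hstate_vma(vma))
					    : PAGE_SHIFT;
	up_read(&current->mm->mmap_sem);

	...

	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
	if (pfn == KVM_PFN_ERR_HWPOISON) {
		kvm_send_hwpoison_signal(hva, vma_shift);
		return 0;
	}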



Thanks,

James
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings
  2019-12-16 10:31         ` Marc Zyngier
@ 2019-12-18 15:13           ` Christoffer Dall
  0 siblings, 0 replies; 19+ messages in thread
From: Christoffer Dall @ 2019-12-18 15:13 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, stable, kvmarm, linux-arm-kernel

On Mon, Dec 16, 2019 at 10:31:19AM +0000, Marc Zyngier wrote:
> On 2019-12-13 11:14, Christoffer Dall wrote:
> > On Fri, Dec 13, 2019 at 09:28:59AM +0000, Marc Zyngier wrote:
> > > Hi Christoffer,
> > > 
> > > On 2019-12-13 08:29, Christoffer Dall wrote:
> > > > Hi Marc,
> > > >
> > > > On Wed, Dec 11, 2019 at 04:56:48PM +0000, Marc Zyngier wrote:
> > > > > A device mapping is normally always mapped at Stage-2, since
> > > there
> > > > > is very little gain in having it faulted in.
> > > >
> > > > It is actually becoming less clear to me what the real benefits of
> > > > pre-populating the stage 2 page table are, especially given that
> > > we can
> > > > provoke a situation where they're faulted in anyhow.  Do you
> > > recall if
> > > > we had any specific case that motivated us to pre-fault in the
> > > pages?
> > > 
> > > It's only a minor performance optimization that was introduced by
> > > Ard in
> > > 8eef91239e57d. Which makes sense for platform devices that have a
> > > single
> > > fixed location in memory. It makes slightly less sense for PCI,
> > > where
> > > you can move things around.
> > 
> > User space could still decide to move things around in its VA map even
> > if the device is fixed.
> > 
> > Anyway, I was thinking more if there was some sort of device, like a
> > framebuffer, which for example crosses page boundaries and where it would
> > be visible to the user that there's a sudden performance drop while
> > operating the device over page boundaries.  Anything like that?
> > 
> > > 
> > > > > Nonetheless, it is possible to end-up in a situation where the
> > > > > device
> > > > > mapping has been removed from Stage-2 (userspace munmaped the
> > > VFIO
> > > > > region, and the MMU notifier did its job), but present in a
> > > > > userspace
> > > > > mapping (userspace has mapped it back at the same address). In
> > > such
> > > > > a situation, the device mapping will be demand-paged as the
> > > guest
> > > > > performs memory accesses.
> > > > >
> > > > > This requires to be careful when dealing with mapping size,
> > > cache
> > > > > management, and to handle potential execution of a device
> > > mapping.
> > > > >
> > > > > Cc: stable@vger.kernel.org
> > > > > Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > > > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > > > ---
> > > > >  virt/kvm/arm/mmu.c | 21 +++++++++++++++++----
> > > > >  1 file changed, 17 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > > > > index a48994af70b8..0b32a904a1bb 100644
> > > > > --- a/virt/kvm/arm/mmu.c
> > > > > +++ b/virt/kvm/arm/mmu.c
> > > > > @@ -38,6 +38,11 @@ static unsigned long io_map_base;
> > > > >  #define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
> > > > >  #define KVM_S2_FLAG_LOGGING_ACTIVE	(1UL << 1)
> > > > >
> > > > > +static bool is_iomap(unsigned long flags)
> > > > > +{
> > > > > +	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
> > > > > +}
> > > > > +
> > > >
> > > > nit: I'm not really sure this indirection makes the code more
> > > readable,
> > > > but I guess that's a matter of taste.
> > > >
> > > > >  static bool memslot_is_logging(struct kvm_memory_slot *memslot)
> > > > >  {
> > > > >  	return memslot->dirty_bitmap && !(memslot->flags &
> > > > > KVM_MEM_READONLY);
> > > > > @@ -1698,6 +1703,7 @@ static int user_mem_abort(struct kvm_vcpu
> > > > > *vcpu, phys_addr_t fault_ipa,
> > > > >
> > > > >  	vma_pagesize = vma_kernel_pagesize(vma);
> > > > >  	if (logging_active ||
> > > > > +	    (vma->vm_flags & VM_PFNMAP) ||
> > > >
> > > > What is actually the rationale for this?
> > > >
> > > > Why is a huge mapping not permitted to device memory?
> > > >
> > > > Are we guaranteed that VM_PFNMAP on the vma results in device
> > > mappings?
> > > > I'm not convinced this is the case, and it would be better if we
> > > can
> > > > stick to a single primitive (either kvm_is_device_pfn, or
> > > VM_PFNMAP) to
> > > > detect device mappings.
> > > 
> > > For now, I've tried to keep the two paths that deal with mapping
> > > devices
> > > (or rather, things that we interpret as devices) as close as
> > > possible.
> > > If we drop the "eager" mapping, then we're at liberty to restructure
> > > this in creative ways.
> > > 
> > > This includes potential huge mappings, but I'm not sure the rest of
> > > the
> > > kernel uses them for devices anyway (I need to find out).
> > > 
> > > > As a subsequent patch, I'd like to make sure that at the very
> > > least our
> > > > memslot prepare function follows the exact same logic for mapping
> > > device
> > > > memory as a fault-in approach does, or that we simply always fault
> > > pages
> > > > in.
> > > 
> > > As far as I can see, the two approaches are now identical. Am I
> > > missing
> > > something?
> > > And yes, getting rid of the eager mapping works for me.
> > > 
> > 
> > As far as I can tell, our user_mem_abort() uses gfn_to_pfn_prot() which
> > goes down a long trail which ends up at hva_to_pfn_remapped(), which
> > might result in doing the same offset calculation that we do in
> > kvm_arch_prepare_memory_region(), but it also considers other scenarios.
> > 
> > Even if we analyze all that and convince ourselves it's always all the
> > same on arm64, the two code paths could change, leading to really hard
> > to debug differing behavior, and nobody will actively keep the two paths
> > in sync.  I'd be fine with keeping the performance optimization if we
> > have good grounds for that though, and using the same translation
> > mechanism for VM_PFNMAP as user_mem_abort.
> > 
> > Am I missing something?
> 
> I'm not disputing any of the above. I'm only trying to keep this patch
> minimal so that we can easily backport it (although it is arguable that
> deleting code isn't that big a deal).

Yes, sorry, I wasn't arguing we should change the patch, only what the
direction for the future should be.  Sorry for being unclear.

> 
> [...]
> 
> > > > I can't seem to decide for myself if I think there's a semantic
> > > > difference between trying to execute from somewhere the VMM has
> > > > explicitly told us is device memory and from somewhere which we
> > > happen
> > > > to have mapped with VM_PFNMAP from user space.  But I also can't
> > > seem to
> > > > really fault it (pun intended).  Thoughts?
> > > 
> > > The issue is that the VMM never really tells us whether something is
> > > a
> > > device mapping or not (the only exception being the GICv2 cpuif).
> > > Even
> > > with PFNMAP, we guess it (it could well be memory that lives outside
> > > of the linear mapping). I don't see a way to lift this ambiguity.
> > > 
> > > Ideally, faulting on executing a non-mapping should be offloaded to
> > > userspace for emulation, in line with your patches that offload
> > > non-emulated data accesses. That'd be a new ABI, and I can't imagine
> > > anyone willing to deal with it.
> > 
> > So what I was asking was if it makes sense to report the Prefetch Abort
> > in the case where the VMM has already told us that it doesn't want to
> > register anything backing the IPA (no memslot), and instead return an
> > error to user space, so that it can make a decision (for example inject
> > an external abort, which may have been the right thing to do in the
> > former case as well, but that could be considered ABI now, so let's not
> > kick that hornet's nest).
> > 
> > In any case, no strong feelings here, I just have a vague feeling that
> > injecting more prefetch aborts on execute-from-some-device is not
> > necessarily the right thing to do.
> 
> The ARMv8 ARM has the following stuff in B2.7.2 (Device Memory):
> 
> <quote>
> Hardware does not prevent speculative instruction fetches from a memory
> location with any of the Device
> memory attributes unless the memory location is also marked as Execute-never
> for all Exception levels.
> 
> Note
> 
> This means that to prevent speculative instruction fetches from memory
> locations with Device memory
> attributes, any location that is assigned any Device memory type must also
> be marked as Execute-never for
> all Exception levels. Failure to mark a memory location with any Device
> memory attribute as Execute-never
> for all Exception levels is a programming error.
> </quote>
> 
> and
> 
> <quote>
> For instruction fetches, if branches cause the program counter to point to
> an area of memory with the Device
> attribute which is not marked as Execute-never for the current Exception
> level, an implementation can either:
> 
> - Treat the instruction fetch as if it were to a memory location with the
> Normal Non-cacheable attribute.
> 
> - Take a Permission fault.
> </quote>
> 
> My reading here is that a prefetch abort is the right thing to do.
> What we don't do correctly is that we qualify it as an external abort
> instead of a permission fault (which is annoying as it requires us
> to find out about the S1 translation level).

I did not remember that taking a permission fault was a valid
implementation option.  In that case, never mind my ramblings.


Thanks,

    Christoffer
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-12-18 15:13 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-11 16:56 [PATCH 0/3] KVM: arm/arm64: user_mem_abort() assorted fixes Marc Zyngier
2019-12-11 16:56 ` [PATCH 1/3] KVM: arm/arm64: Properly handle faulting of device mappings Marc Zyngier
2019-12-11 17:53   ` Alexandru Elisei
2019-12-11 18:01     ` Marc Zyngier
2019-12-12 15:35   ` James Morse
2019-12-13  8:29   ` Christoffer Dall
2019-12-13  9:28     ` Marc Zyngier
2019-12-13 11:14       ` Christoffer Dall
2019-12-16 10:31         ` Marc Zyngier
2019-12-18 15:13           ` Christoffer Dall
2019-12-11 16:56 ` [PATCH 2/3] KVM: arm/arm64: Re-check VMA on detecting a poisoned page Marc Zyngier
2019-12-12 11:33   ` Marc Zyngier
2019-12-12 15:34     ` James Morse
2019-12-12 15:40       ` Marc Zyngier
2019-12-13  9:25       ` Christoffer Dall
2019-12-13  9:22   ` Christoffer Dall
2019-12-16 18:29     ` James Morse
2019-12-11 16:56 ` [PATCH 3/3] KVM: arm/arm64: Drop spurious message when faulting on a non-existent mapping Marc Zyngier
2019-12-13  9:26   ` Christoffer Dall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).