kvmarm.lists.cs.columbia.edu archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] KVM: arm64: Plug a couple of MM races
@ 2023-03-16 17:45 Marc Zyngier
  2023-03-16 17:45 ` [PATCH v2 1/2] KVM: arm64: Disable interrupts while walking userspace PTs Marc Zyngier
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Marc Zyngier @ 2023-03-16 17:45 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ard Biesheuvel, Will Deacon, Quentin Perret, Sean Christopherson,
	David Matlack

Ard recently reported a really odd warning generated with KASAN, where
the page table walker we use to inspect the userspace page tables was
going into the weeds and accessing something that was looking totally
unrelated (and previously freed).

Will and I spent quite some time looking into it, and while we were
not able to reproduce the issue, we were able to spot at least a
couple of issues that could partially explain the issue.

The first course of action is to disable interrupts while walking the
userspace PTs. This prevents exit_mmap() from tearing down these PTs
by blocking the IPI. We also fail gracefully if the IPI won the race
and killed the page tables before we started the walk.

The second issue is to not use a VMA pointer that was obtained with
the mmap_read_lock held after that lock has been released. There is no
guarantee that it is still valid.

I've earmarked both for stable, though I expect backporting this to
older revisions of the kernel could be... interesting.

* From v1[1]:

  - Return -EAGAIN from get_user_mapping_size() when the mapping is
    gone instead of -EFAULT which would be fatal (which is still
    returned in cases that are not expected to be seen). Other error
    codes can also be returned from kvm_pgtable_get_leaf(), but always
    in conditions that are rather bad.

  - Rebased on top of kvmarm/fixes which already contains David's own
    MMU fix.

[1] https://lore.kernel.org/r/20230313091425.1962708-1-maz@kernel.org

Marc Zyngier (2):
  KVM: arm64: Disable interrupts while walking userspace PTs
  KVM: arm64: Check for kvm_vma_mte_allowed in the critical section

 arch/arm64/kvm/mmu.c | 53 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 44 insertions(+), 9 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] KVM: arm64: Disable interrupts while walking userspace PTs
  2023-03-16 17:45 [PATCH v2 0/2] KVM: arm64: Plug a couple of MM races Marc Zyngier
@ 2023-03-16 17:45 ` Marc Zyngier
  2023-03-16 23:42   ` Oliver Upton
  2023-03-16 17:45 ` [PATCH v2 2/2] KVM: arm64: Check for kvm_vma_mte_allowed in the critical section Marc Zyngier
  2023-03-17  1:20 ` [PATCH v2 0/2] KVM: arm64: Plug a couple of MM races Oliver Upton
  2 siblings, 1 reply; 6+ messages in thread
From: Marc Zyngier @ 2023-03-16 17:45 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ard Biesheuvel, Will Deacon, Quentin Perret, Sean Christopherson,
	David Matlack, stable

We walk the userspace PTs to discover what mapping size was
used there. However, this can race against the userspace tables
being freed, and we end-up in the weeds.

Thankfully, the mm code is being generous and will IPI us when
doing so. So let's implement our part of the bargain and disable
interrupts around the walk. This ensures that nothing terrible
happens during that time.

We still need to handle the removal of the page tables before
the walk. For that, allow get_user_mapping_size() to return an
error, and make sure this error can be propagated all the way
to the the exit handler.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
---
 arch/arm64/kvm/mmu.c | 45 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f54408355d1d..d3d4cdc0f617 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -666,14 +666,33 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 				   CONFIG_PGTABLE_LEVELS),
 		.mm_ops		= &kvm_user_mm_ops,
 	};
+	unsigned long flags;
 	kvm_pte_t pte = 0;	/* Keep GCC quiet... */
 	u32 level = ~0;
 	int ret;
 
+	/*
+	 * Disable IRQs so that we hazard against a concurrent
+	 * teardown of the userspace page tables (which relies on
+	 * IPI-ing threads).
+	 */
+	local_irq_save(flags);
 	ret = kvm_pgtable_get_leaf(&pgt, addr, &pte, &level);
-	VM_BUG_ON(ret);
-	VM_BUG_ON(level >= KVM_PGTABLE_MAX_LEVELS);
-	VM_BUG_ON(!(pte & PTE_VALID));
+	local_irq_restore(flags);
+
+	if (ret)
+		return ret;
+
+	/*
+	 * Not seeing an error, but not updating level? Something went
+	 * deeply wrong...
+	 */
+	if (WARN_ON(level >= KVM_PGTABLE_MAX_LEVELS))
+		return -EFAULT;
+
+	/* Oops, the userspace PTs are gone... Replay the fault */
+	if (!(pte & PTE_VALID))
+		return -EAGAIN;
 
 	return BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(level));
 }
@@ -1079,7 +1098,7 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
  *
  * Returns the size of the mapping.
  */
-static unsigned long
+static long
 transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			    unsigned long hva, kvm_pfn_t *pfnp,
 			    phys_addr_t *ipap)
@@ -1091,8 +1110,15 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	 * sure that the HVA and IPA are sufficiently aligned and that the
 	 * block map is contained within the memslot.
 	 */
-	if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE) &&
-	    get_user_mapping_size(kvm, hva) >= PMD_SIZE) {
+	if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
+		int sz = get_user_mapping_size(kvm, hva);
+
+		if (sz < 0)
+			return sz;
+
+		if (sz < PMD_SIZE)
+			return PAGE_SIZE;
+
 		/*
 		 * The address we faulted on is backed by a transparent huge
 		 * page.  However, because we map the compound huge page and
@@ -1203,7 +1229,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
-	unsigned long vma_pagesize, fault_granule;
+	long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 
@@ -1344,6 +1370,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
 								   hva, &pfn,
 								   &fault_ipa);
+
+		if (vma_pagesize < 0) {
+			ret = vma_pagesize;
+			goto out_unlock;
+		}
 	}
 
 	if (fault_status != ESR_ELx_FSC_PERM && !device && kvm_has_mte(kvm)) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] KVM: arm64: Check for kvm_vma_mte_allowed in the critical section
  2023-03-16 17:45 [PATCH v2 0/2] KVM: arm64: Plug a couple of MM races Marc Zyngier
  2023-03-16 17:45 ` [PATCH v2 1/2] KVM: arm64: Disable interrupts while walking userspace PTs Marc Zyngier
@ 2023-03-16 17:45 ` Marc Zyngier
  2023-03-17  1:20 ` [PATCH v2 0/2] KVM: arm64: Plug a couple of MM races Oliver Upton
  2 siblings, 0 replies; 6+ messages in thread
From: Marc Zyngier @ 2023-03-16 17:45 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ard Biesheuvel, Will Deacon, Quentin Perret, Sean Christopherson,
	David Matlack, stable

On page fault, we find about the VMA that backs the page fault
early on, and quickly release the mmap_read_lock. However, using
the VMA pointer after the critical section is pretty dangerous,
as a teardown may happen in the meantime and the VMA be long gone.

Move the sampling of the MTE permission early, and NULL-ify the
VMA pointer after that, just to be on the safe side.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
---
 arch/arm64/kvm/mmu.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d3d4cdc0f617..e95593736ae3 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1218,7 +1218,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 {
 	int ret = 0;
 	bool write_fault, writable, force_pte = false;
-	bool exec_fault;
+	bool exec_fault, mte_allowed;
 	bool device = false;
 	unsigned long mmu_seq;
 	struct kvm *kvm = vcpu->kvm;
@@ -1309,6 +1309,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		fault_ipa &= ~(vma_pagesize - 1);
 
 	gfn = fault_ipa >> PAGE_SHIFT;
+	mte_allowed = kvm_vma_mte_allowed(vma);
+
+	/* Don't use the VMA after the unlock -- it may have vanished */
+	vma = NULL;
 
 	/*
 	 * Read mmu_invalidate_seq so that KVM can detect if the results of
@@ -1379,7 +1383,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	if (fault_status != ESR_ELx_FSC_PERM && !device && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
-		if (kvm_vma_mte_allowed(vma)) {
+		if (mte_allowed) {
 			sanitise_mte_tags(kvm, pfn, vma_pagesize);
 		} else {
 			ret = -EFAULT;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/2] KVM: arm64: Disable interrupts while walking userspace PTs
  2023-03-16 17:45 ` [PATCH v2 1/2] KVM: arm64: Disable interrupts while walking userspace PTs Marc Zyngier
@ 2023-03-16 23:42   ` Oliver Upton
  2023-03-17  9:03     ` Marc Zyngier
  0 siblings, 1 reply; 6+ messages in thread
From: Oliver Upton @ 2023-03-16 23:42 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Zenghui Yu, Ard Biesheuvel, Will Deacon, Quentin Perret,
	Sean Christopherson, David Matlack, stable

Marc,

On Thu, Mar 16, 2023 at 05:45:45PM +0000, Marc Zyngier wrote:
> We walk the userspace PTs to discover what mapping size was
> used there. However, this can race against the userspace tables
> being freed, and we end-up in the weeds.
> 
> Thankfully, the mm code is being generous and will IPI us when
> doing so. So let's implement our part of the bargain and disable
> interrupts around the walk. This ensures that nothing terrible
> happens during that time.
> 
> We still need to handle the removal of the page tables before
> the walk. For that, allow get_user_mapping_size() to return an
> error, and make sure this error can be propagated all the way
> to the the exit handler.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Cc: stable@vger.kernel.org

Looks good. I've squashed in this meaningless diff to make use of an existing
helper.


diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e95593736ae3..3b9d4d24c361 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -691,7 +691,7 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 		return -EFAULT;
 
 	/* Oops, the userspace PTs are gone... Replay the fault */
-	if (!(pte & PTE_VALID))
+	if (!kvm_pte_valid(pte))
 		return -EAGAIN;
 
 	return BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(level));

-- 
Thanks,
Oliver

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 0/2] KVM: arm64: Plug a couple of MM races
  2023-03-16 17:45 [PATCH v2 0/2] KVM: arm64: Plug a couple of MM races Marc Zyngier
  2023-03-16 17:45 ` [PATCH v2 1/2] KVM: arm64: Disable interrupts while walking userspace PTs Marc Zyngier
  2023-03-16 17:45 ` [PATCH v2 2/2] KVM: arm64: Check for kvm_vma_mte_allowed in the critical section Marc Zyngier
@ 2023-03-17  1:20 ` Oliver Upton
  2 siblings, 0 replies; 6+ messages in thread
From: Oliver Upton @ 2023-03-17  1:20 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvm, kvmarm
  Cc: Oliver Upton, Will Deacon, Quentin Perret, Ard Biesheuvel,
	Zenghui Yu, James Morse, Sean Christopherson, Suzuki K Poulose,
	David Matlack

On Thu, 16 Mar 2023 17:45:44 +0000, Marc Zyngier wrote:
> Ard recently reported a really odd warning generated with KASAN, where
> the page table walker we use to inspect the userspace page tables was
> going into the weeds and accessing something that was looking totally
> unrelated (and previously freed).
> 
> Will and I spent quite some time looking into it, and while we were
> not able to reproduce the issue, we were able to spot at least a
> couple of issues that could partially explain the issue.
> 
> [...]

Applied to kvmarm/fixes, thanks!

[1/2] KVM: arm64: Disable interrupts while walking userspace PTs
      https://git.kernel.org/kvmarm/kvmarm/c/e86fc1a3a3e9
[2/2] KVM: arm64: Check for kvm_vma_mte_allowed in the critical section
      https://git.kernel.org/kvmarm/kvmarm/c/8c2e8ac8ad4b

--
Best,
Oliver

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/2] KVM: arm64: Disable interrupts while walking userspace PTs
  2023-03-16 23:42   ` Oliver Upton
@ 2023-03-17  9:03     ` Marc Zyngier
  0 siblings, 0 replies; 6+ messages in thread
From: Marc Zyngier @ 2023-03-17  9:03 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Zenghui Yu, Ard Biesheuvel, Will Deacon, Quentin Perret,
	Sean Christopherson, David Matlack, stable

On 2023-03-16 23:42, Oliver Upton wrote:
> Marc,
> 
> On Thu, Mar 16, 2023 at 05:45:45PM +0000, Marc Zyngier wrote:
>> We walk the userspace PTs to discover what mapping size was
>> used there. However, this can race against the userspace tables
>> being freed, and we end-up in the weeds.
>> 
>> Thankfully, the mm code is being generous and will IPI us when
>> doing so. So let's implement our part of the bargain and disable
>> interrupts around the walk. This ensures that nothing terrible
>> happens during that time.
>> 
>> We still need to handle the removal of the page tables before
>> the walk. For that, allow get_user_mapping_size() to return an
>> error, and make sure this error can be propagated all the way
>> to the the exit handler.
>> 
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>> Cc: stable@vger.kernel.org
> 
> Looks good. I've squashed in this meaningless diff to make use of an 
> existing
> helper.
> 
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index e95593736ae3..3b9d4d24c361 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -691,7 +691,7 @@ static int get_user_mapping_size(struct kvm *kvm, 
> u64 addr)
>  		return -EFAULT;
> 
>  	/* Oops, the userspace PTs are gone... Replay the fault */
> -	if (!(pte & PTE_VALID))
> +	if (!kvm_pte_valid(pte))
>  		return -EAGAIN;

Sure, LGTM.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-03-17  9:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-16 17:45 [PATCH v2 0/2] KVM: arm64: Plug a couple of MM races Marc Zyngier
2023-03-16 17:45 ` [PATCH v2 1/2] KVM: arm64: Disable interrupts while walking userspace PTs Marc Zyngier
2023-03-16 23:42   ` Oliver Upton
2023-03-17  9:03     ` Marc Zyngier
2023-03-16 17:45 ` [PATCH v2 2/2] KVM: arm64: Check for kvm_vma_mte_allowed in the critical section Marc Zyngier
2023-03-17  1:20 ` [PATCH v2 0/2] KVM: arm64: Plug a couple of MM races Oliver Upton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).