* [PATCH 0/2] KVM: arm64: Plug a couple of MM races @ 2023-03-13 9:14 Marc Zyngier 2023-03-13 9:14 ` [PATCH 1/2] KVM: arm64: Disable interrupts while walking userspace PTs Marc Zyngier 2023-03-13 9:14 ` [PATCH 2/2] KVM: arm64: Check for kvm_vma_mte_allowed in the critical section Marc Zyngier 0 siblings, 2 replies; 8+ messages in thread From: Marc Zyngier @ 2023-03-13 9:14 UTC (permalink / raw) To: kvmarm, linux-arm-kernel, kvm Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Ard Biesheuvel, Will Deacon, Quentin Perret Ard recently reported a really odd warning generated with KASAN, where the page table walker we use to inspect the userspace page tables was going into the weeds and accessing something that looked totally unrelated (and previously freed). Will and I spent quite some time looking into it, and while we were not able to reproduce the issue, we were able to spot at least a couple of problems that could partially explain it. The first fix is to disable interrupts while walking the userspace PTs. This prevents exit_mmap() from tearing down these PTs by blocking the IPI. We also fail gracefully if the IPI won the race and killed the page tables before we started the walk. The second fix is to stop using a VMA pointer that was obtained with the mmap_read_lock held after that lock has been released. There is no guarantee that it is still valid. I've earmarked both for stable, though I expect backporting this to older revisions of the kernel could be... interesting. M. Marc Zyngier (2): KVM: arm64: Disable interrupts while walking userspace PTs KVM: arm64: Check for kvm_vma_mte_allowed in the critical section arch/arm64/kvm/mmu.c | 42 +++++++++++++++++++++++++++++++++--------- 1 file changed, 33 insertions(+), 9 deletions(-) -- 2.34.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] KVM: arm64: Disable interrupts while walking userspace PTs 2023-03-13 9:14 [PATCH 0/2] KVM: arm64: Plug a couple of MM races Marc Zyngier @ 2023-03-13 9:14 ` Marc Zyngier 2023-03-13 15:53 ` Sean Christopherson 2023-03-13 9:14 ` [PATCH 2/2] KVM: arm64: Check for kvm_vma_mte_allowed in the critical section Marc Zyngier 1 sibling, 1 reply; 8+ messages in thread From: Marc Zyngier @ 2023-03-13 9:14 UTC (permalink / raw) To: kvmarm, linux-arm-kernel, kvm Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Ard Biesheuvel, Will Deacon, Quentin Perret, stable We walk the userspace PTs to discover what mapping size was used there. However, this can race against the userspace tables being freed, and we end up in the weeds. Thankfully, the mm code is being generous and will IPI us when doing so. So let's implement our part of the bargain and disable interrupts around the walk. This ensures that nothing terrible happens during that time. We still need to handle the removal of the page tables before the walk. For that, allow get_user_mapping_size() to return an error, and make sure this error can be propagated all the way to the exit handler. Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org --- arch/arm64/kvm/mmu.c | 35 ++++++++++++++++++++++++++++------- 1 file changed, 28 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 7113587222ff..d7b8b25942df 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -666,14 +666,23 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr) CONFIG_PGTABLE_LEVELS), .mm_ops = &kvm_user_mm_ops, }; + unsigned long flags; kvm_pte_t pte = 0; /* Keep GCC quiet... */ u32 level = ~0; int ret; + /* + * Disable IRQs so that we hazard against a concurrent + * teardown of the userspace page tables (which relies on + * IPI-ing threads). 
+ */ + local_irq_save(flags); ret = kvm_pgtable_get_leaf(&pgt, addr, &pte, &level); - VM_BUG_ON(ret); - VM_BUG_ON(level >= KVM_PGTABLE_MAX_LEVELS); - VM_BUG_ON(!(pte & PTE_VALID)); + local_irq_restore(flags); + + /* Oops, the userspace PTs are gone... */ + if (ret || level >= KVM_PGTABLE_MAX_LEVELS || !(pte & PTE_VALID)) + return -EFAULT; return BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(level)); } @@ -1079,7 +1088,7 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, * * Returns the size of the mapping. */ -static unsigned long +static long transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot, unsigned long hva, kvm_pfn_t *pfnp, phys_addr_t *ipap) @@ -1091,8 +1100,15 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot, * sure that the HVA and IPA are sufficiently aligned and that the * block map is contained within the memslot. */ - if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE) && - get_user_mapping_size(kvm, hva) >= PMD_SIZE) { + if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) { + int sz = get_user_mapping_size(kvm, hva); + + if (sz < 0) + return sz; + + if (sz < PMD_SIZE) + return PAGE_SIZE; + /* * The address we faulted on is backed by a transparent huge * page. 
However, because we map the compound huge page and @@ -1203,7 +1219,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_pfn_t pfn; bool logging_active = memslot_is_logging(memslot); unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu); - unsigned long vma_pagesize, fault_granule; + long vma_pagesize, fault_granule; enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R; struct kvm_pgtable *pgt; @@ -1350,6 +1366,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, vma_pagesize = transparent_hugepage_adjust(kvm, memslot, hva, &pfn, &fault_ipa); + + if (vma_pagesize < 0) { + ret = vma_pagesize; + goto out_unlock; + } } if (fault_status != ESR_ELx_FSC_PERM && !device && kvm_has_mte(kvm)) { -- 2.34.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] KVM: arm64: Disable interrupts while walking userspace PTs 2023-03-13 9:14 ` [PATCH 1/2] KVM: arm64: Disable interrupts while walking userspace PTs Marc Zyngier @ 2023-03-13 15:53 ` Sean Christopherson 2023-03-13 17:16 ` David Matlack 2023-03-13 17:40 ` Marc Zyngier 0 siblings, 2 replies; 8+ messages in thread From: Sean Christopherson @ 2023-03-13 15:53 UTC (permalink / raw) To: Marc Zyngier Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Ard Biesheuvel, Will Deacon, Quentin Perret, stable, David Matlack +David On Mon, Mar 13, 2023, Marc Zyngier wrote: > We walk the userspace PTs to discover what mapping size was > used there. However, this can race against the userspace tables > being freed, and we end-up in the weeds. > > Thankfully, the mm code is being generous and will IPI us when > doing so. So let's implement our part of the bargain and disable > interrupts around the walk. This ensures that nothing terrible > happens during that time. > > We still need to handle the removal of the page tables before > the walk. For that, allow get_user_mapping_size() to return an > error, and make sure this error can be propagated all the way > to the the exit handler. > > Signed-off-by: Marc Zyngier <maz@kernel.org> > Cc: stable@vger.kernel.org > --- > arch/arm64/kvm/mmu.c | 35 ++++++++++++++++++++++++++++------- > 1 file changed, 28 insertions(+), 7 deletions(-) > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index 7113587222ff..d7b8b25942df 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -666,14 +666,23 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr) > CONFIG_PGTABLE_LEVELS), > .mm_ops = &kvm_user_mm_ops, > }; > + unsigned long flags; > kvm_pte_t pte = 0; /* Keep GCC quiet... */ > u32 level = ~0; > int ret; > > + /* > + * Disable IRQs so that we hazard against a concurrent > + * teardown of the userspace page tables (which relies on > + * IPI-ing threads). 
> + */ > + local_irq_save(flags); > ret = kvm_pgtable_get_leaf(&pgt, addr, &pte, &level); > - VM_BUG_ON(ret); > - VM_BUG_ON(level >= KVM_PGTABLE_MAX_LEVELS); > - VM_BUG_ON(!(pte & PTE_VALID)); > + local_irq_restore(flags); > + > + /* Oops, the userspace PTs are gone... */ > + if (ret || level >= KVM_PGTABLE_MAX_LEVELS || !(pte & PTE_VALID)) > + return -EFAULT; I don't think this should return -EFAULT all the way out to userspace. Unless arm64 differs from x86 in terms of how the userspace page tables are managed, not having a valid translation _right now_ doesn't mean that one can't be created in the future, e.g. by way of a subsequent hva_to_pfn(). FWIW, the approach x86 takes is to install a 4KiB (smallest granule) translation, which is safe since there _was_ a valid translation when mmu_lock was acquired and mmu_invalidate_retry() was checked. It's the primary MMU's responsibility to ensure all secondary MMUs are purged before freeing memory, i.e. worst case should be that KVM's stage-2 translation will be immediately zapped via mmu_notifier. KVM ARM also has a bug that might be related: the mmu_seq snapshot needs to be taken _before_ mmap_read_unlock(), otherwise vma_shift may be stale by the time it's consumed. I believe David is going to submit a patch (I found and "reported" the bug when doing an internal review of "common MMU" stuff). ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] KVM: arm64: Disable interrupts while walking userspace PTs 2023-03-13 15:53 ` Sean Christopherson @ 2023-03-13 17:16 ` David Matlack 2023-03-13 17:21 ` Sean Christopherson 2023-03-13 17:40 ` Marc Zyngier 1 sibling, 1 reply; 8+ messages in thread From: David Matlack @ 2023-03-13 17:16 UTC (permalink / raw) To: Sean Christopherson Cc: Marc Zyngier, kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Ard Biesheuvel, Will Deacon, Quentin Perret, stable On Mon, Mar 13, 2023 at 8:53 AM Sean Christopherson <seanjc@google.com> wrote: > > +David > > On Mon, Mar 13, 2023, Marc Zyngier wrote: > > We walk the userspace PTs to discover what mapping size was > > used there. However, this can race against the userspace tables > > being freed, and we end-up in the weeds. > > > > Thankfully, the mm code is being generous and will IPI us when > > doing so. So let's implement our part of the bargain and disable > > interrupts around the walk. This ensures that nothing terrible > > happens during that time. > > > > We still need to handle the removal of the page tables before > > the walk. For that, allow get_user_mapping_size() to return an > > error, and make sure this error can be propagated all the way > > to the the exit handler. > > > > Signed-off-by: Marc Zyngier <maz@kernel.org> > > Cc: stable@vger.kernel.org > > --- > > arch/arm64/kvm/mmu.c | 35 ++++++++++++++++++++++++++++------- > > 1 file changed, 28 insertions(+), 7 deletions(-) > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > index 7113587222ff..d7b8b25942df 100644 > > --- a/arch/arm64/kvm/mmu.c > > +++ b/arch/arm64/kvm/mmu.c > > @@ -666,14 +666,23 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr) > > CONFIG_PGTABLE_LEVELS), > > .mm_ops = &kvm_user_mm_ops, > > }; > > + unsigned long flags; > > kvm_pte_t pte = 0; /* Keep GCC quiet... 
*/ > > u32 level = ~0; > > int ret; > > > > + /* > > + * Disable IRQs so that we hazard against a concurrent > > + * teardown of the userspace page tables (which relies on > > + * IPI-ing threads). > > + */ > > + local_irq_save(flags); > > ret = kvm_pgtable_get_leaf(&pgt, addr, &pte, &level); > > - VM_BUG_ON(ret); > > - VM_BUG_ON(level >= KVM_PGTABLE_MAX_LEVELS); > > - VM_BUG_ON(!(pte & PTE_VALID)); > > + local_irq_restore(flags); > > + > > + /* Oops, the userspace PTs are gone... */ > > + if (ret || level >= KVM_PGTABLE_MAX_LEVELS || !(pte & PTE_VALID)) > > + return -EFAULT; > > I don't think this should return -EFAULT all the way out to userspace. Unless > arm64 differs from x86 in terms of how the userspace page tables are managed, not > having a valid translation _right now_ doesn't mean that one can't be created in > the future, e.g. by way of a subsequent hva_to_pfn(). > > FWIW, the approach x86 takes is to install a 4KiB (smallest granuale) translation, If I'm reading the ARM code correctly, returning -EFAULT here will have that effect. get_user_mapping_size() is only called by transparent_hugepage_adjust() which returns PAGE_SIZE if get_user_mapping_size() returns anything less than PMD_SIZE. > which is safe since there _was_ a valid translation when mmu_lock was acquired and > mmu_invalidate_retry() was checked. It's the primary MMU's responsibility to ensure > all secondary MMUs are purged before freeing memory, i.e. worst case should be that > KVMs stage-2 translation will be immediately zapped via mmu_notifier. > > KVM ARM also has a bug that might be related: the mmu_seq snapshot needs to be > taken _before_ mmap_read_unlock(), otherwise vma_shift may be stale by the time > it's consumed. I believe David is going to submit a patch (I found and "reported" > the bug when doing an internal review of "common MMU" stuff). Yeah and RISC-V has that same bug. I'll try to have fixes for each out this week. 
After that, I'd also like to refactor how ARM and RISC-V calculate the host mapping size to match what we do on x86: always walk the host page table. This will unify the handling for HugeTLB and THP, avoid needing to take the mmap_lock, and we can even share the host page table walk code across architectures (Linux's host page table code is already common). ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] KVM: arm64: Disable interrupts while walking userspace PTs 2023-03-13 17:16 ` David Matlack @ 2023-03-13 17:21 ` Sean Christopherson 2023-03-13 17:26 ` David Matlack 0 siblings, 1 reply; 8+ messages in thread From: Sean Christopherson @ 2023-03-13 17:21 UTC (permalink / raw) To: David Matlack Cc: Marc Zyngier, kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Ard Biesheuvel, Will Deacon, Quentin Perret, stable On Mon, Mar 13, 2023, David Matlack wrote: > On Mon, Mar 13, 2023 at 8:53 AM Sean Christopherson <seanjc@google.com> wrote: > > > > +David > > > > On Mon, Mar 13, 2023, Marc Zyngier wrote: > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > > index 7113587222ff..d7b8b25942df 100644 > > > --- a/arch/arm64/kvm/mmu.c > > > +++ b/arch/arm64/kvm/mmu.c > > > @@ -666,14 +666,23 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr) > > > CONFIG_PGTABLE_LEVELS), > > > .mm_ops = &kvm_user_mm_ops, > > > }; > > > + unsigned long flags; > > > kvm_pte_t pte = 0; /* Keep GCC quiet... */ > > > u32 level = ~0; > > > int ret; > > > > > > + /* > > > + * Disable IRQs so that we hazard against a concurrent > > > + * teardown of the userspace page tables (which relies on > > > + * IPI-ing threads). > > > + */ > > > + local_irq_save(flags); > > > ret = kvm_pgtable_get_leaf(&pgt, addr, &pte, &level); > > > - VM_BUG_ON(ret); > > > - VM_BUG_ON(level >= KVM_PGTABLE_MAX_LEVELS); > > > - VM_BUG_ON(!(pte & PTE_VALID)); > > > + local_irq_restore(flags); > > > + > > > + /* Oops, the userspace PTs are gone... */ > > > + if (ret || level >= KVM_PGTABLE_MAX_LEVELS || !(pte & PTE_VALID)) > > > + return -EFAULT; > > > > I don't think this should return -EFAULT all the way out to userspace. Unless > > arm64 differs from x86 in terms of how the userspace page tables are managed, not > > having a valid translation _right now_ doesn't mean that one can't be created in > > the future, e.g. 
by way of a subsequent hva_to_pfn(). > > > > FWIW, the approach x86 takes is to install a 4KiB (smallest granuale) translation, > > If I'm reading the ARM code correctly, returning -EFAULT here will > have that effect. get_user_mapping_size() is only called by > transparent_hugepage_adjust() which returns PAGE_SIZE if > get_user_mapping_size() returns anything less than PMD_SIZE. No, this patch adds + int sz = get_user_mapping_size(kvm, hva); + + if (sz < 0) + return sz; + + if (sz < PMD_SIZE) + return PAGE_SIZE; + and vma_pagesize = transparent_hugepage_adjust(kvm, memslot, hva, &pfn, &fault_ipa); + + if (vma_pagesize < 0) { + ret = vma_pagesize; + goto out_unlock; + } ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] KVM: arm64: Disable interrupts while walking userspace PTs 2023-03-13 17:21 ` Sean Christopherson @ 2023-03-13 17:26 ` David Matlack 0 siblings, 0 replies; 8+ messages in thread From: David Matlack @ 2023-03-13 17:26 UTC (permalink / raw) To: Sean Christopherson Cc: Marc Zyngier, kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Ard Biesheuvel, Will Deacon, Quentin Perret, stable On Mon, Mar 13, 2023 at 10:21 AM Sean Christopherson <seanjc@google.com> wrote: > > On Mon, Mar 13, 2023, David Matlack wrote: > > On Mon, Mar 13, 2023 at 8:53 AM Sean Christopherson <seanjc@google.com> wrote: > > > > > > +David > > > > > > On Mon, Mar 13, 2023, Marc Zyngier wrote: > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > > > index 7113587222ff..d7b8b25942df 100644 > > > > --- a/arch/arm64/kvm/mmu.c > > > > +++ b/arch/arm64/kvm/mmu.c > > > > @@ -666,14 +666,23 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr) > > > > CONFIG_PGTABLE_LEVELS), > > > > .mm_ops = &kvm_user_mm_ops, > > > > }; > > > > + unsigned long flags; > > > > kvm_pte_t pte = 0; /* Keep GCC quiet... */ > > > > u32 level = ~0; > > > > int ret; > > > > > > > > + /* > > > > + * Disable IRQs so that we hazard against a concurrent > > > > + * teardown of the userspace page tables (which relies on > > > > + * IPI-ing threads). > > > > + */ > > > > + local_irq_save(flags); > > > > ret = kvm_pgtable_get_leaf(&pgt, addr, &pte, &level); > > > > - VM_BUG_ON(ret); > > > > - VM_BUG_ON(level >= KVM_PGTABLE_MAX_LEVELS); > > > > - VM_BUG_ON(!(pte & PTE_VALID)); > > > > + local_irq_restore(flags); > > > > + > > > > + /* Oops, the userspace PTs are gone... */ > > > > + if (ret || level >= KVM_PGTABLE_MAX_LEVELS || !(pte & PTE_VALID)) > > > > + return -EFAULT; > > > > > > I don't think this should return -EFAULT all the way out to userspace. 
Unless > > > arm64 differs from x86 in terms of how the userspace page tables are managed, not > > > having a valid translation _right now_ doesn't mean that one can't be created in > > > the future, e.g. by way of a subsequent hva_to_pfn(). > > > > > > FWIW, the approach x86 takes is to install a 4KiB (smallest granuale) translation, > > > > If I'm reading the ARM code correctly, returning -EFAULT here will > > have that effect. get_user_mapping_size() is only called by > > transparent_hugepage_adjust() which returns PAGE_SIZE if > > get_user_mapping_size() returns anything less than PMD_SIZE. > > No, this patch adds > > + int sz = get_user_mapping_size(kvm, hva); > + > + if (sz < 0) > + return sz; > + > + if (sz < PMD_SIZE) > + return PAGE_SIZE; > + Gah, I just looked at the trimmed patch in the reply. Thanks for pointing that out. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] KVM: arm64: Disable interrupts while walking userspace PTs 2023-03-13 15:53 ` Sean Christopherson 2023-03-13 17:16 ` David Matlack @ 2023-03-13 17:40 ` Marc Zyngier 1 sibling, 0 replies; 8+ messages in thread From: Marc Zyngier @ 2023-03-13 17:40 UTC (permalink / raw) To: Sean Christopherson Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Ard Biesheuvel, Will Deacon, Quentin Perret, stable, David Matlack On Mon, 13 Mar 2023 15:53:37 +0000, Sean Christopherson <seanjc@google.com> wrote: > > +David > > On Mon, Mar 13, 2023, Marc Zyngier wrote: > > We walk the userspace PTs to discover what mapping size was > > used there. However, this can race against the userspace tables > > being freed, and we end-up in the weeds. > > > > Thankfully, the mm code is being generous and will IPI us when > > doing so. So let's implement our part of the bargain and disable > > interrupts around the walk. This ensures that nothing terrible > > happens during that time. > > > > We still need to handle the removal of the page tables before > > the walk. For that, allow get_user_mapping_size() to return an > > error, and make sure this error can be propagated all the way > > to the the exit handler. > > > > Signed-off-by: Marc Zyngier <maz@kernel.org> > > Cc: stable@vger.kernel.org > > --- > > arch/arm64/kvm/mmu.c | 35 ++++++++++++++++++++++++++++------- > > 1 file changed, 28 insertions(+), 7 deletions(-) > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > index 7113587222ff..d7b8b25942df 100644 > > --- a/arch/arm64/kvm/mmu.c > > +++ b/arch/arm64/kvm/mmu.c > > @@ -666,14 +666,23 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr) > > CONFIG_PGTABLE_LEVELS), > > .mm_ops = &kvm_user_mm_ops, > > }; > > + unsigned long flags; > > kvm_pte_t pte = 0; /* Keep GCC quiet... 
*/ > > u32 level = ~0; > > int ret; > > > > + /* > > + * Disable IRQs so that we hazard against a concurrent > > + * teardown of the userspace page tables (which relies on > > + * IPI-ing threads). > > + */ > > + local_irq_save(flags); > > ret = kvm_pgtable_get_leaf(&pgt, addr, &pte, &level); > > - VM_BUG_ON(ret); > > - VM_BUG_ON(level >= KVM_PGTABLE_MAX_LEVELS); > > - VM_BUG_ON(!(pte & PTE_VALID)); > > + local_irq_restore(flags); > > + > > + /* Oops, the userspace PTs are gone... */ > > + if (ret || level >= KVM_PGTABLE_MAX_LEVELS || !(pte & PTE_VALID)) > > + return -EFAULT; > > I don't think this should return -EFAULT all the way out to userspace. Unless > arm64 differs from x86 in terms of how the userspace page tables are managed, not > having a valid translation _right now_ doesn't mean that one can't be created in > the future, e.g. by way of a subsequent hva_to_pfn(). I probably took an overly restrictive approach of only looking at the issue at hand, where exit_mmap() had already torn down the userspace PTs. But I guess there are other ways for this scenario to happen, none of which deserve -EFAULT indeed. > FWIW, the approach x86 takes is to install a 4KiB (smallest granuale) translation, > which is safe since there _was_ a valid translation when mmu_lock was acquired and > mmu_invalidate_retry() was checked. It's the primary MMU's responsibility to ensure > all secondary MMUs are purged before freeing memory, i.e. worst case should be that > KVMs stage-2 translation will be immediately zapped via mmu_notifier. I'd rather avoid extra work. At this stage, we might as well return -EAGAIN and replay the fault. We already do that in a number of racy cases, so it fits in the infrastructure. > KVM ARM also has a bug that might be related: the mmu_seq snapshot needs to be > taken _before_ mmap_read_unlock(), otherwise vma_shift may be stale by the time > it's consumed. 
I believe David is going to submit a patch (I found and "reported" > the bug when doing an internal review of "common MMU" stuff). Huh, that's interesting. David, please post this at your earliest convenience. I'd rather squash these all in one go. Thanks, M. -- Without deviation from the norm, progress is not possible. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] KVM: arm64: Check for kvm_vma_mte_allowed in the critical section 2023-03-13 9:14 [PATCH 0/2] KVM: arm64: Plug a couple of MM races Marc Zyngier 2023-03-13 9:14 ` [PATCH 1/2] KVM: arm64: Disable interrupts while walking userspace PTs Marc Zyngier @ 2023-03-13 9:14 ` Marc Zyngier 1 sibling, 0 replies; 8+ messages in thread From: Marc Zyngier @ 2023-03-13 9:14 UTC (permalink / raw) To: kvmarm, linux-arm-kernel, kvm Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Ard Biesheuvel, Will Deacon, Quentin Perret, stable On page fault, we find out about the VMA that backs the page fault early on, and quickly release the mmap_read_lock. However, using the VMA pointer after the critical section is pretty dangerous, as a teardown may happen in the meantime and the VMA be long gone. Move the sampling of the MTE permission early, and NULL-ify the VMA pointer after that, just to be on the safe side. Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org --- arch/arm64/kvm/mmu.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index d7b8b25942df..2424be11eb52 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1208,7 +1208,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, { int ret = 0; bool write_fault, writable, force_pte = false; - bool exec_fault; + bool exec_fault, mte_allowed; bool device = false; unsigned long mmu_seq; struct kvm *kvm = vcpu->kvm; @@ -1285,6 +1285,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, fault_ipa &= ~(vma_pagesize - 1); gfn = fault_ipa >> PAGE_SHIFT; + mte_allowed = kvm_vma_mte_allowed(vma); + /* Don't use the VMA after that -- it may have vanished */ + vma = NULL; mmap_read_unlock(current->mm); /* @@ -1375,7 +1378,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (fault_status != ESR_ELx_FSC_PERM && !device && kvm_has_mte(kvm)) { /* Check the VMM 
hasn't introduced a new disallowed VMA */ - if (kvm_vma_mte_allowed(vma)) { + if (mte_allowed) { sanitise_mte_tags(kvm, pfn, vma_pagesize); } else { ret = -EFAULT; -- 2.34.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread