From: Suzuki K Poulose <suzuki.poulose@arm.com>
To: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu, will.deacon@arm.com,
	catalin.marinas@arm.com, james.morse@arm.com,
	julien.thierry@arm.com, wanghaibin.wang@huawei.com,
	lious.lilei@hisilicon.com, lishuo1@hisilicon.com,
	zhengxiang9@huawei.com, yuzenghui@huawei.com,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Marc Zyngier <marc.zyngier@arm.com>,
	Christoffer Dall <christoffer.dall@arm.com>
Subject: [PATCH] kvm: arm: Fix handling of stage2 huge mappings
Date: Tue, 19 Mar 2019 14:11:08 +0000
Message-ID: <1553004668-23296-1-git-send-email-suzuki.poulose@arm.com>
In-Reply-To: <25971fd5-3774-3389-a82a-04707480c1e0@huawei.com>

We rely on the mmu_notifier callbacks to handle the split/merge
of huge pages, and thus we are guaranteed that, while creating a
block mapping, either the entire block is unmapped at stage2 or it
is missing permission.

However, we miss the case where a block mapping is split for dirty
logging and could later be made a block mapping again if dirty
logging is cancelled. This not only creates inconsistent TLB entries
for the pages in the block, but also leaks the table pages for the
PMD level.

Handle this corner case for the huge mappings at stage2 by
unmapping the non-huge mapping for the block. This could potentially
release the upper level table, so we need to restart the table walk
once we unmap the range.

Fixes: ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages")
Reported-by: Zheng Xiang <zhengxiang9@huawei.com>
Cc: Zheng Xiang <zhengxiang9@huawei.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 virt/kvm/arm/mmu.c | 63 ++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 45 insertions(+), 18 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index fce0983..6ad6f19d 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1060,25 +1060,43 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 {
 	pmd_t *pmd, old_pmd;
 
+retry:
 	pmd = stage2_get_pmd(kvm, cache, addr);
 	VM_BUG_ON(!pmd);
 
 	old_pmd = *pmd;
+	/*
+	 * Multiple vcpus faulting on the same PMD entry, can
+	 * lead to them sequentially updating the PMD with the
+	 * same value. Following the break-before-make
+	 * (pmd_clear() followed by tlb_flush()) process can
+	 * hinder forward progress due to refaults generated
+	 * on missing translations.
+	 *
+	 * Skip updating the page table if the entry is
+	 * unchanged.
+	 */
+	if (pmd_val(old_pmd) == pmd_val(*new_pmd))
+		return 0;
+
 	if (pmd_present(old_pmd)) {
 		/*
-		 * Multiple vcpus faulting on the same PMD entry, can
-		 * lead to them sequentially updating the PMD with the
-		 * same value. Following the break-before-make
-		 * (pmd_clear() followed by tlb_flush()) process can
-		 * hinder forward progress due to refaults generated
-		 * on missing translations.
+		 * If we already have PTE level mapping for this block,
+		 * we must unmap it to avoid inconsistent TLB state and
+		 * leaking the table page. We could end up in this situation
+		 * if the memory slot was marked for dirty logging and was
+		 * reverted, leaving PTE level mappings for the pages accessed
+		 * during the period. So, unmap the PTE level mapping for this
+		 * block and retry, as we could have released the upper level
+		 * table in the process.
 		 *
-		 * Skip updating the page table if the entry is
-		 * unchanged.
+		 * Normal THP split/merge follows mmu_notifier callbacks and do
+		 * get handled accordingly.
 		 */
-		if (pmd_val(old_pmd) == pmd_val(*new_pmd))
-			return 0;
-
+		if (!pmd_thp_or_huge(old_pmd)) {
+			unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE);
+			goto retry;
+		}
 		/*
 		 * Mapping in huge pages should only happen through a
 		 * fault. If a page is merged into a transparent huge
@@ -1090,8 +1108,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 		 * should become splitting first, unmapped, merged,
 		 * and mapped back in on-demand.
 		 */
-		VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
-
+		WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
 		pmd_clear(pmd);
 		kvm_tlb_flush_vmid_ipa(kvm, addr);
 	} else {
@@ -1107,6 +1124,7 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
 {
 	pud_t *pudp, old_pud;
 
+retry:
 	pudp = stage2_get_pud(kvm, cache, addr);
 	VM_BUG_ON(!pudp);
 
@@ -1114,16 +1132,25 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
 
 	/*
 	 * A large number of vcpus faulting on the same stage 2 entry,
-	 * can lead to a refault due to the
-	 * stage2_pud_clear()/tlb_flush(). Skip updating the page
-	 * tables if there is no change.
+	 * can lead to a refault due to the stage2_pud_clear()/tlb_flush().
+	 * Skip updating the page tables if there is no change.
 	 */
 	if (pud_val(old_pud) == pud_val(*new_pudp))
 		return 0;
 
 	if (stage2_pud_present(kvm, old_pud)) {
-		stage2_pud_clear(kvm, pudp);
-		kvm_tlb_flush_vmid_ipa(kvm, addr);
+		/*
+		 * If we already have table level mapping for this block, unmap
+		 * the range for this block and retry.
+		 */
+		if (!stage2_pud_huge(kvm, old_pud)) {
+			unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE);
+			goto retry;
+		} else {
+			WARN_ON_ONCE(pud_pfn(old_pud) != pud_pfn(*new_pudp));
+			stage2_pud_clear(kvm, pudp);
+			kvm_tlb_flush_vmid_ipa(kvm, addr);
+		}
 	} else {
 		get_page(virt_to_page(pudp));
 	}
-- 
2.7.4
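
For readers following the thread, the unmap-and-retry idea in the patch above can be illustrated with a small user-space toy model. This is only a sketch under loud assumptions: it is not kernel code, every `toy_*` name and `TOY_PTES_PER_PMD` is hypothetical, and a single `struct toy_pmd` stands in for the whole stage2 walk, so "retry" here merely re-reads the entry rather than re-walking from the top. It shows why installing a block over a leftover PTE table without unmapping first would leak the table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy size; a real PTE table has far more entries. */
#define TOY_PTES_PER_PMD 4

enum toy_kind { TOY_NONE, TOY_BLOCK, TOY_TABLE };

/* One toy PMD entry: empty, a block mapping, or a pointer to a PTE table. */
struct toy_pmd {
	enum toy_kind kind;
	uint64_t pfn;        /* meaningful when kind == TOY_BLOCK */
	uint64_t *pte_table; /* meaningful when kind == TOY_TABLE */
};

/* Number of PTE tables currently allocated, used to spot leaks. */
static int toy_live_tables;

/* Models dirty logging splitting a block into page-level mappings. */
static void toy_split(struct toy_pmd *pmd)
{
	uint64_t base = pmd->pfn;

	pmd->pte_table = calloc(TOY_PTES_PER_PMD, sizeof(*pmd->pte_table));
	assert(pmd->pte_table != NULL);
	for (int i = 0; i < TOY_PTES_PER_PMD; i++)
		pmd->pte_table[i] = base + (uint64_t)i;
	pmd->kind = TOY_TABLE;
	toy_live_tables++;
}

/* Models unmapping the range: drops the mapping and frees any PTE table. */
static void toy_unmap_range(struct toy_pmd *pmd)
{
	if (pmd->kind == TOY_TABLE) {
		free(pmd->pte_table);
		pmd->pte_table = NULL;
		toy_live_tables--;
	}
	pmd->kind = TOY_NONE;
	pmd->pfn = 0;
}

/* Models installing a block mapping, with the unmap-and-retry step. */
static int toy_set_pmd_huge(struct toy_pmd *pmd, uint64_t new_pfn)
{
retry:
	/* Entry unchanged: skip the update entirely. */
	if (pmd->kind == TOY_BLOCK && pmd->pfn == new_pfn)
		return 0;

	/*
	 * A page-level table is still installed (e.g. left behind after
	 * dirty logging was cancelled). Unmap it first, then look again.
	 * Overwriting the entry here instead would leak pte_table.
	 */
	if (pmd->kind == TOY_TABLE) {
		toy_unmap_range(pmd);
		goto retry;
	}

	/* The real code clears the entry and flushes the TLB before this. */
	pmd->kind = TOY_BLOCK;
	pmd->pfn = new_pfn;
	return 0;
}
```

Starting from a block entry, calling `toy_split()` and then `toy_set_pmd_huge()` with the same pfn ends with a block entry again and `toy_live_tables` back at zero, which is the leak-free outcome the patch is after.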