From: Zenghui Yu <yuzenghui@huawei.com>
To: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Cc: <zhengxiang9@huawei.com>, <marc.zyngier@arm.com>,
<christoffer.dall@arm.com>, <catalin.marinas@arm.com>,
<will.deacon@arm.com>, <james.morse@arm.com>,
<linux-arm-kernel@lists.infradead.org>,
<kvmarm@lists.cs.columbia.edu>, <linux-kernel@vger.kernel.org>,
<wanghaibin.wang@huawei.com>, <lious.lilei@hisilicon.com>,
<lishuo1@hisilicon.com>
Subject: Re: [RFC] Question about TLB flush while set Stage-2 huge pages
Date: Tue, 19 Mar 2019 17:05:23 +0800
Message-ID: <25971fd5-3774-3389-a82a-04707480c1e0@huawei.com>
In-Reply-To: <20190318173405.GA31412@en101>

Hi Suzuki,
On 2019/3/19 1:34, Suzuki K Poulose wrote:
> Hi !
> On Sun, Mar 17, 2019 at 09:34:11PM +0800, Zenghui Yu wrote:
>> Hi Suzuki,
>>
>> ---8<---
>>
>> test: kvm: arm: Maybe two more fixes
>>
>> Applied based on Suzuki's patch.
>>
>> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
>> ---
>> virt/kvm/arm/mmu.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 05765df..ccd5d5d 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1089,7 +1089,9 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>> * Normal THP split/merge follows mmu_notifier
>> * callbacks and do get handled accordingly.
>> */
>> - unmap_stage2_range(kvm, (addr & S2_PMD_MASK), S2_PMD_SIZE);
>> + addr &= S2_PMD_MASK;
>> + unmap_stage2_ptes(kvm, pmd, addr, addr + S2_PMD_SIZE);
>> + get_page(virt_to_page(pmd));
>> } else {
>>
>> /*
>> @@ -1138,7 +1140,9 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
>> if (stage2_pud_present(kvm, old_pud)) {
>> /* If we have PTE level mapping, unmap the entire range */
>> if (WARN_ON_ONCE(!stage2_pud_huge(kvm, old_pud))) {
>> - unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE);
>> + addr &= S2_PUD_MASK;
>> + unmap_stage2_pmds(kvm, pudp, addr, addr + S2_PUD_SIZE);
>> + get_page(virt_to_page(pudp));
>> } else {
>> stage2_pud_clear(kvm, pudp);
>> kvm_tlb_flush_vmid_ipa(kvm, addr);
>
> This makes it a bit tricky to follow the code. The other option is to
> do something like:
Yes.
>
>
> ---8>---
>
> kvm: arm: Fix handling of stage2 huge mappings
>
> We rely on the mmu_notifier callbacks to handle the split/merging
> of huge pages and thus we are guaranteed that, while creating a
> block mapping, the entire block is unmapped at stage2. However,
> we miss a case where the block mapping is split for dirty logging
> and could later be made a block mapping again if we cancel the
> dirty logging. This not only creates inconsistent TLB entries for
> the pages in the block, but also leaks the table pages for
> PMD level.
>
> Handle these corner cases for the huge mappings at stage2 by
> unmapping the PTE level mapping. This could potentially release
> the upper level table. So we need to restart the table walk
> once we unmap the range.
>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> virt/kvm/arm/mmu.c | 57 +++++++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 41 insertions(+), 16 deletions(-)
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index fce0983..a38a3f1 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1060,25 +1060,41 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
> {
> pmd_t *pmd, old_pmd;
>
> +retry:
> pmd = stage2_get_pmd(kvm, cache, addr);
> VM_BUG_ON(!pmd);
>
> old_pmd = *pmd;
> + /*
> + * Multiple vcpus faulting on the same PMD entry, can
> + * lead to them sequentially updating the PMD with the
> + * same value. Following the break-before-make
> + * (pmd_clear() followed by tlb_flush()) process can
> + * hinder forward progress due to refaults generated
> + * on missing translations.
> + *
> + * Skip updating the page table if the entry is
> + * unchanged.
> + */
> + if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> + return 0;
> +
> if (pmd_present(old_pmd)) {
> /*
> - * Multiple vcpus faulting on the same PMD entry, can
> - * lead to them sequentially updating the PMD with the
> - * same value. Following the break-before-make
> - * (pmd_clear() followed by tlb_flush()) process can
> - * hinder forward progress due to refaults generated
> - * on missing translations.
> - *
> - * Skip updating the page table if the entry is
> - * unchanged.
> + * If we already have PTE level mapping for this block,
> + * we must unmap it to avoid inconsistent TLB
> + * state. We could end up in this situation if
> + * the memory slot was marked for dirty logging
> + * and was reverted, leaving PTE level mappings
> + * for the pages accessed during the period.
> + * Normal THP split/merge follows mmu_notifier
> + * callbacks and do get handled accordingly.
> + * Unmap the PTE level mapping and retry.
> */
> - if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> - return 0;
> -
> + if (!pmd_thp_or_huge(old_pmd)) {
> + unmap_stage2_range(kvm, (addr & S2_PMD_MASK), S2_PMD_SIZE);
Nit: we can get rid of the parentheses around "addr & S2_PMD_MASK" to
make it look the same as the PUD level (though this is not necessary).
> + goto retry;
> + }
> /*
> * Mapping in huge pages should only happen through a
> * fault. If a page is merged into a transparent huge
> @@ -1090,8 +1106,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
> * should become splitting first, unmapped, merged,
> * and mapped back in on-demand.
> */
> - VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
> -
> + WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
> pmd_clear(pmd);
> kvm_tlb_flush_vmid_ipa(kvm, addr);
> } else {
> @@ -1107,6 +1122,7 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
> {
> pud_t *pudp, old_pud;
>
> +retry:
> pudp = stage2_get_pud(kvm, cache, addr);
> VM_BUG_ON(!pudp);
>
> @@ -1122,8 +1138,17 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
> return 0;
>
> if (stage2_pud_present(kvm, old_pud)) {
> - stage2_pud_clear(kvm, pudp);
> - kvm_tlb_flush_vmid_ipa(kvm, addr);
> + /*
> + * If we already have PTE level mapping, unmap the entire
> + * range and retry.
> + */
> + if (!stage2_pud_huge(kvm, old_pud)) {
> + unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE);
> + goto retry;
> + } else {
> + stage2_pud_clear(kvm, pudp);
> + kvm_tlb_flush_vmid_ipa(kvm, addr);
> + }
> } else {
> get_page(virt_to_page(pudp));
> }
>
It looks much better, and works fine now!
thanks,
zenghui