From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [RFC] Question about TLB flush while set Stage-2 huge pages
From: Zenghui Yu
To: Suzuki K Poulose
CC: zhengxiang9@huawei.com, marc.zyngier@arm.com, christoffer.dall@arm.com,
    catalin.marinas@arm.com, will.deacon@arm.com, james.morse@arm.com,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu,
    linux-kernel@vger.kernel.org, wanghaibin.wang@huawei.com,
    lious.lilei@hisilicon.com, lishuo1@hisilicon.com
References: <5f712cc6-0874-adbe-add6-46f5de24f36f@huawei.com>
 <1c0e07b9-73f0-efa4-c1b7-ad81789b42c5@huawei.com>
 <5188e3b9-5b5a-a6a7-7ef0-09b7b4f06af6@arm.com>
 <348d0b3b-c74b-7b39-ec30-85905c077c38@huawei.com>
 <20190314105537.GA15323@en101>
 <368bd218-ac1d-19b2-6e92-960b91afee8b@huawei.com>
 <6aea4049-7860-7144-a7be-14f856cdc789@arm.com>
 <20190318173405.GA31412@en101>
Message-ID: <25971fd5-3774-3389-a82a-04707480c1e0@huawei.com>
In-Reply-To: <20190318173405.GA31412@en101>
Date: Tue, 19 Mar 2019 17:05:23 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:64.0) Gecko/20100101 Thunderbird/64.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Suzuki,

On 2019/3/19 1:34, Suzuki K Poulose wrote:
> Hi !
> On Sun, Mar 17, 2019 at 09:34:11PM +0800, Zenghui Yu wrote:
>> Hi Suzuki,
>>
>> ---8<---
>>
>> test: kvm: arm: Maybe two more fixes
>>
>> Applied based on Suzuki's patch.
>>
>> Signed-off-by: Zenghui Yu
>> ---
>>  virt/kvm/arm/mmu.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 05765df..ccd5d5d 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1089,7 +1089,9 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>>  			 * Normal THP split/merge follows mmu_notifier
>>  			 * callbacks and do get handled accordingly.
>>  			 */
>> -			unmap_stage2_range(kvm, (addr & S2_PMD_MASK), S2_PMD_SIZE);
>> +			addr &= S2_PMD_MASK;
>> +			unmap_stage2_ptes(kvm, pmd, addr, addr + S2_PMD_SIZE);
>> +			get_page(virt_to_page(pmd));
>>  		} else {
>>
>>  		/*
>> @@ -1138,7 +1140,9 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
>>  	if (stage2_pud_present(kvm, old_pud)) {
>>  		/* If we have PTE level mapping, unmap the entire range */
>>  		if (WARN_ON_ONCE(!stage2_pud_huge(kvm, old_pud))) {
>> -			unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE);
>> +			addr &= S2_PUD_MASK;
>> +			unmap_stage2_pmds(kvm, pudp, addr, addr + S2_PUD_SIZE);
>> +			get_page(virt_to_page(pudp));
>>  		} else {
>>  			stage2_pud_clear(kvm, pudp);
>>  			kvm_tlb_flush_vmid_ipa(kvm, addr);
>
> This makes it a bit tricky to follow the code. The other option is to
> do something like:

Yes.

>
> ---8>---
>
> kvm: arm: Fix handling of stage2 huge mappings
>
> We rely on the mmu_notifier callbacks to handle the split/merge
> of huge pages, and thus we are guaranteed that, while creating a
> block mapping, the entire block is unmapped at stage2. However,
> we miss the case where a block mapping is split for dirty logging
> and could later be made a block mapping again if we cancel the
> dirty logging. This not only creates inconsistent TLB entries for
> the pages in the block, but also leaks the table pages at the
> PMD level.
>
> Handle these corner cases for the huge mappings at stage2 by
> unmapping the PTE level mapping. This could potentially release
> the upper level table, so we need to restart the table walk
> once we unmap the range.
>
> Signed-off-by: Suzuki K Poulose
> ---
>  virt/kvm/arm/mmu.c | 57 +++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 41 insertions(+), 16 deletions(-)
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index fce0983..a38a3f1 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1060,25 +1060,41 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>  {
>  	pmd_t *pmd, old_pmd;
>
> +retry:
>  	pmd = stage2_get_pmd(kvm, cache, addr);
>  	VM_BUG_ON(!pmd);
>
>  	old_pmd = *pmd;
> +	/*
> +	 * Multiple vcpus faulting on the same PMD entry, can
> +	 * lead to them sequentially updating the PMD with the
> +	 * same value. Following the break-before-make
> +	 * (pmd_clear() followed by tlb_flush()) process can
> +	 * hinder forward progress due to refaults generated
> +	 * on missing translations.
> +	 *
> +	 * Skip updating the page table if the entry is
> +	 * unchanged.
> +	 */
> +	if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> +		return 0;
> +
>  	if (pmd_present(old_pmd)) {
>  		/*
> -		 * Multiple vcpus faulting on the same PMD entry, can
> -		 * lead to them sequentially updating the PMD with the
> -		 * same value. Following the break-before-make
> -		 * (pmd_clear() followed by tlb_flush()) process can
> -		 * hinder forward progress due to refaults generated
> -		 * on missing translations.
> -		 *
> -		 * Skip updating the page table if the entry is
> -		 * unchanged.
> +		 * If we already have PTE level mapping for this block,
> +		 * we must unmap it to avoid inconsistent TLB
> +		 * state. We could end up in this situation if
> +		 * the memory slot was marked for dirty logging
> +		 * and was reverted, leaving PTE level mappings
> +		 * for the pages accessed during the period.
> +		 * Normal THP split/merge follows mmu_notifier
> +		 * callbacks and do get handled accordingly.
> +		 * Unmap the PTE level mapping and retry.
>  		 */
> -		if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> -			return 0;
> -
> +		if (!pmd_thp_or_huge(old_pmd)) {
> +			unmap_stage2_range(kvm, (addr & S2_PMD_MASK), S2_PMD_SIZE);

Nit: we can get rid of the parentheses around "addr & S2_PMD_MASK" to
make it look the same as the PUD level (but it is not necessary).

> +			goto retry;
> +		}
>  		/*
>  		 * Mapping in huge pages should only happen through a
>  		 * fault. If a page is merged into a transparent huge
> @@ -1090,8 +1106,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>  		 * should become splitting first, unmapped, merged,
>  		 * and mapped back in on-demand.
>  		 */
> -		VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
> -
> +		WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
>  		pmd_clear(pmd);
>  		kvm_tlb_flush_vmid_ipa(kvm, addr);
>  	} else {
> @@ -1107,6 +1122,7 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
>  {
>  	pud_t *pudp, old_pud;
>
> +retry:
>  	pudp = stage2_get_pud(kvm, cache, addr);
>  	VM_BUG_ON(!pudp);
>
> @@ -1122,8 +1138,17 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
>  		return 0;
>
>  	if (stage2_pud_present(kvm, old_pud)) {
> -		stage2_pud_clear(kvm, pudp);
> -		kvm_tlb_flush_vmid_ipa(kvm, addr);
> +		/*
> +		 * If we already have PTE level mapping, unmap the entire
> +		 * range and retry.
> +		 */
> +		if (!stage2_pud_huge(kvm, old_pud)) {
> +			unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE);
> +			goto retry;
> +		} else {
> +			stage2_pud_clear(kvm, pudp);
> +			kvm_tlb_flush_vmid_ipa(kvm, addr);
> +		}
>  	} else {
>  		get_page(virt_to_page(pudp));
>  	}
>

It looks much better, and works fine now!


thanks,

zenghui
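The failure mode being fixed above is easy to model outside the kernel. Below is a minimal standalone sketch of the unmap-and-retry pattern under toy assumptions: the s2_entry/s2_table types and the toy_* helpers are hypothetical simplifications invented for illustration, not the kernel's actual stage-2 page-table code.

```c
/*
 * Toy model of the bug and fix: a block mapping that was split for
 * dirty logging leaves a PTE-level table behind; installing a huge
 * mapping on top without unmapping it first would leak that table
 * and leave stale translations. All names here are hypothetical.
 */
#include <stdio.h>
#include <stdlib.h>

struct s2_entry {
	int is_table;			/* entry points to a next-level table */
	int is_huge;			/* entry is a block (huge) mapping */
	struct s2_table *table;		/* valid only when is_table is set */
};

struct s2_table {
	struct s2_entry ptes[4];
};

/* Unmap whatever the entry maps, freeing a lower-level table if present. */
static void toy_unmap(struct s2_entry *e)
{
	if (e->is_table) {
		free(e->table);		/* without this, the table "page" leaks */
		e->table = NULL;
		e->is_table = 0;
	}
	e->is_huge = 0;
	/* a real implementation would also invalidate the TLB here */
}

/*
 * Install a huge mapping, mirroring the shape of the fix: if a stale
 * PTE-level table is found, unmap the range and restart the walk.
 */
static void toy_set_huge(struct s2_entry *e)
{
retry:
	if (e->is_table) {
		toy_unmap(e);		/* leftover from dirty logging */
		goto retry;		/* restart the table walk */
	}
	if (e->is_huge)
		e->is_huge = 0;		/* break (clear + TLB flush) ... */
	e->is_huge = 1;			/* ... before make */
}

int main(void)
{
	struct s2_entry e = { 0 };

	/* dirty logging split the block into a PTE-level table */
	e.is_table = 1;
	e.table = calloc(1, sizeof(*e.table));

	/* dirty logging cancelled; the guest faults on a huge mapping */
	toy_set_huge(&e);
	printf("huge=%d, table freed\n", e.is_huge);
	return 0;
}
```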
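Similarly, the break-before-make discipline referenced in the comments above ("pmd_clear() followed by tlb_flush()"), together with the skip-if-unchanged shortcut that motivates keeping it off the common path, can be sketched as follows. set_entry() and tlb_flush() are illustrative stand-ins, not kernel APIs.

```c
/*
 * Sketch of break-before-make plus the skip-if-unchanged shortcut.
 * The single global "entry" stands in for one stage-2 PMD; this is
 * an illustration under assumed names, not the kernel's code.
 */
#include <stdint.h>
#include <stdio.h>

static uint64_t entry;			/* 0 means not present */

static void tlb_flush(void)
{
	puts("tlb flush");		/* stands in for a real TLB invalidation */
}

static void set_entry(uint64_t new)
{
	/*
	 * Skip if unchanged: vcpus racing on the same entry would
	 * otherwise keep clearing and refaulting on it.
	 */
	if (entry == new)
		return;
	if (entry) {
		entry = 0;		/* break: clear the old entry ... */
		tlb_flush();		/* ... and invalidate stale TLB entries */
	}
	entry = new;			/* make: publish the new mapping */
}

int main(void)
{
	set_entry(0x1000);		/* initial map: no flush needed */
	set_entry(0x1000);		/* unchanged: no-op, no flush */
	set_entry(0x2000);		/* changed: break-before-make */
	return 0;
}
```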