From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [RFC] Question about TLB flush while set Stage-2 huge pages
To: Zenghui Yu, Suzuki K Poulose
CC: Marc Zyngier, Wang Haibin
References: <5f712cc6-0874-adbe-add6-46f5de24f36f@huawei.com>
 <1c0e07b9-73f0-efa4-c1b7-ad81789b42c5@huawei.com>
 <5188e3b9-5b5a-a6a7-7ef0-09b7b4f06af6@arm.com>
 <348d0b3b-c74b-7b39-ec30-85905c077c38@huawei.com>
 <20190314105537.GA15323@en101>
 <368bd218-ac1d-19b2-6e92-960b91afee8b@huawei.com>
From: Zheng Xiang
Date: Fri, 15 Mar 2019 16:21:03 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:64.0) Gecko/20100101 Thunderbird/64.0
MIME-Version: 1.0
In-Reply-To:
 <368bd218-ac1d-19b2-6e92-960b91afee8b@huawei.com>
Content-Type: text/plain; charset="utf-8"
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Suzuki,

I have tested this patch. The VM doesn't hang and we get the expected WARNING log:

[ 526.184452] pstate: 20400009 (nzCv daif +PAN -UAO)
[ 526.184454] pc : user_mem_abort+0x484/0x9e0
[ 526.184455] lr : user_mem_abort+0x478/0x9e0
[ 526.184456] sp : ffff000084a038e0
[ 526.184457] x29: ffff000084a038e0 x28: 000000012f600000
[ 526.184458] x27: ffff8a2fa27ae918 x26: 0000000000200000
[ 526.184460] x25: 0000000000000000 x24: 0000000000000000
[ 526.184461] x23: 00400a269d0007fd x22: ffff0000849cd000
[ 526.184462] x21: ffff00001181d000 x20: 00000a26eef72003
[ 526.184463] x19: ffff8a2fb41d4bd8 x18: 00004fffb8b22000
[ 526.184465] x17: 0000000000000000 x16: 0000000000000000
[ 526.184466] x15: 0000000000000001 x14: ffff000008dd12a8
[ 526.184467] x13: 0000000000000041 x12: ffff8a26eeca6e30
[ 526.184468] x11: ffff8000fe4af800 x10: 0000000000000040
[ 526.184469] x9 : ffff0000097c46c0 x8 : ffff8000ff400248
[ 526.184471] x7 : 0000001000000000 x6 : 00000000000021f8
[ 526.184472] x5 : 00000000a269d000 x4 : 0000000000000018
[ 526.184473] x3 : 000000000000000a x2 : 0000000000000004
[ 526.184474] x1 : 0000000000000000 x0 : 0000000000000000
[ 526.184476] Call trace:
[ 526.184477] user_mem_abort+0x484/0x9e0
[ 526.184479] kvm_handle_guest_abort+0x11c/0x478
[ 526.184480] handle_exit+0x14c/0x1c8
[ 526.184482] kvm_arch_vcpu_ioctl_run+0x280/0x898
[ 526.184483] kvm_vcpu_ioctl+0x488/0x8a8
[ 526.184485] do_vfs_ioctl+0xc4/0x8c0
[ 526.184486] ksys_ioctl+0x8c/0xa0
[ 526.184487] __arm64_sys_ioctl+0x28/0x38
[ 526.184489] el0_svc_common+0xa0/0x180
[ 526.184491] el0_svc_handler+0x38/0x78
[ 526.184492] el0_svc+0x8/0xc

However, we also get the following unexpected log:

[ 908.329900] BUG:
Bad page state in process qemu-kvm pfn:a2fb41cf
[ 908.339415] page:ffff7e28bed073c0 count:-4 mapcount:0 mapping:0000000000000000 index:0x0
[ 908.339416] flags: 0x4ffffe0000000000()
[ 908.339418] raw: 4ffffe0000000000 dead000000000100 dead000000000200 0000000000000000
[ 908.339419] raw: 0000000000000000 0000000000000000 fffffffcffffffff 0000000000000000
[ 908.339420] page dumped because: nonzero _refcount
[ 908.339437] CPU: 32 PID: 72599 Comm: qemu-kvm Kdump: loaded Tainted: G B W 5.0.0+ #1
[ 908.339438] Call trace:
[ 908.339439] dump_backtrace+0x0/0x188
[ 908.339441] show_stack+0x24/0x30
[ 908.339442] dump_stack+0xa8/0xcc
[ 908.339443] bad_page+0xf0/0x150
[ 908.339445] free_pages_check_bad+0x84/0xa0
[ 908.339446] free_pcppages_bulk+0x4b8/0x750
[ 908.339448] free_unref_page_commit+0x13c/0x198
[ 908.339449] free_unref_page+0x84/0xa0
[ 908.339451] __free_pages+0x58/0x68
[ 908.339452] zap_huge_pmd+0x290/0x2d8
[ 908.339454] unmap_page_range+0x2b4/0x470
[ 908.339455] unmap_single_vma+0x94/0xe8
[ 908.339457] unmap_vmas+0x8c/0x108
[ 908.339458] exit_mmap+0xd4/0x178
[ 908.339459] mmput+0x74/0x180
[ 908.339460] do_exit+0x2b4/0x5b0
[ 908.339462] do_group_exit+0x3c/0xe0
[ 908.339463] __arm64_sys_exit_group+0x24/0x28
[ 908.339465] el0_svc_common+0xa0/0x180
[ 908.339466] el0_svc_handler+0x38/0x78
[ 908.339467] el0_svc+0x8/0xc

>> Marc and I had a discussion about this and it looks like we may have an
>> issue here. So with the cancellation of logging, we do not trigger the
>> mmu_notifiers (as the userspace memory mapping hasn't changed) and thus
>> have memory leaks while trying to install a huge mapping. Would it be
>> possible for you to try the patch below? It will trigger a WARNING
>> to confirm our theory, but should not cause the hang, as we unmap
>> the PMD/PUD range of PTE mappings before reinstalling a block map.
>
> Thanks for the reply. And I think this is almost what Zheng Xiang wanted to say!
> We will test this patch tomorrow and give you some feedback.
>
> BTW, we have noticed that x86 also suffered from a similar issue.
> You may want to look into commit 3ea3b7fa9af0 ("kvm: mmu: lazy collapse
> small sptes into large sptes" 2015) :-)
>
>
> thanks,
>
> zenghui
>
>>
>>
>> ---8>---
>>
>> test: kvm: arm: Fix handling of stage2 huge mappings
>>
>> We rely on the mmu_notifier callbacks to handle the split/merging
>> of huge pages and thus we are guaranteed that while creating a
>> block mapping, the entire block is unmapped at stage2. However,
>> we miss a case where the block mapping is split for dirty logging
>> case and then could later be made block mapping, if we cancel the
>> dirty logging. This not only creates inconsistent TLB entries for
>> the pages in the block, but also leaks the table pages for
>> PMD level.
>>
>> Handle these corner cases for the huge mappings at stage2.
>>
>> Signed-off-by: Suzuki K Poulose
>> ---
>>   virt/kvm/arm/mmu.c | 51 +++++++++++++++++++++++++++++++++++----------------
>>   1 file changed, 35 insertions(+), 16 deletions(-)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 66e0fbb5..04b0f9b 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1076,24 +1076,38 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>>            * Skip updating the page table if the entry is
>>            * unchanged.
>>            */
>> -        if (pmd_val(old_pmd) == pmd_val(*new_pmd))
>> +        if (pmd_val(old_pmd) == pmd_val(*new_pmd)) {
>>               return 0;
>> -
>> +        } else if (WARN_ON_ONCE(!pmd_thp_or_huge(old_pmd))) {
>>           /*
>> -         * Mapping in huge pages should only happen through a
>> -         * fault.  If a page is merged into a transparent huge
>> -         * page, the individual subpages of that huge page
>> -         * should be unmapped through MMU notifiers before we
>> -         * get here.
>> -         *
>> -         * Merging of CompoundPages is not supported; they
>> -         * should become splitting first, unmapped, merged,
>> -         * and mapped back in on-demand.
>> +         * If we have PTE level mapping for this block,
>> +         * we must unmap it to avoid inconsistent TLB
>> +         * state. We could end up in this situation if
>> +         * the memory slot was marked for dirty logging
>> +         * and was reverted, leaving PTE level mappings
>> +         * for the pages accessed during the period.
>> +         * Normal THP split/merge follows mmu_notifier
>> +         * callbacks and do get handled accordingly.
>>            */
>> -        VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
>> +            unmap_stage2_range(kvm, (addr & S2_PMD_MASK), S2_PMD_SIZE);

It seems that KVM decrements the _refcount of the page twice: once in
transparent_hugepage_adjust() and once in unmap_stage2_range().

>> +        } else {
>>
>> -        pmd_clear(pmd);
>> -        kvm_tlb_flush_vmid_ipa(kvm, addr);
>> +            /*
>> +             * Mapping in huge pages should only happen through a
>> +             * fault.  If a page is merged into a transparent huge
>> +             * page, the individual subpages of that huge page
>> +             * should be unmapped through MMU notifiers before we
>> +             * get here.
>> +             *
>> +             * Merging of CompoundPages is not supported; they
>> +             * should become splitting first, unmapped, merged,
>> +             * and mapped back in on-demand.
>> +             */
>> +            WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
>> +
>> +            pmd_clear(pmd);
>> +            kvm_tlb_flush_vmid_ipa(kvm, addr);
>> +        }
>>       } else {
>>           get_page(virt_to_page(pmd));
>>       }
>> @@ -1122,8 +1136,13 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
>>           return 0;
>>
>>       if (stage2_pud_present(kvm, old_pud)) {
>> -        stage2_pud_clear(kvm, pudp);
>> -        kvm_tlb_flush_vmid_ipa(kvm, addr);
>> +        /* If we have PTE level mapping, unmap the entire range */
>> +        if (WARN_ON_ONCE(!stage2_pud_huge(kvm, old_pud))) {
>> +            unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE);
>> +        } else {
>> +            stage2_pud_clear(kvm, pudp);
>> +            kvm_tlb_flush_vmid_ipa(kvm, addr);
>> +        }
>>       } else {
>>           get_page(virt_to_page(pudp));
>>       }
>>
>
>
> .

--
Thanks,
Xiang
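
P.S. To make the suspected refcount imbalance concrete, here is a toy
user-space model in C. It is not kernel code. Every name in it (toy_page,
toy_get_page, toy_put_page, toy_adjust_path, toy_unmap_path, toy_double_put)
is made up for illustration, and the two drop paths only stand in for what I
suspect happens in transparent_hugepage_adjust() and unmap_stage2_range();
I have not confirmed that the real functions behave this way.

```c
#include <assert.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
	int refcount;
};

/* Take one reference. */
static void toy_get_page(struct toy_page *p)
{
	p->refcount++;
}

/* Drop one reference; returns 1 if this drop brought the count to zero. */
static int toy_put_page(struct toy_page *p)
{
	return --p->refcount == 0;
}

/* Hypothetical path 1: assumed to drop the reference once. */
static void toy_adjust_path(struct toy_page *p)
{
	toy_put_page(p);
}

/* Hypothetical path 2: assumed to drop the same reference again. */
static void toy_unmap_path(struct toy_page *p)
{
	toy_put_page(p);
}

/* One get followed by both drop paths; returns the final count. */
static int toy_double_put(void)
{
	struct toy_page page = { .refcount = 0 };

	toy_get_page(&page);     /* a single reference is taken  */
	toy_adjust_path(&page);  /* first drop, balanced         */
	toy_unmap_path(&page);   /* second drop, unbalanced      */

	return page.refcount;
}
```

In this model a single get followed by two puts leaves the count negative,
which is the kind of state that gets reported as "nonzero _refcount" when
the page is eventually freed. The model says nothing about which of the two
real paths, if either, holds the extra put.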