From: Marc Zyngier <marc.zyngier@arm.com>
To: Zheng Xiang <zhengxiang9@huawei.com>, christoffer.dall@arm.com, catalin.marinas@arm.com, will.deacon@arm.com, suzuki.poulose@arm.com, james.morse@arm.com
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org, Wang Haibin <wanghaibin.wang@huawei.com>, "yuzenghui@huawei.com" <yuzenghui@huawei.com>, lious.lilei@hisilicon.com, lishuo1@hisilicon.com
Subject: Re: [RFC] Question about TLB flush while set Stage-2 huge pages
Date: Tue, 12 Mar 2019 11:32:53 +0000
Message-ID: <e2a94937-c324-e2d6-7e61-3f998e6e6e22@arm.com>
In-Reply-To: <5f712cc6-0874-adbe-add6-46f5de24f36f@huawei.com>

Hi Zheng,

On 11/03/2019 16:31, Zheng Xiang wrote:
> Hi all,
>
> While a page is merged into a transparent huge page, KVM will invalidate Stage-2 for
> the base address of the huge page and the whole of Stage-1.
> However, this only invalidates the first page within the huge page and the other
> pages are not invalidated, see below:
>
> +---------------+--------------+
> |abcde       2MB-Page          |
> +---------------+--------------+
>
> TLB before setting new pmd:
> +---------------+--------------+
> |      VA       |   PAGESIZE   |
> +---------------+--------------+
> |       a       |     4KB      |
> +---------------+--------------+
> |       b       |     4KB      |
> +---------------+--------------+
> |       c       |     4KB      |
> +---------------+--------------+
> |       d       |     4KB      |
> +---------------+--------------+
>
> TLB after setting new pmd:
> +---------------+--------------+
> |      VA       |   PAGESIZE   |
> +---------------+--------------+
> |       a       |     2MB      |
> +---------------+--------------+
> |       b       |     4KB      |
> +---------------+--------------+
> |       c       |     4KB      |
> +---------------+--------------+
> |       d       |     4KB      |
> +---------------+--------------+
>
> When VM access *b* address, it will hit the TLB and result in TLB conflict aborts or other potential exceptions.

That's really bad.
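To make the scenario in the quoted diagram concrete, here is a toy model of it. This is only a sketch under stated assumptions: `ToyTLB`, `stale_tlb` and `merge_into_block` are hypothetical names invented for illustration, the TLB is a plain dictionary, and nothing here reflects how KVM or real hardware is implemented. It assumes a 4KB granule, so a 2MB block covers 512 pages.

```python
# Toy model only: a dict standing in for a TLB. All names are hypothetical.
PAGE_4K = 4 * 1024
BLOCK_2M = 2 * 1024 * 1024
PTES_PER_BLOCK = BLOCK_2M // PAGE_4K  # 512 pages per block with a 4KB granule


class ToyTLB:
    """Maps each entry's base address to the size of the region it covers."""

    def __init__(self):
        self.entries = {}

    def fill(self, base, size):
        """Cache a translation covering [base, base + size)."""
        self.entries[base] = size

    def invalidate_page(self, addr):
        """Drop every cached entry that covers addr."""
        self.entries = {
            b: s for b, s in self.entries.items() if not (b <= addr < b + s)
        }

    def invalidate_range(self, start, end, step=PAGE_4K):
        """Invalidate page by page across [start, end)."""
        for addr in range(start, end, step):
            self.invalidate_page(addr)

    def lookup(self, addr):
        """Return all entries covering addr; more than one models a conflict."""
        return sorted((b, s) for b, s in self.entries.items() if b <= addr < b + s)


def stale_tlb(base):
    """TLB holding 4KB entries for the first four pages (a, b, c, d)."""
    tlb = ToyTLB()
    for i in range(4):
        tlb.fill(base + i * PAGE_4K, PAGE_4K)
    return tlb


def merge_into_block(tlb, base, invalidate_whole_range):
    """Replace the 4KB mappings with one 2MB block mapping."""
    if invalidate_whole_range:
        tlb.invalidate_range(base, base + BLOCK_2M)
    else:
        tlb.invalidate_page(base)  # only the base address, as in the report
    tlb.fill(base, BLOCK_2M)  # a later access caches the new block entry
```

In this model, merging with `invalidate_whole_range=False` leaves the 4KB entries for b, c and d in place, so `lookup()` on b returns both the stale 4KB entry and the new 2MB entry. Invalidating every page of the range before installing the block leaves a single entry per address.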
I can only imagine two scenarios:

1) We fail to unmap a,b,c,d (and potentially another 508 PTEs), losing
the PTE table in the process, and place the PMD instead. I can't see
this happening.

2) We fail to invalidate on unmap, and that's slightly less bad (but
still quite bad).

Which of the two cases are you seeing?

> For example, we need to keep tracking of the VM memory dirty pages when VM is in live migration.
> KVM will set the memslot READONLY and split the huge pages.
> After live migration is canceled and aborted, the pages will be merged into THP.
> The later access to these pages which are READONLY will cause level-3 Permission Fault until they are invalidated.
>
> So should we invalidate the tlb entries for all relative pages (e.g. a,b,c,d), like __flush_tlb_range()?
> Or we can call __kvm_tlb_flush_vmid() to invalidate all tlb entries.

We should perform an invalidate on each unmap. unmap_stage2_range seems
to do the right thing. __flush_tlb_range only caters for Stage1
mappings, and __kvm_tlb_flush_vmid() is too big a hammer, as it nukes
TLBs for the whole VM.

I'd really like to understand what you're seeing, and how to reproduce
it. Do you have a minimal example I could run on my own HW?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...