From: David Matlack <dmatlack@google.com> To: Peter Xu <peterx@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com>, Marc Zyngier <maz@kernel.org>, Huacai Chen <chenhuacai@kernel.org>, Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>, Anup Patel <anup@brainfault.org>, Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt <palmer@dabbelt.com>, Albert Ou <aou@eecs.berkeley.edu>, Sean Christopherson <seanjc@google.com>, Andrew Jones <drjones@redhat.com>, Ben Gardon <bgardon@google.com>, "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>, "moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)" <kvmarm@lists.cs.columbia.edu>, "open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)" <linux-mips@vger.kernel.org>, "open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)" <kvm@vger.kernel.org>, "open list:KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)" <kvm-riscv@lists.infradead.org>, Peter Feiner <pfeiner@google.com> Subject: Re: [PATCH v2 16/26] KVM: x86/mmu: Cache the access bits of shadowed translations Date: Thu, 31 Mar 2022 14:40:35 -0700 [thread overview] Message-ID: <CALzav=cxm=A31PJOMes3eWpCV8s0zQGgaGZhYiQFJyxY2dNDXg@mail.gmail.com> (raw) In-Reply-To: <YkShwFaRqlQpyL87@xz-m1.local> On Wed, Mar 30, 2022 at 11:30 AM Peter Xu <peterx@redhat.com> wrote: > > On Tue, Mar 22, 2022 at 03:51:54PM -0700, David Matlack wrote: > > On Wed, Mar 16, 2022 at 1:32 AM Peter Xu <peterx@redhat.com> wrote: > > > > > > On Fri, Mar 11, 2022 at 12:25:18AM +0000, David Matlack wrote: > > > > In order to split a huge page we need to know what access bits to assign > > > > to the role of the new child page table. This can't be easily derived > > > > from the huge page SPTE itself since KVM applies its own access policies > > > > on top, such as for HugePage NX. > > > > > > > > We could walk the guest page tables to determine the correct access > > > > bits, but that is difficult to plumb outside of a vCPU fault context. 
> > > > Instead, we can store the original access bits for each leaf SPTE > > > > alongside the GFN in the gfns array. The access bits only take up 3 > > > > bits, which leaves 61 bits left over for gfns, which is more than > > > > enough. So this change does not require any additional memory. > > > > > > I have a pure question on why eager page split needs to worry on hugepage > > > nx.. > > > > > > IIUC that was about forbidden huge page being mapped as executable. So > > > afaiu the only missing bit that could happen if we copy over the huge page > > > ptes is the executable bit. > > > > > > But then? I think we could get a page fault on fault->exec==true on the > > > split small page (because when we copy over it's cleared, even though the > > > page can actually be executable), but it should be well resolved right > > > after that small page fault. > > > > > > The thing is IIUC this is a very rare case, IOW, it should mostly not > > > happen in 99% of the use case? And there's a slight penalty when it > > > happens, but only perf-wise. > > > > > > As I'm not really fluent with the code base, perhaps I missed something? > > > > You're right that we could get away with not knowing the shadowed > > access permissions to assign to the child SPTEs when splitting a huge > > SPTE. We could just copy the huge SPTE access permissions and then let > > the execute bit be repaired on fault (although those faults would be a > > performance cost). > > > > But the access permissions are also needed to lookup an existing > > shadow page (or create a new shadow page) to use to split the huge > > page. For example, let's say we are going to split a huge page that > > does not have execute permissions enabled. That could be because NX > > HugePages are enabled or because we are shadowing a guest translation > > that does not allow execution (or both). 
We wouldn't want to propagate > > the no-execute permission into the child SP role.access if the > > shadowed translation really does allow execution (and vice versa). > > Then the follow up (pure) question is what will happen if we simply > propagate the no-exec permission into the child SP? > > I think that only happens with direct sptes where guest used huge pages > because that's where the shadow page will be huge, so IIUC that's checked > here when the small page fault triggers: > > static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep, > unsigned direct_access) > { > if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) { > struct kvm_mmu_page *child; > > /* > * For the direct sp, if the guest pte's dirty bit > * changed form clean to dirty, it will corrupt the > * sp's access: allow writable in the read-only sp, > * so we should update the spte at this point to get > * a new sp with the correct access. > */ > child = to_shadow_page(*sptep & PT64_BASE_ADDR_MASK); > if (child->role.access == direct_access) > return; > > drop_parent_pte(child, sptep); > kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1); > } > } > > Due to missing EXEC the role.access check will not match with direct > access, which is the guest pgtable value which has EXEC set. Then IIUC > we'll simply drop the no-exec SP and replace it with a new one with exec > perm. The question is, would that ultimately work too? > > Even if that works, I agree this sounds tricky because we're potentially > caching fake sp.role conditionally and it seems we never do that before. > It's just that the other option that you proposed here seems to add other > way of complexity on caching spte permission information while kvm doesn't > do either before. IMHO we need to see which is the best trade off. Clever! I think you're right that it would work correctly. 
This approach avoids the need for caching access bits, but comes with downsides: - Performance impact from the extra faults needed to drop the SP and repair the execute permission bit. - Some amount of memory overhead from KVM allocating new SPs when it could re-use existing SPs. Given the relative simplicity of access caching (and the fact that it requires no additional memory), I'm inclined to stick with it rather than taking the access permissions from the huge page. > > I could have missed something else, though. > > Thanks, > > -- > Peter Xu >