All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Matlack <dmatlack@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Marc Zyngier <maz@kernel.org>,
	Huacai Chen <chenhuacai@kernel.org>,
	Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
	Anup Patel <anup@brainfault.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Sean Christopherson <seanjc@google.com>,
	Andrew Jones <drjones@redhat.com>,
	Ben Gardon <bgardon@google.com>,
	"Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>,
	"moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)" 
	<kvmarm@lists.cs.columbia.edu>,
	"open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)" 
	<linux-mips@vger.kernel.org>,
	"open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)" 
	<kvm@vger.kernel.org>,
	"open list:KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)" 
	<kvm-riscv@lists.infradead.org>,
	Peter Feiner <pfeiner@google.com>
Subject: Re: [PATCH v2 16/26] KVM: x86/mmu: Cache the access bits of shadowed translations
Date: Thu, 31 Mar 2022 14:40:35 -0700	[thread overview]
Message-ID: <CALzav=cxm=A31PJOMes3eWpCV8s0zQGgaGZhYiQFJyxY2dNDXg@mail.gmail.com> (raw)
In-Reply-To: <YkShwFaRqlQpyL87@xz-m1.local>

On Wed, Mar 30, 2022 at 11:30 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Mar 22, 2022 at 03:51:54PM -0700, David Matlack wrote:
> > On Wed, Mar 16, 2022 at 1:32 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Fri, Mar 11, 2022 at 12:25:18AM +0000, David Matlack wrote:
> > > > In order to split a huge page we need to know what access bits to assign
> > > > to the role of the new child page table. This can't be easily derived
> > > > from the huge page SPTE itself since KVM applies its own access policies
> > > > on top, such as for HugePage NX.
> > > >
> > > > We could walk the guest page tables to determine the correct access
> > > > bits, but that is difficult to plumb outside of a vCPU fault context.
> > > > Instead, we can store the original access bits for each leaf SPTE
> > > > alongside the GFN in the gfns array. The access bits only take up 3
> > > > bits, which leaves 61 bits left over for gfns, which is more than
> > > > enough. So this change does not require any additional memory.
> > >
> > > I have a pure question on why eager page split needs to worry on hugepage
> > > nx..
> > >
> > > IIUC that was about forbidden huge page being mapped as executable.  So
> > > afaiu the only missing bit that could happen if we copy over the huge page
> > > ptes is the executable bit.
> > >
> > > But then?  I think we could get a page fault on fault->exec==true on the
> > > split small page (because when we copy over it's cleared, even though the
> > > page can actually be executable), but it should be well resolved right
> > > after that small page fault.
> > >
> > > The thing is IIUC this is a very rare case, IOW, it should mostly not
> > > happen in 99% of the use case?  And there's a slight penalty when it
> > > happens, but only perf-wise.
> > >
> > > As I'm not really fluent with the code base, perhaps I missed something?
> >
> > You're right that we could get away with not knowing the shadowed
> > access permissions to assign to the child SPTEs when splitting a huge
> > SPTE. We could just copy the huge SPTE access permissions and then let
> > the execute bit be repaired on fault (although those faults would be a
> > performance cost).
> >
> > But the access permissions are also needed to lookup an existing
> > shadow page (or create a new shadow page) to use to split the huge
> > page. For example, let's say we are going to split a huge page that
> > does not have execute permissions enabled. That could be because NX
> > HugePages are enabled or because we are shadowing a guest translation
> > that does not allow execution (or both). We wouldn't want to propagate
> > the no-execute permission into the child SP role.access if the
> > shadowed translation really does allow execution (and vice versa).
>
> Then the follow up (pure) question is what will happen if we simply
> propagate the no-exec permission into the child SP?
>
> I think that only happens with direct sptes where guest used huge pages
> because that's where the shadow page will be huge, so IIUC that's checked
> here when the small page fault triggers:
>
> static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>                                    unsigned direct_access)
> {
>         if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
>                 struct kvm_mmu_page *child;
>
>                 /*
>                  * For the direct sp, if the guest pte's dirty bit
>                  * changed form clean to dirty, it will corrupt the
>                  * sp's access: allow writable in the read-only sp,
>                  * so we should update the spte at this point to get
>                  * a new sp with the correct access.
>                  */
>                 child = to_shadow_page(*sptep & PT64_BASE_ADDR_MASK);
>                 if (child->role.access == direct_access)
>                         return;
>
>                 drop_parent_pte(child, sptep);
>                 kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
>         }
> }
>
> Due to missing EXEC the role.access check will not match with direct
> access, which is the guest pgtable value which has EXEC set.  Then IIUC
> we'll simply drop the no-exec SP and replace it with a new one with exec
> perm.  The question is, would that untimately work too?
>
> Even if that works, I agree this sounds tricky because we're potentially
> caching fake sp.role conditionally and it seems we never do that before.
> It's just that the other option that you proposed here seems to add other
> way of complexity on caching spte permission information while kvm doesn't
> do either before.  IMHO we need to see which is the best trade off.

Clever! I think you're right that it would work correctly.

This approach avoids the need for caching access bits, but comes with downsides:
 - Performance impact from the extra faults needed to drop the SP and
repair the execute permission bit.
 - Some amount of memory overhead from KVM allocating new SPs when it
could re-use existing SPs.

Given the relative simplicity of access caching (and the fact that it
requires no additional memory), I'm inclined to stick with it rather
than taking the access permissions from the huge page.

>
> I could have missed something else, though.
>
> Thanks,
>
> --
> Peter Xu
>

WARNING: multiple messages have this Message-ID (diff)
From: David Matlack <dmatlack@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>, Albert Ou <aou@eecs.berkeley.edu>,
	"open list:KERNEL VIRTUAL MACHINE FOR MIPS \(KVM/mips\)"
	<kvm@vger.kernel.org>, Huacai Chen <chenhuacai@kernel.org>,
	"open list:KERNEL VIRTUAL MACHINE FOR MIPS \(KVM/mips\)"
	<linux-mips@vger.kernel.org>,
	Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	"open list:KERNEL VIRTUAL MACHINE FOR RISC-V \(KVM/riscv\)"
	<kvm-riscv@lists.infradead.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Ben Gardon <bgardon@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>,
	"moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 \(KVM/arm64\)"
	<kvmarm@lists.cs.columbia.edu>, Peter Feiner <pfeiner@google.com>
Subject: Re: [PATCH v2 16/26] KVM: x86/mmu: Cache the access bits of shadowed translations
Date: Thu, 31 Mar 2022 14:40:35 -0700	[thread overview]
Message-ID: <CALzav=cxm=A31PJOMes3eWpCV8s0zQGgaGZhYiQFJyxY2dNDXg@mail.gmail.com> (raw)
In-Reply-To: <YkShwFaRqlQpyL87@xz-m1.local>

On Wed, Mar 30, 2022 at 11:30 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Mar 22, 2022 at 03:51:54PM -0700, David Matlack wrote:
> > On Wed, Mar 16, 2022 at 1:32 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Fri, Mar 11, 2022 at 12:25:18AM +0000, David Matlack wrote:
> > > > In order to split a huge page we need to know what access bits to assign
> > > > to the role of the new child page table. This can't be easily derived
> > > > from the huge page SPTE itself since KVM applies its own access policies
> > > > on top, such as for HugePage NX.
> > > >
> > > > We could walk the guest page tables to determine the correct access
> > > > bits, but that is difficult to plumb outside of a vCPU fault context.
> > > > Instead, we can store the original access bits for each leaf SPTE
> > > > alongside the GFN in the gfns array. The access bits only take up 3
> > > > bits, which leaves 61 bits left over for gfns, which is more than
> > > > enough. So this change does not require any additional memory.
> > >
> > > I have a pure question on why eager page split needs to worry on hugepage
> > > nx..
> > >
> > > IIUC that was about forbidden huge page being mapped as executable.  So
> > > afaiu the only missing bit that could happen if we copy over the huge page
> > > ptes is the executable bit.
> > >
> > > But then?  I think we could get a page fault on fault->exec==true on the
> > > split small page (because when we copy over it's cleared, even though the
> > > page can actually be executable), but it should be well resolved right
> > > after that small page fault.
> > >
> > > The thing is IIUC this is a very rare case, IOW, it should mostly not
> > > happen in 99% of the use case?  And there's a slight penalty when it
> > > happens, but only perf-wise.
> > >
> > > As I'm not really fluent with the code base, perhaps I missed something?
> >
> > You're right that we could get away with not knowing the shadowed
> > access permissions to assign to the child SPTEs when splitting a huge
> > SPTE. We could just copy the huge SPTE access permissions and then let
> > the execute bit be repaired on fault (although those faults would be a
> > performance cost).
> >
> > But the access permissions are also needed to lookup an existing
> > shadow page (or create a new shadow page) to use to split the huge
> > page. For example, let's say we are going to split a huge page that
> > does not have execute permissions enabled. That could be because NX
> > HugePages are enabled or because we are shadowing a guest translation
> > that does not allow execution (or both). We wouldn't want to propagate
> > the no-execute permission into the child SP role.access if the
> > shadowed translation really does allow execution (and vice versa).
>
> Then the follow up (pure) question is what will happen if we simply
> propagate the no-exec permission into the child SP?
>
> I think that only happens with direct sptes where guest used huge pages
> because that's where the shadow page will be huge, so IIUC that's checked
> here when the small page fault triggers:
>
> static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>                                    unsigned direct_access)
> {
>         if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
>                 struct kvm_mmu_page *child;
>
>                 /*
>                  * For the direct sp, if the guest pte's dirty bit
>                  * changed form clean to dirty, it will corrupt the
>                  * sp's access: allow writable in the read-only sp,
>                  * so we should update the spte at this point to get
>                  * a new sp with the correct access.
>                  */
>                 child = to_shadow_page(*sptep & PT64_BASE_ADDR_MASK);
>                 if (child->role.access == direct_access)
>                         return;
>
>                 drop_parent_pte(child, sptep);
>                 kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
>         }
> }
>
> Due to missing EXEC the role.access check will not match with direct
> access, which is the guest pgtable value which has EXEC set.  Then IIUC
> we'll simply drop the no-exec SP and replace it with a new one with exec
> perm.  The question is, would that untimately work too?
>
> Even if that works, I agree this sounds tricky because we're potentially
> caching fake sp.role conditionally and it seems we never do that before.
> It's just that the other option that you proposed here seems to add other
> way of complexity on caching spte permission information while kvm doesn't
> do either before.  IMHO we need to see which is the best trade off.

Clever! I think you're right that it would work correctly.

This approach avoids the need for caching access bits, but comes with downsides:
 - Performance impact from the extra faults needed to drop the SP and
repair the execute permission bit.
 - Some amount of memory overhead from KVM allocating new SPs when it
could re-use existing SPs.

Given the relative simplicity of access caching (and the fact that it
requires no additional memory), I'm inclined to stick with it rather
than taking the access permissions from the huge page.

>
> I could have missed something else, though.
>
> Thanks,
>
> --
> Peter Xu
>
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

  reply	other threads:[~2022-03-31 21:41 UTC|newest]

Thread overview: 134+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-11  0:25 [PATCH v2 00/26] Extend Eager Page Splitting to the shadow MMU David Matlack
2022-03-11  0:25 ` David Matlack
2022-03-11  0:25 ` [PATCH v2 01/26] KVM: x86/mmu: Optimize MMU page cache lookup for all direct SPs David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15  7:40   ` Peter Xu
2022-03-15  7:40     ` Peter Xu
2022-03-22 18:16     ` David Matlack
2022-03-22 18:16       ` David Matlack
2022-03-11  0:25 ` [PATCH v2 02/26] KVM: x86/mmu: Use a bool for direct David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15  7:46   ` Peter Xu
2022-03-15  7:46     ` Peter Xu
2022-03-22 18:21     ` David Matlack
2022-03-22 18:21       ` David Matlack
2022-03-11  0:25 ` [PATCH v2 03/26] KVM: x86/mmu: Derive shadow MMU page role from parent David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15  8:15   ` Peter Xu
2022-03-15  8:15     ` Peter Xu
2022-03-22 18:30     ` David Matlack
2022-03-22 18:30       ` David Matlack
2022-03-30 14:25       ` Peter Xu
2022-03-30 14:25         ` Peter Xu
2022-03-11  0:25 ` [PATCH v2 04/26] KVM: x86/mmu: Decompose kvm_mmu_get_page() into separate functions David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15  8:50   ` Peter Xu
2022-03-15  8:50     ` Peter Xu
2022-03-22 22:09     ` David Matlack
2022-03-22 22:09       ` David Matlack
2022-03-11  0:25 ` [PATCH v2 05/26] KVM: x86/mmu: Rename shadow MMU functions that deal with shadow pages David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15  8:52   ` Peter Xu
2022-03-15  8:52     ` Peter Xu
2022-03-22 21:35     ` David Matlack
2022-03-22 21:35       ` David Matlack
2022-03-30 14:28       ` Peter Xu
2022-03-30 14:28         ` Peter Xu
2022-03-11  0:25 ` [PATCH v2 06/26] KVM: x86/mmu: Pass memslot to kvm_mmu_new_shadow_page() David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15  9:03   ` Peter Xu
2022-03-15  9:03     ` Peter Xu
2022-03-22 22:05     ` David Matlack
2022-03-22 22:05       ` David Matlack
2022-03-11  0:25 ` [PATCH v2 07/26] KVM: x86/mmu: Separate shadow MMU sp allocation from initialization David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15  9:54   ` Peter Xu
2022-03-15  9:54     ` Peter Xu
2022-03-11  0:25 ` [PATCH v2 08/26] KVM: x86/mmu: Link spt to sp during allocation David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15 10:04   ` Peter Xu
2022-03-15 10:04     ` Peter Xu
2022-03-22 22:30     ` David Matlack
2022-03-22 22:30       ` David Matlack
2022-03-11  0:25 ` [PATCH v2 09/26] KVM: x86/mmu: Move huge page split sp allocation code to mmu.c David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15 10:17   ` Peter Xu
2022-03-15 10:17     ` Peter Xu
2022-03-11  0:25 ` [PATCH v2 10/26] KVM: x86/mmu: Use common code to free kvm_mmu_page structs David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15 10:22   ` Peter Xu
2022-03-15 10:22     ` Peter Xu
2022-03-22 22:33     ` David Matlack
2022-03-22 22:33       ` David Matlack
2022-03-11  0:25 ` [PATCH v2 11/26] KVM: x86/mmu: Use common code to allocate kvm_mmu_page structs from vCPU caches David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15 10:27   ` Peter Xu
2022-03-15 10:27     ` Peter Xu
2022-03-22 22:35     ` David Matlack
2022-03-22 22:35       ` David Matlack
2022-03-11  0:25 ` [PATCH v2 12/26] KVM: x86/mmu: Pass const memslot to rmap_add() David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-11  0:25 ` [PATCH v2 13/26] KVM: x86/mmu: Pass const memslot to init_shadow_page() and descendants David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-11  0:25 ` [PATCH v2 14/26] KVM: x86/mmu: Decouple rmap_add() and link_shadow_page() from kvm_vcpu David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15 10:37   ` Peter Xu
2022-03-15 10:37     ` Peter Xu
2022-03-11  0:25 ` [PATCH v2 15/26] KVM: x86/mmu: Update page stats in __rmap_add() David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-15 10:39   ` Peter Xu
2022-03-15 10:39     ` Peter Xu
2022-03-11  0:25 ` [PATCH v2 16/26] KVM: x86/mmu: Cache the access bits of shadowed translations David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-16  8:32   ` Peter Xu
2022-03-16  8:32     ` Peter Xu
2022-03-22 22:51     ` David Matlack
2022-03-22 22:51       ` David Matlack
2022-03-30 18:30       ` Peter Xu
2022-03-30 18:30         ` Peter Xu
2022-03-31 21:40         ` David Matlack [this message]
2022-03-31 21:40           ` David Matlack
2022-03-11  0:25 ` [PATCH v2 17/26] KVM: x86/mmu: Pass access information to make_huge_page_split_spte() David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-16  8:44   ` Peter Xu
2022-03-16  8:44     ` Peter Xu
2022-03-22 23:08     ` David Matlack
2022-03-22 23:08       ` David Matlack
2022-03-11  0:25 ` [PATCH v2 18/26] KVM: x86/mmu: Zap collapsible SPTEs at all levels in the shadow MMU David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-16  8:49   ` Peter Xu
2022-03-16  8:49     ` Peter Xu
2022-03-22 23:11     ` David Matlack
2022-03-22 23:11       ` David Matlack
2022-03-11  0:25 ` [PATCH v2 19/26] KVM: x86/mmu: Refactor drop_large_spte() David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-16  8:53   ` Peter Xu
2022-03-16  8:53     ` Peter Xu
2022-03-11  0:25 ` [PATCH v2 20/26] KVM: x86/mmu: Extend Eager Page Splitting to the shadow MMU David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-16 10:26   ` Peter Xu
2022-03-16 10:26     ` Peter Xu
2022-03-22  0:07     ` David Matlack
2022-03-22  0:07       ` David Matlack
2022-03-22 23:58     ` David Matlack
2022-03-22 23:58       ` David Matlack
2022-03-30 18:34       ` Peter Xu
2022-03-30 18:34         ` Peter Xu
2022-03-31 19:57         ` David Matlack
2022-03-31 19:57           ` David Matlack
2022-03-11  0:25 ` [PATCH v2 21/26] KVM: Allow for different capacities in kvm_mmu_memory_cache structs David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-19  5:27   ` Anup Patel
2022-03-19  5:27     ` Anup Patel
2022-03-22 23:13     ` David Matlack
2022-03-22 23:13       ` David Matlack
2022-03-11  0:25 ` [PATCH v2 22/26] KVM: Allow GFP flags to be passed when topping up MMU caches David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-11  0:25 ` [PATCH v2 23/26] KVM: x86/mmu: Fully split huge pages that require extra pte_list_desc structs David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-11  0:25 ` [PATCH v2 24/26] KVM: x86/mmu: Split huge pages aliased by multiple SPTEs David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-11  0:25 ` [PATCH v2 25/26] KVM: x86/mmu: Drop NULL pte_list_desc_cache fallback David Matlack
2022-03-11  0:25   ` David Matlack
2022-03-11  0:25 ` [PATCH v2 26/26] KVM: selftests: Map x86_64 guest virtual memory with huge pages David Matlack
2022-03-11  0:25   ` David Matlack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALzav=cxm=A31PJOMes3eWpCV8s0zQGgaGZhYiQFJyxY2dNDXg@mail.gmail.com' \
    --to=dmatlack@google.com \
    --cc=aleksandar.qemu.devel@gmail.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=bgardon@google.com \
    --cc=chenhuacai@kernel.org \
    --cc=drjones@redhat.com \
    --cc=kvm-riscv@lists.infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-mips@vger.kernel.org \
    --cc=maciej.szmigiero@oracle.com \
    --cc=maz@kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pfeiner@google.com \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.