From: Will Deacon <will@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>,
	Marc Zyngier <maz@kernel.org>,
	linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu, linux-mm@kvack.org,
	Matthew Wilcox <willy@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Quentin Perret <qperret@google.com>,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	kernel-team@android.com
Subject: Re: [PATCH 1/5] KVM: arm64: Walk userspace page tables to compute the THP mapping size
Date: Wed, 21 Jul 2021 15:58:29 +0100
Message-ID: <20210721145828.GA11003@willie-the-truck>
In-Reply-To: <YPczKoLqlKElLxzb@google.com>

Hey Sean,

On Tue, Jul 20, 2021 at 08:33:46PM +0000, Sean Christopherson wrote:
> On Tue, Jul 20, 2021, Alexandru Elisei wrote:
> > I just can't figure out why having the mmap lock is not needed to walk the
> > userspace page tables. Any hints? Or am I not seeing where it's taken?
> 
> Disclaimer: I'm not super familiar with arm64's page tables, but the relevant KVM
> functionality is common across x86 and arm64.

No need for the disclaimer, there are so many moving parts here that I don't
think it's possible to be familiar with them all! Thanks for taking the time
to write it up so clearly.

> KVM arm64 (and x86) unconditionally registers a mmu_notifier for the mm_struct
> associated with the VM, and disallows calling ioctls from a different process,
> i.e. walking the page tables during KVM_RUN is guaranteed to use the mm for which
> KVM registered the mmu_notifier.  As part of registration, the mmu_notifier
> does mmgrab() and doesn't do mmdrop() until it's unregistered.  That ensures the
> mm_struct itself is live.
> 
> For the page tables liveliness, KVM implements mmu_notifier_ops.release, which is
> invoked at the beginning of exit_mmap(), before the page tables are freed.  In
> its implementation, KVM takes mmu_lock and zaps all its shadow page tables, a.k.a.
> the stage2 tables in KVM arm64.  The flow in question, get_user_mapping_size(),
> also runs under mmu_lock, and so effectively blocks exit_mmap() and thus is
> guaranteed to run with live userspace tables.

Unless I missed a case, exit_mmap() only runs when mm_struct::mm_users drops
to zero, right? The vCPU tasks should hold references to that afaict, so I
don't think it should be possible for exit_mmap() to run while there are
vCPUs running with the corresponding page-table.
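
To make the two reference counts concrete, here is a minimal userspace sketch of
the distinction. It is a toy model, not kernel code: the toy_* names and the
tables_live/struct_live flags are made up for illustration, and only the
mm_users/mm_count split mirrors struct mm_struct. The assumption it encodes is
that mmget()/mmput() track users of the address space (with exit_mmap() running
on the last mmput()), while mmgrab()/mmdrop() only pin the structure itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two counters in struct mm_struct (illustrative only). */
struct toy_mm {
	int mm_users;     /* users of the address space, e.g. tasks sharing it */
	int mm_count;     /* references to the struct; all mm_users hold one */
	bool tables_live; /* userspace page tables not yet torn down */
	bool struct_live; /* the struct itself has not been freed */
};

static void toy_mm_init(struct toy_mm *mm)
{
	mm->mm_users = 1; /* the first task using the address space */
	mm->mm_count = 1; /* the single reference held on behalf of mm_users */
	mm->tables_live = true;
	mm->struct_live = true;
}

/* Models mmgrab(): pins the struct, says nothing about the page tables. */
static void toy_mmgrab(struct toy_mm *mm)
{
	mm->mm_count++;
}

/* Models mmdrop(): the struct goes away when the last pin is dropped. */
static void toy_mmdrop(struct toy_mm *mm)
{
	assert(mm->mm_count > 0);
	if (--mm->mm_count == 0)
		mm->struct_live = false;
}

/* Models mmget(): a new user of the address space, e.g. another thread. */
static void toy_mmget(struct toy_mm *mm)
{
	mm->mm_users++;
}

/* Models mmput(): the last user tears down the tables (exit_mmap()). */
static void toy_mmput(struct toy_mm *mm)
{
	assert(mm->mm_users > 0);
	if (--mm->mm_users == 0) {
		mm->tables_live = false; /* stands in for exit_mmap() */
		toy_mmdrop(mm);          /* drop the ref held for mm_users */
	}
}
```

In this model a notifier-style toy_mmgrab() keeps struct_live true after the
last toy_mmput(), but tables_live only stays true while some user (such as a
vCPU thread) still holds an mm_users reference.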

> Looking at the arm64 code, one thing I'm not clear on is whether arm64 correctly
> handles the case where exit_mmap() wins the race.  The invalidate_range hooks will
> still be called, so userspace page tables aren't a problem, but
> kvm_arch_flush_shadow_all() -> kvm_free_stage2_pgd() nullifies mmu->pgt without
> any additional notifications that I see.  x86 deals with this by ensuring its
> top-level TDP entry (stage2 equivalent) is valid while the page fault handler is
> running.

But the fact that x86 handles this race has me worried. What am I missing?

I agree that, if the race can occur, we don't appear to handle it in the
arm64 backend.
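
For illustration only, one defensive pattern for this kind of race is to
re-check the stage-2 root under the same lock that the teardown path takes. The
sketch below is a single-threaded userspace model under that assumption; the
toy_* names are hypothetical, the integer "lock" merely stands in for mmu_lock,
and it is not a description of what either the x86 or the arm64 code does.

```c
#include <assert.h>
#include <stddef.h>

struct toy_pgt {
	int unused; /* placeholder for the stage-2 table state */
};

struct toy_mmu {
	int lock_held;       /* stands in for mmu_lock in this model */
	struct toy_pgt *pgt; /* stage-2 root; NULL once it has been torn down */
};

static void toy_lock(struct toy_mmu *mmu)
{
	assert(!mmu->lock_held);
	mmu->lock_held = 1;
}

static void toy_unlock(struct toy_mmu *mmu)
{
	assert(mmu->lock_held);
	mmu->lock_held = 0;
}

/* Models the release path: the root is cleared while holding the lock. */
static void toy_release(struct toy_mmu *mmu)
{
	toy_lock(mmu);
	mmu->pgt = NULL;
	toy_unlock(mmu);
}

/*
 * Models a fault handler that re-checks the root under the lock.
 * Returns 0 if a mapping could be installed, -1 if the root is gone.
 */
static int toy_handle_fault(struct toy_mmu *mmu)
{
	int ret = 0;

	toy_lock(mmu);
	if (!mmu->pgt)
		ret = -1; /* teardown won the race: bail out without walking */
	/* otherwise the real code would install the mapping here */
	toy_unlock(mmu);

	return ret;
}
```

Because both paths take the same lock, the handler either sees a valid root for
its whole critical section or sees NULL and backs off.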

Cheers,

Will


