From: Marc Zyngier <maz@kernel.org>
To: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
	kvmarm@lists.cs.columbia.edu, linux-mm@kvack.org,
	Sean Christopherson <seanjc@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Will Deacon <will@kernel.org>,
	Quentin Perret <qperret@google.com>,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	kernel-team@android.com
Subject: Re: [PATCH 1/5] KVM: arm64: Walk userspace page tables to compute the THP mapping size
Date: Fri, 23 Jul 2021 09:48:29 +0100
Message-ID: <871r7p2z6q.wl-maz@kernel.org>
In-Reply-To: <f09c297b-21dd-a6fa-6e72-49587ba80fe5@arm.com>

On Tue, 20 Jul 2021 18:23:02 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> I just can't figure out why having the mmap lock is not needed to walk the
> userspace page tables. Any hints? Or am I not seeing where it's taken?

I trust Sean's explanation was complete enough!

> On 7/17/21 10:55 AM, Marc Zyngier wrote:
> > We currently rely on the kvm_is_transparent_hugepage() helper to
> > discover whether a given page has the potential to be mapped as
> > a block mapping.
> >
> > However, this API doesn't really give us everything we want:
> > - we don't get the size: this is not crucial today as we only
> >   support PMD-sized THPs, but we'd like to have larger sizes
> >   in the future
> > - we're the only user left of the API, and there is a will
> >   to remove it altogether
> >
> > To address the above, implement a simple walker using the existing
> > page table infrastructure, and plumb it into transparent_hugepage_adjust().
> > No new page sizes are supported in the process.
> >
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/kvm/mmu.c | 46 ++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 42 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 3155c9e778f0..db6314b93e99 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -433,6 +433,44 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
> >  	return 0;
> >  }
> >  
> > +static struct kvm_pgtable_mm_ops kvm_user_mm_ops = {
> > +	/* We shouldn't need any other callback to walk the PT */
> > +	.phys_to_virt		= kvm_host_va,
> > +};
> > +
> > +struct user_walk_data {
> > +	u32	level;
> > +};
> > +
> > +static int user_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > +		       enum kvm_pgtable_walk_flags flag, void * const arg)
> > +{
> > +	struct user_walk_data *data = arg;
> > +
> > +	data->level = level;
> > +	return 0;
> > +}
> > +
> > +static int get_user_mapping_size(struct kvm *kvm, u64 addr)
> > +{
> > +	struct user_walk_data data;
> > +	struct kvm_pgtable pgt = {
> > +		.pgd		= (kvm_pte_t *)kvm->mm->pgd,
> > +		.ia_bits	= VA_BITS,
> > +		.start_level	= 4 - CONFIG_PGTABLE_LEVELS,
> > +		.mm_ops		= &kvm_user_mm_ops,
> > +	};
> > +	struct kvm_pgtable_walker walker = {
> > +		.cb		= user_walker,
> > +		.flags		= KVM_PGTABLE_WALK_LEAF,
> > +		.arg		= &data,
> > +	};
> > +
> > +	kvm_pgtable_walk(&pgt, ALIGN_DOWN(addr, PAGE_SIZE), PAGE_SIZE, &walker);
> 
> I take it that it is guaranteed that kvm_pgtable_walk() will never
> fail? For example, I can see it failing if someone messes with
> KVM_PGTABLE_MAX_LEVELS.

But that's an architectural constant. How could it be messed with?
When we introduce 5 levels of page tables, we'll have to check all
this anyway.

> To be honest, I would rather have a check here instead of
> potentially feeding a bogus value to ARM64_HW_PGTABLE_LEVEL_SHIFT.
> It could be a VM_WARN_ON, so there's no runtime overhead unless
> CONFIG_DEBUG_VM.

Fair enough. That's easy enough to check.
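
For illustration only, here is a minimal userspace sketch of the kind of
range check discussed above. It is not the actual follow-up patch. The
shift macro mirrors the arm64 ARM64_HW_PGTABLE_LEVEL_SHIFT() definition,
with PAGE_SHIFT passed in as a parameter; MAX_LEVELS stands in for
KVM_PGTABLE_MAX_LEVELS; and level_to_mapping_size() is a hypothetical
helper name. In the kernel, the out-of-range case would presumably be a
VM_WARN_ON() rather than a silent return.

```c
#include <stdint.h>

/* Mirrors the arm64 macro: each level resolves (page_shift - 3) bits. */
#define HW_PGTABLE_LEVEL_SHIFT(page_shift, n) \
	(((page_shift) - 3) * (4 - (n)) + 3)

/* Stand-in for KVM_PGTABLE_MAX_LEVELS (assumption: 4 levels). */
#define MAX_LEVELS	4U

/*
 * Hypothetical helper: translate the level recorded by the walker into a
 * mapping size in bytes. Returns 0 when the level is outside
 * [start_level, MAX_LEVELS), so a bogus level never reaches the shift.
 */
static uint64_t level_to_mapping_size(uint32_t level, uint32_t start_level,
				      unsigned int page_shift)
{
	if (level < start_level || level >= MAX_LEVELS)
		return 0;	/* kernel code could VM_WARN_ON() here */

	return UINT64_C(1) << HW_PGTABLE_LEVEL_SHIFT(page_shift, level);
}
```

With 4kB pages (page_shift = 12), level 3 yields a page-sized mapping and
level 2 a PMD-sized (2MB) block; any level at or beyond MAX_LEVELS, or
below the start level, is rejected before the shift is computed.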

> The patch looks good to me so far, but I want to give it another
> look (or two) after I figure out why the mmap semaphore is not
> needed.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


Thread overview:
2021-07-17  9:55 [PATCH 0/5] KVM: Remove kvm_is_transparent_hugepage() and friends Marc Zyngier
2021-07-17  9:55 ` [PATCH 1/5] KVM: arm64: Walk userspace page tables to compute the THP mapping size Marc Zyngier
2021-07-19  6:31   ` Paolo Bonzini
2021-07-19  9:31     ` Marc Zyngier
2021-07-20 17:23   ` Alexandru Elisei
2021-07-20 20:33     ` Sean Christopherson
2021-07-21 14:58       ` Will Deacon
2021-07-21 15:56         ` Sean Christopherson
2021-07-21 16:37       ` Alexandru Elisei
2021-07-23  8:48     ` Marc Zyngier [this message]
2021-07-17  9:55 ` [PATCH 2/5] KVM: arm64: Avoid mapping size adjustment on permission fault Marc Zyngier
2021-07-23 15:55   ` Alexandru Elisei
2021-07-23 16:18     ` Marc Zyngier
2021-07-17  9:55 ` [PATCH 3/5] KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap() Marc Zyngier
2021-07-19  6:31   ` Paolo Bonzini
2021-07-17  9:55 ` [PATCH 4/5] KVM: arm64: Use get_page() instead of kvm_get_pfn() Marc Zyngier
2021-07-17  9:55 ` [PATCH 5/5] KVM: Get rid " Marc Zyngier
2021-07-19  6:31   ` Paolo Bonzini
