All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoffer Dall <christoffer.dall@arm.com>
To: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org, cdall@kernel.org,
	kvm@vger.kernel.org, marc.zyngier@arm.com, will.deacon@arm.com,
	linux-kernel@vger.kernel.org, dave.martin@arm.com,
	pbonzini@redhat.com, kvmarm@lists.cs.columbia.edu,
	Punit Agrawal <punit.agrawal@arm.com>
Subject: Re: [PATCH v6 18/18] kvm: arm64: Allow tuning the physical address size for VM
Date: Thu, 1 Nov 2018 09:36:22 +0100	[thread overview]
Message-ID: <20181101083622.GH12057@e113682-lin.lund.arm.com> (raw)
In-Reply-To: <ebfbc17a-5954-7a67-0ae4-b911ba5f8b9e@arm.com>

On Wed, Oct 31, 2018 at 05:55:13PM +0000, Suzuki K Poulose wrote:
> Hi Christoffer,
> 
> On 31/10/18 14:22, Christoffer Dall wrote:
> >On Wed, Sep 26, 2018 at 05:32:54PM +0100, Suzuki K Poulose wrote:
> >>Allow specifying the physical address size limit for a new
> >>VM via the kvm_type argument for the KVM_CREATE_VM ioctl. This
> >>allows us to finalise the stage2 page table as early as possible
> >>and hence perform the right checks on the memory slots
> >>without complication. The size is encoded as Log2(PA_Size) in
> >>bits[7:0] of the type field. For backward compatibility the
> >>value 0 is reserved and implies 40bits. Also, lift the limit
> >>of the IPA to host limit and allow lower IPA sizes (e.g, 32).
> >>
> >>The userspace could check the extension KVM_CAP_ARM_VM_IPA_SIZE
> >>for the availability of this feature. The cap check returns the
> >>maximum limit for the physical address shift supported by the host.
> >>
> >>Cc: Marc Zyngier <marc.zyngier@arm.com>
> >>Cc: Christoffer Dall <cdall@kernel.org>
> >>Cc: Peter Maydell <peter.maydell@linaro.org>
> >>Cc: Paolo Bonzini <pbonzini@redhat.com>
> >>Cc: Radim Krčmář <rkrcmar@redhat.com>
> >>Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> >>---
> 
> >>@@ -192,17 +195,23 @@ int kvm_arm_config_vm(struct kvm *kvm, unsigned long type)
> >>  	u32 parange, phys_shift;
> >>  	u8 lvls;
> >>-	if (type)
> >>+	if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
> >>  		return -EINVAL;
> >>+	phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
> >>+	if (phys_shift) {
> >>+		if (phys_shift > kvm_ipa_limit ||
> >>+		    phys_shift < 32)
> >>+			return -EINVAL;
> >
> >I am concerned here that if we allow the user to set the phys_size to 32
> >bits, then we end up with 2 levels of stage2 page tables, which means
> >that the size of a stage2 pmd mapping becomes the size of a stage2 pgd
> >mapping, yet we can still decide in user_mem_abort() that a stage2 fault
> >is backed by PMD size mappings on the host, and attempt a huge mapping
> >at stage2, which then becomes a PGD level block map, I think.
> 
> Yes, you're right. We will have a pgd-level block map in that case.
> This should work transparently as PMD at stage2 is folded into PGD and
> we endup marking the PGD entry as huge and the stage2 accessors deal
> with it appropriately. This is similar to having a PMD huge page with
> 64K + 42bit VA (2 level page table) on stage1.
> 
> >
> >Is this handled somehow?  If so, how?
> 
> We don't prevent this. We have a guaranteed minimum number of levels
> at 2, which implies you can map a stage1 PMD huge page at stage2.
> I acknowledge that the Linux naming convention does cause some confusion
> for a "level" at stage1 and stage2 levels. But if you think of it
> from the hardware level (and like the ARM ARM defines it , Level 0-3),
> it is much simpler. i.e, you can map a huge page at level N in stage1
> into stage2 if you have that level N. It doesn't matter if stage2 has
> more or less number of levels than stage1, as long as stage2 table can
> deal with it.
> 

That is indeed a good way to reason about it.

> >
> >I can't see user_mem_abort() being modified to explicitly handle this
> >in your code, but perhaps the stage2_set_pmd_huge() call ends up
> >actually mapping at the stage2 pte level, but I can't tell that it does.
> 
> The stage2_set_pmd_huge() installs it at the PGD (level 2, which would
> have been PMD if we had levels > 2) slot.
> 
> pmd = stage2_get_pmd(addr)
>        \-> pud = stage2_get_pud(addr)
>              \-> pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>              \-> (we have stage2_pgd_none(x) = 0 and
>              \-> stage2_pud_offset(pgd, addr) = pgd
>              \->returns (kvm->arch.pgd + stage2_pgd_index(addr);
>        \->  stage_pud_none(x) = 0 & stage2_pmd_offset(pud, addr) = pud
>        \->  returns pud (kvm->arch.pgd + stage2_pgd_index(addr))
> 
> and we install the PMD huge mapping at the location.
> 
> >In any case, I think user_mem_abort() should give up on pmd/pud huge
> >mappings if the size mapped by the stage2/stage1 pmd/pud levels don't
> >line up.  What do you think?
> 
> Does it matter ? Personally I don't think it matters much as long as we
> are able to map the "huge" page at stage1 in the stage2 as huge, even if
> the stage2 has lesser levels and manage it well. Given that PMD huge
> pages are quite common, it would be good to exploit it when we can.

What I couldn't convince myself of was whether having 2 levels at stage2
implied the entry level block mapping being of the same size as the
stage1 block mapping, but given your explanation above, I think that's
fine.

> 
> On the other hand, for stage2 PUD we are checking if the stage2 has a
> PUD level (kvm_has_stage2_pud()). May be we should relax it just like
> we do for PMD to check (kvm_stage2_levels > 2).
> 

Depends on how the code ends up looking like I suppose, but the more
symmetry we can have between the approach for host PMD and host PUD and
host PTE mappings, the better.


Thanks,

    Christoffer

WARNING: multiple messages have this Message-ID (diff)
From: christoffer.dall@arm.com (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v6 18/18] kvm: arm64: Allow tuning the physical address size for VM
Date: Thu, 1 Nov 2018 09:36:22 +0100	[thread overview]
Message-ID: <20181101083622.GH12057@e113682-lin.lund.arm.com> (raw)
In-Reply-To: <ebfbc17a-5954-7a67-0ae4-b911ba5f8b9e@arm.com>

On Wed, Oct 31, 2018 at 05:55:13PM +0000, Suzuki K Poulose wrote:
> Hi Christoffer,
> 
> On 31/10/18 14:22, Christoffer Dall wrote:
> >On Wed, Sep 26, 2018 at 05:32:54PM +0100, Suzuki K Poulose wrote:
> >>Allow specifying the physical address size limit for a new
> >>VM via the kvm_type argument for the KVM_CREATE_VM ioctl. This
> >>allows us to finalise the stage2 page table as early as possible
> >>and hence perform the right checks on the memory slots
> >>without complication. The size is encoded as Log2(PA_Size) in
> >>bits[7:0] of the type field. For backward compatibility the
> >>value 0 is reserved and implies 40bits. Also, lift the limit
> >>of the IPA to host limit and allow lower IPA sizes (e.g, 32).
> >>
> >>The userspace could check the extension KVM_CAP_ARM_VM_IPA_SIZE
> >>for the availability of this feature. The cap check returns the
> >>maximum limit for the physical address shift supported by the host.
> >>
> >>Cc: Marc Zyngier <marc.zyngier@arm.com>
> >>Cc: Christoffer Dall <cdall@kernel.org>
> >>Cc: Peter Maydell <peter.maydell@linaro.org>
> >>Cc: Paolo Bonzini <pbonzini@redhat.com>
> >>Cc: Radim Kr?m?? <rkrcmar@redhat.com>
> >>Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> >>---
> 
> >>@@ -192,17 +195,23 @@ int kvm_arm_config_vm(struct kvm *kvm, unsigned long type)
> >>  	u32 parange, phys_shift;
> >>  	u8 lvls;
> >>-	if (type)
> >>+	if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
> >>  		return -EINVAL;
> >>+	phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
> >>+	if (phys_shift) {
> >>+		if (phys_shift > kvm_ipa_limit ||
> >>+		    phys_shift < 32)
> >>+			return -EINVAL;
> >
> >I am concerned here that if we allow the user to set the phys_size to 32
> >bits, then we end up with 2 levels of stage2 page tables, which means
> >that the size of a stage2 pmd mapping becomes the size of a stage2 pgd
> >mapping, yet we can still decide in user_mem_abort() that a stage2 fault
> >is backed by PMD size mappings on the host, and attempt a huge mapping
> >at stage2, which then becomes a PGD level block map, I think.
> 
> Yes, you're right. We will have a pgd-level block map in that case.
> This should work transparently as PMD at stage2 is folded into PGD and
> we endup marking the PGD entry as huge and the stage2 accessors deal
> with it appropriately. This is similar to having a PMD huge page with
> 64K + 42bit VA (2 level page table) on stage1.
> 
> >
> >Is this handled somehow?  If so, how?
> 
> We don't prevent this. We have a guaranteed minimum number of levels
> at 2, which implies you can map a stage1 PMD huge page at stage2.
> I acknowledge that the Linux naming convention does cause some confusion
> for a "level" at stage1 and stage2 levels. But if you think of it
> from the hardware level (and like the ARM ARM defines it , Level 0-3),
> it is much simpler. i.e, you can map a huge page at level N in stage1
> into stage2 if you have that level N. It doesn't matter if stage2 has
> more or less number of levels than stage1, as long as stage2 table can
> deal with it.
> 

That is indeed a good way to reason about it.

> >
> >I can't see user_mem_abort() being modified to explicitly handle this
> >in your code, but perhaps the stage2_set_pmd_huge() call ends up
> >actually mapping at the stage2 pte level, but I can't tell that it does.
> 
> The stage2_set_pmd_huge() installs it at the PGD (level 2, which would
> have been PMD if we had levels > 2) slot.
> 
> pmd = stage2_get_pmd(addr)
>        \-> pud = stage2_get_pud(addr)
>              \-> pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>              \-> (we have stage2_pgd_none(x) = 0 and
>              \-> stage2_pud_offset(pgd, addr) = pgd
>              \->returns (kvm->arch.pgd + stage2_pgd_index(addr);
>        \->  stage_pud_none(x) = 0 & stage2_pmd_offset(pud, addr) = pud
>        \->  returns pud (kvm->arch.pgd + stage2_pgd_index(addr))
> 
> and we install the PMD huge mapping at the location.
> 
> >In any case, I think user_mem_abort() should give up on pmd/pud huge
> >mappings if the size mapped by the stage2/stage1 pmd/pud levels don't
> >line up.  What do you think?
> 
> Does it matter ? Personally I don't think it matters much as long as we
> are able to map the "huge" page at stage1 in the stage2 as huge, even if
> the stage2 has lesser levels and manage it well. Given that PMD huge
> pages are quite common, it would be good to exploit it when we can.

What I couldn't convince myself of was whether having 2 levels at stage2
implied the entry level block mapping being of the same size as the
stage1 block mapping, but given your explanation above, I think that's
fine.

> 
> On the other hand, for stage2 PUD we are checking if the stage2 has a
> PUD level (kvm_has_stage2_pud()). May be we should relax it just like
> we do for PMD to check (kvm_stage2_levels > 2).
> 

Depends on how the code ends up looking like I suppose, but the more
symmetry we can have between the approach for host PMD and host PUD and
host PTE mappings, the better.


Thanks,

    Christoffer

  reply	other threads:[~2018-11-01  8:36 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-26 16:32 [PATCH v6 00/18] kvm: arm64: Dynamic IPA and 52bit IPA Suzuki K Poulose
2018-09-26 16:32 ` Suzuki K Poulose
2018-09-26 16:32 ` [PATCH v6 01/18] kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [PATCH v6 02/18] kvm: arm/arm64: Remove spurious WARN_ON Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [PATCH v6 03/18] kvm: arm64: Add helper for loading the stage2 setting for a VM Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [PATCH v6 04/18] arm64: Add a helper for PARange to physical shift conversion Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-10-01 12:05   ` Catalin Marinas
2018-10-01 12:05     ` Catalin Marinas
2018-09-26 16:32 ` [PATCH v6 05/18] kvm: arm64: Clean up VTCR_EL2 initialisation Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [PATCH v6 06/18] kvm: arm/arm64: Allow arch specific configurations for VM Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-28 17:27   ` Marc Zyngier
2018-09-28 17:27     ` Marc Zyngier
2018-09-29  8:30     ` Suzuki K Poulose
2018-09-29  8:30       ` Suzuki K Poulose
2018-09-26 16:32 ` [PATCH v6 07/18] kvm: arm64: Configure VTCR_EL2 per VM Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-10-02  7:48   ` Auger Eric
2018-10-02  7:48     ` Auger Eric
2018-09-26 16:32 ` [PATCH v6 08/18] kvm: arm/arm64: Prepare for VM specific stage2 translations Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [PATCH v6 09/18] kvm: arm64: Prepare for dynamic stage2 page table layout Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [PATCH v6 10/18] kvm: arm64: Make stage2 page table layout dynamic Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [PATCH v6 11/18] kvm: arm64: Dynamic configuration of VTTBR mask Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-10-02  7:54   ` Auger Eric
2018-10-02  7:54     ` Auger Eric
2018-09-26 16:32 ` [PATCH v6 12/18] kvm: arm64: Configure VTCR_EL2.SL0 per VM Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [PATCH v6 13/18] kvm: arm64: Switch to per VM IPA limit Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-10-02  7:58   ` Auger Eric
2018-10-02  7:58     ` Auger Eric
2018-09-26 16:32 ` [PATCH v6 14/18] vgic: Add support for 52bit guest physical address Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [PATCH v6 15/18] kvm: arm64: Add 52bit support for PAR to HPFAR conversoin Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [PATCH v6 16/18] kvm: arm64: Set a limit on the IPA size Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-10-02  8:20   ` Auger Eric
2018-10-02  8:20     ` Auger Eric
2018-09-26 16:32 ` [PATCH v6 17/18] kvm: arm64: Limit the minimum number of page table levels Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-10-02  8:22   ` Auger Eric
2018-10-02  8:22     ` Auger Eric
2018-09-26 16:32 ` [PATCH v6 18/18] kvm: arm64: Allow tuning the physical address size for VM Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-10-02  8:37   ` Auger Eric
2018-10-02  8:37     ` Auger Eric
2018-10-31 14:22   ` Christoffer Dall
2018-10-31 14:22     ` Christoffer Dall
2018-10-31 17:55     ` Suzuki K Poulose
2018-10-31 17:55       ` Suzuki K Poulose
2018-11-01  8:36       ` Christoffer Dall [this message]
2018-11-01  8:36         ` Christoffer Dall
2018-11-01  9:32         ` Suzuki K Poulose
2018-11-01  9:32           ` Suzuki K Poulose
2018-09-26 16:32 ` [kvmtool PATCH v6 19/18] kvmtool: Allow backends to run checks on the KVM device fd Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [kvmtool PATCH v6 20/18] kvmtool: arm64: Add support for guest physical address size Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [kvmtool PATCH v6 21/18] kvmtool: arm64: Switch memory layout Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32 ` [kvmtool PATCH v6 22/18] kvmtool: arm: Add support for creating VM with PA size Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-09-26 16:32   ` Suzuki K Poulose
2018-10-01 14:13   ` Marc Zyngier
2018-10-01 14:13     ` Marc Zyngier
2018-10-04  8:40 ` [PATCH v6 00/18] kvm: arm64: Dynamic IPA and 52bit IPA Auger Eric
2018-10-04  8:40   ` Auger Eric

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181101083622.GH12057@e113682-lin.lund.arm.com \
    --to=christoffer.dall@arm.com \
    --cc=cdall@kernel.org \
    --cc=dave.martin@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marc.zyngier@arm.com \
    --cc=pbonzini@redhat.com \
    --cc=punit.agrawal@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.