From: Marc Zyngier <marc.zyngier@arm.com>
To: "Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>,
	kvm@vger.kernel.org,
	"Gustavo A . R . Silva" <gustavo@embeddedor.com>,
	Will Deacon <will.deacon@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, Lukas Braun <koomi@moshbit.net>
Subject: [PATCH 25/28] KVM: arm/arm64: Fix unintended stage 2 PMD mappings
Date: Wed, 19 Dec 2018 18:03:46 +0000	[thread overview]
Message-ID: <20181219180349.242681-26-marc.zyngier@arm.com> (raw)
In-Reply-To: <20181219180349.242681-1-marc.zyngier@arm.com>

From: Christoffer Dall <christoffer.dall@arm.com>

There are two things we need to take care of when we create block
mappings in the stage 2 page tables:

  (1) The alignment within a PMD between the host address range and the
  guest IPA range must be the same, since otherwise we end up mapping
  pages with the wrong offset.

  (2) The head and tail of a memory slot may not cover a full block
  size, and we have to take care to not map those with block
  descriptors, since we could expose memory to the guest that the host
  did not intend to expose.

So far, we have been taking care of (1), but not (2), and our commentary
describing (1) was somewhat confusing.

This commit factors both checks out into a common function, and if we
don't pass the check, we won't attempt any PMD mappings for either
hugetlbfs or THP.

Note that we used to only check the alignment for THP, not for
hugetlbfs, but as far as I can tell the check needs to be applied to
both scenarios.
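
For illustration only (not part of the patch), here is a minimal
standalone sketch of the two checks, with an assumed 2 MiB block size
standing in for S2_PMD_SIZE/S2_PMD_MASK and made-up addresses:

  /* Sketch: BLOCK_SIZE stands in for the stage-2 PMD size (assumed 2 MiB). */
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define BLOCK_SIZE (UINT64_C(2) << 20)
  #define BLOCK_MASK (~(BLOCK_SIZE - 1))

  /* Check (1): hva and IPA must share the same offset within a block. */
  static bool same_block_alignment(uint64_t uaddr_start, uint64_t gpa_start)
  {
      return (gpa_start & ~BLOCK_MASK) == (uaddr_start & ~BLOCK_MASK);
  }

  /* Check (2): the block containing hva must lie fully inside the memslot. */
  static bool block_within_slot(uint64_t hva, uint64_t uaddr_start,
                                uint64_t uaddr_end)
  {
      return (hva & BLOCK_MASK) >= uaddr_start &&
             (hva & BLOCK_MASK) + BLOCK_SIZE <= uaddr_end;
  }

  int main(void)
  {
      /* Hypothetical memslot whose userspace and IPA offsets differ by a page. */
      uint64_t uaddr_start = 0x40001000, gpa_start = 0x80002000;
      uint64_t uaddr_end = uaddr_start + (UINT64_C(8) << 20);

      printf("check (1): %d\n", same_block_alignment(uaddr_start, gpa_start));
      printf("check (2) at slot head: %d\n",
             block_within_slot(uaddr_start, uaddr_start, uaddr_end));
      return 0;
  }

Both checks report 0 (fail) for this layout, so such a fault would be
forced back to PTE mappings.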

Cc: Ralph Palutke <ralph.palutke@fau.de>
Cc: Lukas Braun <koomi@moshbit.net>
Reported-by: Lukas Braun <koomi@moshbit.net>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 virt/kvm/arm/mmu.c | 86 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 64 insertions(+), 22 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index f605514395a1..dee3dbd98712 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1595,6 +1595,63 @@ static void kvm_send_hwpoison_signal(unsigned long address,
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
 }
 
+static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot,
+					       unsigned long hva)
+{
+	gpa_t gpa_start, gpa_end;
+	hva_t uaddr_start, uaddr_end;
+	size_t size;
+
+	size = memslot->npages * PAGE_SIZE;
+
+	gpa_start = memslot->base_gfn << PAGE_SHIFT;
+	gpa_end = gpa_start + size;
+
+	uaddr_start = memslot->userspace_addr;
+	uaddr_end = uaddr_start + size;
+
+	/*
+	 * Pages belonging to memslots that don't have the same alignment
+	 * within a PMD for userspace and IPA cannot be mapped with stage-2
+	 * PMD entries, because we'll end up mapping the wrong pages.
+	 *
+	 * Consider a layout like the following:
+	 *
+	 *    memslot->userspace_addr:
+	 *    +-----+--------------------+--------------------+---+
+	 *    |abcde|fgh  Stage-1 PMD    |    Stage-1 PMD   tv|xyz|
+	 *    +-----+--------------------+--------------------+---+
+	 *
+	 *    memslot->base_gfn << PAGE_SHIFT:
+	 *      +---+--------------------+--------------------+-----+
+	 *      |abc|def  Stage-2 PMD    |    Stage-2 PMD     |tvxyz|
+	 *      +---+--------------------+--------------------+-----+
+	 *
+	 * If we create those stage-2 PMDs, we'll end up with this incorrect
+	 * mapping:
+	 *   d -> f
+	 *   e -> g
+	 *   f -> h
+	 */
+	if ((gpa_start & ~S2_PMD_MASK) != (uaddr_start & ~S2_PMD_MASK))
+		return false;
+
+	/*
+	 * Next, let's make sure we're not trying to map anything not covered
+	 * by the memslot. This means we have to prohibit PMD size mappings
+	 * for the beginning and end of a non-PMD aligned and non-PMD sized
+	 * memory slot (illustrated by the head and tail parts of the
+	 * userspace view above containing pages 'abcde' and 'xyz',
+	 * respectively).
+	 *
+	 * Note that it doesn't matter if we do the check using the
+	 * userspace_addr or the base_gfn, as both are equally aligned (per
+	 * the check above) and equally sized.
+	 */
+	return (hva & S2_PMD_MASK) >= uaddr_start &&
+	       (hva & S2_PMD_MASK) + S2_PMD_SIZE <= uaddr_end;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  unsigned long fault_status)
@@ -1621,6 +1678,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
+	if (!fault_supports_stage2_pmd_mappings(memslot, hva))
+		force_pte = true;
+
+	if (logging_active)
+		force_pte = true;
+
 	/* Let's check if we will get back a huge page backed by hugetlbfs */
 	down_read(&current->mm->mmap_sem);
 	vma = find_vma_intersection(current->mm, hva, hva + 1);
@@ -1637,28 +1700,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	if ((vma_pagesize == PMD_SIZE ||
 	     (vma_pagesize == PUD_SIZE && kvm_stage2_has_pud(kvm))) &&
-	    !logging_active) {
+	    !force_pte) {
 		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
-	} else {
-		/*
-		 * Fallback to PTE if it's not one of the Stage 2
-		 * supported hugepage sizes or the corresponding level
-		 * doesn't exist
-		 */
-		vma_pagesize = PAGE_SIZE;
-
-		/*
-		 * Pages belonging to memslots that don't have the same
-		 * alignment for userspace and IPA cannot be mapped using
-		 * block descriptors even if the pages belong to a THP for
-		 * the process, because the stage-2 block descriptor will
-		 * cover more than a single THP and we loose atomicity for
-		 * unmapping, updates, and splits of the THP or other pages
-		 * in the stage-2 block range.
-		 */
-		if ((memslot->userspace_addr & ~PMD_MASK) !=
-		    ((memslot->base_gfn << PAGE_SHIFT) & ~PMD_MASK))
-			force_pte = true;
 	}
 	up_read(&current->mm->mmap_sem);
 
@@ -1697,7 +1740,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		 * should not be mapped with huge pages (it introduces churn
 		 * and performance degradation), so force a pte mapping.
 		 */
-		force_pte = true;
 		flags |= KVM_S2_FLAG_LOGGING_ACTIVE;
 
 		/*
-- 
2.19.2

Thread overview: 60+ messages in thread
2018-12-19 18:03 [GIT PULL] KVM/arm updates for 4.21 Marc Zyngier
2018-12-19 18:03 ` [PATCH 01/28] arm64: KVM: Skip MMIO insn after emulation Marc Zyngier
2018-12-19 18:03 ` [PATCH 02/28] arm64: KVM: Consistently advance singlestep when emulating instructions Marc Zyngier
2018-12-19 18:03 ` [PATCH 03/28] KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less Marc Zyngier
2018-12-19 18:03 ` [PATCH 04/28] KVM: arm/arm64: Log PSTATE for unhandled sysregs Marc Zyngier
2018-12-19 18:03 ` [PATCH 05/28] KVM: arm/arm64: vgic-v2: Set active_source to 0 when restoring state Marc Zyngier
2018-12-19 18:03 ` [PATCH 06/28] KVM: arm/arm64: Share common code in user_mem_abort() Marc Zyngier
2018-12-19 18:03 ` [PATCH 07/28] KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault Marc Zyngier
2018-12-19 18:03 ` [PATCH 08/28] KVM: arm/arm64: Introduce helpers to manipulate page table entries Marc Zyngier
2018-12-19 18:03 ` [PATCH 09/28] KVM: arm64: Support dirty page tracking for PUD hugepages Marc Zyngier
2018-12-19 18:03 ` [PATCH 10/28] KVM: arm64: Support PUD hugepage in stage2_is_exec() Marc Zyngier
2018-12-19 18:03 ` [PATCH 11/28] KVM: arm64: Support handling access faults for PUD hugepages Marc Zyngier
2018-12-19 18:03 ` [PATCH 12/28] KVM: arm64: Update age handlers to support PUD hugepages Marc Zyngier
2018-12-19 18:03 ` [PATCH 13/28] KVM: arm64: Add support for creating PUD hugepages at stage 2 Marc Zyngier
2018-12-19 18:03 ` [PATCH 14/28] KVM: arm/arm64: vgic: Do not cond_resched_lock() with IRQs disabled Marc Zyngier
2018-12-19 18:03 ` [PATCH 15/28] KVM: arm64: Clarify explanation of STAGE2_PGTABLE_LEVELS Marc Zyngier
2018-12-19 18:03 ` [PATCH 16/28] KVM: arm/arm64: vgic: Cap SPIs to the VM-defined maximum Marc Zyngier
2018-12-19 18:03 ` [PATCH 17/28] KVM: arm/arm64: vgic: Fix off-by-one bug in vgic_get_irq() Marc Zyngier
2018-12-19 18:03 ` [PATCH 18/28] KVM: arm/arm64: vgic: Consider priority and active state for pending irq Marc Zyngier
2018-12-19 18:03 ` [PATCH 19/28] KVM: arm/arm64: Fixup the kvm_exit tracepoint Marc Zyngier
2018-12-19 18:03 ` [PATCH 20/28] KVM: arm/arm64: Remove arch timer workqueue Marc Zyngier
2018-12-19 18:03 ` [PATCH 21/28] KVM: arm/arm64: arch_timer: Simplify kvm_timer_vcpu_terminate Marc Zyngier
2018-12-19 18:03 ` [PATCH 22/28] KVM: arm64: Make vcpu const in vcpu_read_sys_reg Marc Zyngier
2018-12-19 18:03 ` [PATCH 23/28] arm64: KVM: Add trapped system register access tracepoint Marc Zyngier
2018-12-19 18:03 ` [PATCH 24/28] arm/arm64: KVM: vgic: Force VM halt when changing the active state of GICv3 PPIs/SGIs Marc Zyngier
2018-12-19 18:03 ` [PATCH 25/28] KVM: arm/arm64: Fix unintended stage 2 PMD mappings Marc Zyngier [this message]
2018-12-19 18:03 ` [PATCH 26/28] arm64: KVM: Avoid setting the upper 32 bits of VTCR_EL2 to 1 Marc Zyngier
2018-12-19 18:03 ` [PATCH 27/28] arm/arm64: KVM: Add ARM_EXCEPTION_IS_TRAP macro Marc Zyngier
2018-12-19 18:03 ` [PATCH 28/28] arm: KVM: Add S2_PMD_{MASK,SIZE} constants Marc Zyngier
2018-12-19 19:34 ` [GIT PULL] KVM/arm updates for 4.21 Paolo Bonzini
