From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Alexey Kardashevskiy <aik@ozlabs.ru>,
	David Gibson <david@gibson.dropbear.id.au>,
	Michael Ellerman <mpe@ellerman.id.au>
Subject: [PATCH 4.17 09/66] KVM: PPC: Check if IOMMU page is contained in the pinned physical page
Date: Fri, 27 Jul 2018 11:45:02 +0200
Message-ID: <20180727093810.100569633@linuxfoundation.org>
In-Reply-To: <20180727093809.043856530@linuxfoundation.org>

4.17-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexey Kardashevskiy <aik@ozlabs.ru>

commit 76fa4975f3ed12d15762bc979ca44078598ed8ee upstream.

A VM which has:
 - a DMA capable device passed through to it (eg. network card);
 - running a malicious kernel that ignores H_PUT_TCE failure;
 - capability of using IOMMU pages bigger than physical pages
can create an IOMMU mapping that exposes (for example) 16MB of
the host physical memory to the device when only 64K was allocated to the VM.

The remaining 16MB - 64K will be some other content of host memory, possibly
including pages of the VM, but also pages of host kernel memory, host
programs or other VMs.

The attacking VM does not control the location of the page it can map,
and is only allowed to map as many pages as it has pages of RAM.

We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
an IOMMU page is contained in the physical page so the PCI hardware won't
get access to unassigned host memory; however this check is missing in
the KVM fastpath (H_PUT_TCE accelerated code). We have been lucky so far
and have not hit this: the very first time the mapping happens,
tbl::it_userspace is not allocated yet, so we fall back to userspace,
which in turn calls the VFIO IOMMU driver; this fails and the guest does
not retry.

This stores the smallest preregistered page size in the preregistered
region descriptor and changes the mm_iommu_xxx API to check this against
the IOMMU page size.

This calculates the maximum page size as the minimum of the natural region
alignment and the compound page size. For the page shift this uses the shift
returned by find_linux_pte(), which indicates how the page is mapped to
the current userspace: if the page is huge and the shift is not zero, then
it is a leaf pte and the page is mapped within the range.
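
As a rough illustration of the calculation described above, here is a
minimal standalone sketch in userspace C. It is not the kernel code:
SKETCH_PAGE_SHIFT (64K base pages), lowest_set_bit(), region_shift(),
fold_page_shift() and check_iommu_shift() are hypothetical names invented
for this example, and the per-page shift is passed in directly rather than
obtained from find_linux_pte().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 64K base pages. */
#define SKETCH_PAGE_SHIFT 16u

/* Index of the lowest set bit; stands in for the kernel's __ffs(). */
static unsigned int lowest_set_bit(uint64_t x)
{
	unsigned int n = 0;

	assert(x != 0);	/* undefined for zero, as with __ffs() */
	while (!(x & 1)) {
		x >>= 1;
		n++;
	}
	return n;
}

/* Starting point: natural alignment of the region's address and size. */
static unsigned int region_shift(uint64_t ua, uint64_t entries)
{
	return lowest_set_bit(ua | (entries << SKETCH_PAGE_SHIFT));
}

/* Fold in one pinned page: the region keeps the smallest shift seen. */
static unsigned int fold_page_shift(unsigned int region, unsigned int page)
{
	return page < region ? page : region;
}

/* Lookup-time check: 0 if the IOMMU page fits, -1 if it is too big. */
static int check_iommu_shift(unsigned int iommu_shift, unsigned int region)
{
	return iommu_shift > region ? -1 : 0;
}
```

With these assumptions, a 16MB-aligned region of 256 entries starts with a
shift of 24; one page backed by a plain 64K mapping lowers it to 16, so a
16MB IOMMU page (shift 24) is refused while a 64K one (shift 16) passes.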

Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 arch/powerpc/include/asm/mmu_context.h |    4 +--
 arch/powerpc/kvm/book3s_64_vio.c       |    2 -
 arch/powerpc/kvm/book3s_64_vio_hv.c    |    6 +++--
 arch/powerpc/mm/mmu_context_iommu.c    |   37 +++++++++++++++++++++++++++++++--
 drivers/vfio/vfio_iommu_spapr_tce.c    |    2 -
 5 files changed, 43 insertions(+), 8 deletions(-)

--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -35,9 +35,9 @@ extern struct mm_iommu_table_group_mem_t
 extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
 		unsigned long ua, unsigned long entries);
 extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
-		unsigned long ua, unsigned long *hpa);
+		unsigned long ua, unsigned int pageshift, unsigned long *hpa);
 extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
-		unsigned long ua, unsigned long *hpa);
+		unsigned long ua, unsigned int pageshift, unsigned long *hpa);
 extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
 extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
 #endif
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -433,7 +433,7 @@ long kvmppc_tce_iommu_map(struct kvm *kv
 		/* This only handles v2 IOMMU type, v1 is handled via ioctl() */
 		return H_TOO_HARD;
 
-	if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, &hpa)))
+	if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, tbl->it_page_shift, &hpa)))
 		return H_HARDWARE;
 
 	if (mm_iommu_mapped_inc(mem))
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -262,7 +262,8 @@ static long kvmppc_rm_tce_iommu_map(stru
 	if (!mem)
 		return H_TOO_HARD;
 
-	if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, &hpa)))
+	if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, tbl->it_page_shift,
+			&hpa)))
 		return H_HARDWARE;
 
 	pua = (void *) vmalloc_to_phys(pua);
@@ -431,7 +432,8 @@ long kvmppc_rm_h_put_tce_indirect(struct
 
 		mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K);
 		if (mem)
-			prereg = mm_iommu_ua_to_hpa_rm(mem, ua, &tces) == 0;
+			prereg = mm_iommu_ua_to_hpa_rm(mem, ua,
+					IOMMU_PAGE_SHIFT_4K, &tces) == 0;
 	}
 
 	if (!prereg) {
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -19,6 +19,7 @@
 #include <linux/hugetlb.h>
 #include <linux/swap.h>
 #include <asm/mmu_context.h>
+#include <asm/pte-walk.h>
 
 static DEFINE_MUTEX(mem_list_mutex);
 
@@ -27,6 +28,7 @@ struct mm_iommu_table_group_mem_t {
 	struct rcu_head rcu;
 	unsigned long used;
 	atomic64_t mapped;
+	unsigned int pageshift;
 	u64 ua;			/* userspace address */
 	u64 entries;		/* number of entries in hpas[] */
 	u64 *hpas;		/* vmalloc'ed */
@@ -125,6 +127,8 @@ long mm_iommu_get(struct mm_struct *mm,
 {
 	struct mm_iommu_table_group_mem_t *mem;
 	long i, j, ret = 0, locked_entries = 0;
+	unsigned int pageshift;
+	unsigned long flags;
 	struct page *page = NULL;
 
 	mutex_lock(&mem_list_mutex);
@@ -159,6 +163,12 @@ long mm_iommu_get(struct mm_struct *mm,
 		goto unlock_exit;
 	}
 
+	/*
+	 * For a starting point for a maximum page size calculation
+	 * we use @ua and @entries natural alignment to allow IOMMU pages
+	 * smaller than huge pages but still bigger than PAGE_SIZE.
+	 */
+	mem->pageshift = __ffs(ua | (entries << PAGE_SHIFT));
 	mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
 	if (!mem->hpas) {
 		kfree(mem);
@@ -199,6 +209,23 @@ long mm_iommu_get(struct mm_struct *mm,
 			}
 		}
 populate:
+		pageshift = PAGE_SHIFT;
+		if (PageCompound(page)) {
+			pte_t *pte;
+			struct page *head = compound_head(page);
+			unsigned int compshift = compound_order(head);
+
+			local_irq_save(flags); /* disables as well */
+			pte = find_linux_pte(mm->pgd, ua, NULL, &pageshift);
+			local_irq_restore(flags);
+
+			/* Double check it is still the same pinned page */
+			if (pte && pte_page(*pte) == head &&
+					pageshift == compshift)
+				pageshift = max_t(unsigned int, pageshift,
+						PAGE_SHIFT);
+		}
+		mem->pageshift = min(mem->pageshift, pageshift);
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
 	}
 
@@ -349,7 +376,7 @@ struct mm_iommu_table_group_mem_t *mm_io
 EXPORT_SYMBOL_GPL(mm_iommu_find);
 
 long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
-		unsigned long ua, unsigned long *hpa)
+		unsigned long ua, unsigned int pageshift, unsigned long *hpa)
 {
 	const long entry = (ua - mem->ua) >> PAGE_SHIFT;
 	u64 *va = &mem->hpas[entry];
@@ -357,6 +384,9 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_
 	if (entry >= mem->entries)
 		return -EFAULT;
 
+	if (pageshift > mem->pageshift)
+		return -EFAULT;
+
 	*hpa = *va | (ua & ~PAGE_MASK);
 
 	return 0;
@@ -364,7 +394,7 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_
 EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
 
 long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
-		unsigned long ua, unsigned long *hpa)
+		unsigned long ua, unsigned int pageshift, unsigned long *hpa)
 {
 	const long entry = (ua - mem->ua) >> PAGE_SHIFT;
 	void *va = &mem->hpas[entry];
@@ -373,6 +403,9 @@ long mm_iommu_ua_to_hpa_rm(struct mm_iom
 	if (entry >= mem->entries)
 		return -EFAULT;
 
+	if (pageshift > mem->pageshift)
+		return -EFAULT;
+
 	pa = (void *) vmalloc_to_phys(va);
 	if (!pa)
 		return -EFAULT;
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -467,7 +467,7 @@ static int tce_iommu_prereg_ua_to_hpa(st
 	if (!mem)
 		return -EINVAL;
 
-	ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
+	ret = mm_iommu_ua_to_hpa(mem, tce, shift, phpa);
 	if (ret)
 		return -EINVAL;
 


