linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/6] Support IOMMU page sizes larger than the CPU page size
@ 2021-10-19 16:37 Sven Peter
  2021-10-19 16:37 ` [PATCH v3 1/6] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE Sven Peter
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Sven Peter @ 2021-10-19 16:37 UTC (permalink / raw)
  To: iommu, Robin Murphy
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Marc Zyngier, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

Hi,

RFC: https://lore.kernel.org/linux-iommu/20210806155523.50429-1-sven@svenpeter.dev/
 v2: https://lore.kernel.org/linux-iommu/20210828153642.19396-1-sven@svenpeter.dev/

Time to revive this series:

v2 -> v3:
  - Dropped support for untrusted devices since swiotlb currently does not
    allow aligning buffers to granularities bigger than PAGE_SIZE.
    Getting this to work is possible but a bit tricky together with min_align_mask.
    Right now there are no untrusted devices on the M1 anyway and this series already
    feels big enough. I've therefore decided to address this in a follow-up.
  - Replaced phys_to_page with pfn_to_page(PHYS_PFN(..)) since not all architectures
    define phys_to_page
  - Replaced the PAGE_SIZE > granule check in iommu_check_page_size with
    domain->pgsize_bitmap & (PAGE_SIZE | (PAGE_SIZE - 1)) as suggested by Robin
  - Rebased on the latest rc, which required introducing sg_alloc_append_table_from_pages
    since __sg_alloc_table_from_pages no longer exists

RFC -> v2:
  - essentially a complete rewrite of the first approach, which just padded every
    allocation

Some background: On the Apple M1 the IOMMUs are hardwired to only support 16 KB pages.
We'd still like to boot Linux with 4 KB pages though, because that's what most distros
ship these days. I've been told this also helps with Android userspace compatibility
and x86 emulation.
This patch series adds support for that setup to the IOMMU DMA API.

This is essentially done by always mapping the entire encapsulating IOMMU page and
adjusting the returned iova by the offset of the buffer within that page. There are
also changes to only allow DMA domains to make use of this and to prevent UNMANAGED
domains from encountering unexpected situations.
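
To make the offset adjustment concrete, here is a rough userspace sketch of
the arithmetic involved. The addresses and constants are made up and the
helpers only loosely mirror iova_offset()/iova_align(); this is not the
actual kernel code:

#include <stdint.h>
#include <stdio.h>

#define CPU_PAGE_SIZE	4096UL	/* assumed PAGE_SIZE */
#define GRANULE		16384UL	/* assumed IOMMU page size, e.g. Apple DART */

static uint64_t iova_offset(uint64_t addr) { return addr & (GRANULE - 1); }
static uint64_t iova_align(uint64_t size)
{
	return (size + GRANULE - 1) & ~(GRANULE - 1);
}

int main(void)
{
	/* a CPU-page-aligned buffer that is smaller than one IOMMU page */
	uint64_t phys = 0x80003000;	/* hypothetical physical address */
	uint64_t len = CPU_PAGE_SIZE;

	uint64_t off = iova_offset(phys);		/* 0x3000 */
	uint64_t map_len = iova_align(len + off);	/* one full granule */
	uint64_t map_phys = phys - off;			/* granule-aligned base */
	uint64_t iova = 0xf0000000;			/* from the iova allocator */

	/* the whole encapsulating IOMMU page gets mapped ... */
	printf("map phys %#llx + %#llx at iova %#llx\n",
	       (unsigned long long)map_phys, (unsigned long long)map_len,
	       (unsigned long long)iova);
	/* ... and the caller sees the iova shifted by the in-page offset */
	printf("dev addr: %#llx\n", (unsigned long long)(iova + off));
	return 0;
}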


Best,

Sven

Sven Peter (6):
  iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  iommu/dma: Support granule > PAGE_SIZE in dma_map_sg
  iommu/dma: Support granule > PAGE_SIZE allocations
  iommu: Move IOMMU pagesize check to attach_device
  iommu: Introduce __IOMMU_DOMAIN_LP
  iommu/dart: Remove force_bypass logic

 drivers/iommu/apple-dart.c |  14 +---
 drivers/iommu/dma-iommu.c  | 134 +++++++++++++++++++++++++++++++------
 drivers/iommu/iommu.c      |  40 ++++++++++-
 drivers/iommu/iova.c       |   7 +-
 include/linux/iommu.h      |  18 ++++-
 5 files changed, 174 insertions(+), 39 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3 1/6] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
  2021-10-19 16:37 [PATCH v3 0/6] Support IOMMU page sizes larger than the CPU page size Sven Peter
@ 2021-10-19 16:37 ` Sven Peter
  2021-10-19 16:37 ` [PATCH v3 2/6] iommu/dma: Support granule > PAGE_SIZE in dma_map_sg Sven Peter
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Sven Peter @ 2021-10-19 16:37 UTC (permalink / raw)
  To: iommu, Robin Murphy
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Marc Zyngier, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

While this function *probably* works correctly without any changes for
granule > PAGE_SIZE, I don't have any code to actually test it and cannot
reason about how the function is supposed to work.
Disable it instead until we run into a use case where it's required.

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/dma-iommu.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index de5040b65529..17f25632a0d6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1249,9 +1249,15 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		unsigned long attrs)
 {
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iova_domain *iovad = &cookie->iovad;
 	struct page *page;
 	int ret;
 
+	if (iovad->granule > PAGE_SIZE)
+		return -ENXIO;
+
 	if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
 		struct page **pages = dma_common_find_pages(cpu_addr);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 2/6] iommu/dma: Support granule > PAGE_SIZE in dma_map_sg
  2021-10-19 16:37 [PATCH v3 0/6] Support IOMMU page sizes larger than the CPU page size Sven Peter
  2021-10-19 16:37 ` [PATCH v3 1/6] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE Sven Peter
@ 2021-10-19 16:37 ` Sven Peter
  2021-10-19 16:37 ` [PATCH v3 3/6] iommu/dma: Support granule > PAGE_SIZE allocations Sven Peter
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Sven Peter @ 2021-10-19 16:37 UTC (permalink / raw)
  To: iommu, Robin Murphy
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Marc Zyngier, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

Add support to iommu_dma_map_sg's impedance matching to also align
sg_lists correctly when the IOMMU granule is larger than PAGE_SIZE.
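
As a rough illustration of what this impedance matching does per segment,
here is a userspace sketch with hypothetical structures and made-up
addresses (the real code works on struct scatterlist and stashes the
unaligned parts in the DMA fields):

#include <stdint.h>
#include <stdio.h>

#define GRANULE	16384UL	/* assumed IOMMU page size > PAGE_SIZE */

struct seg {
	uint64_t phys, len;		/* caller-visible payload */
	uint64_t dma_addr, dma_len;	/* filled in once the mapping exists */
};

int main(void)
{
	struct seg sg[2] = {
		{ .phys = 0x80005000, .len = 0x2000 },
		{ .phys = 0x90000000, .len = 0x4000 },
	};
	uint64_t iova = 0xf0000000;	/* single allocation for the whole list */
	uint64_t cur = iova;
	int i;

	for (i = 0; i < 2; i++) {
		/* expand each segment to granule boundaries for mapping */
		uint64_t off = sg[i].phys & (GRANULE - 1);
		uint64_t map_len = (sg[i].len + off + GRANULE - 1) &
				   ~(GRANULE - 1);

		/* after the map succeeded, restore the payload view and hand
		 * out the device address inside the expanded mapping */
		sg[i].dma_addr = cur + off;
		sg[i].dma_len = sg[i].len;
		cur += map_len;

		printf("seg %d: payload %#llx+%#llx -> dma %#llx+%#llx\n", i,
		       (unsigned long long)sg[i].phys,
		       (unsigned long long)sg[i].len,
		       (unsigned long long)sg[i].dma_addr,
		       (unsigned long long)sg[i].dma_len);
	}
	return 0;
}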

Co-developed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/dma-iommu.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 17f25632a0d6..ea799e70fc98 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -19,6 +19,7 @@
 #include <linux/irq.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
+#include <linux/pfn.h>
 #include <linux/pci.h>
 #include <linux/swiotlb.h>
 #include <linux/scatterlist.h>
@@ -878,8 +879,9 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
 		unsigned int s_length = sg_dma_len(s);
 		unsigned int s_iova_len = s->length;
 
-		s->offset += s_iova_off;
-		s->length = s_length;
+		sg_set_page(s,
+			    pfn_to_page(PHYS_PFN(sg_phys(s) + s_iova_off)),
+			    s_length, s_iova_off & ~PAGE_MASK);
 		sg_dma_address(s) = DMA_MAPPING_ERROR;
 		sg_dma_len(s) = 0;
 
@@ -920,13 +922,17 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
 static void __invalidate_sg(struct scatterlist *sg, int nents)
 {
 	struct scatterlist *s;
+	phys_addr_t orig_paddr;
 	int i;
 
 	for_each_sg(sg, s, nents, i) {
-		if (sg_dma_address(s) != DMA_MAPPING_ERROR)
-			s->offset += sg_dma_address(s);
-		if (sg_dma_len(s))
-			s->length = sg_dma_len(s);
+		if (sg_dma_len(s)) {
+			orig_paddr = sg_phys(s) + sg_dma_address(s);
+			sg_set_page(s,
+				    pfn_to_page(PHYS_PFN(orig_paddr)),
+				    sg_dma_len(s),
+				    sg_dma_address(s) & ~PAGE_MASK);
+		}
 		sg_dma_address(s) = DMA_MAPPING_ERROR;
 		sg_dma_len(s) = 0;
 	}
@@ -1003,15 +1009,16 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	 * stashing the unaligned parts in the as-yet-unused DMA fields.
 	 */
 	for_each_sg(sg, s, nents, i) {
-		size_t s_iova_off = iova_offset(iovad, s->offset);
+		phys_addr_t s_phys = sg_phys(s);
+		size_t s_iova_off = iova_offset(iovad, s_phys);
 		size_t s_length = s->length;
 		size_t pad_len = (mask - iova_len + 1) & mask;
 
 		sg_dma_address(s) = s_iova_off;
 		sg_dma_len(s) = s_length;
-		s->offset -= s_iova_off;
 		s_length = iova_align(iovad, s_length + s_iova_off);
-		s->length = s_length;
+		sg_set_page(s, pfn_to_page(PHYS_PFN(s_phys - s_iova_off)),
+			    s_length, s->offset & ~s_iova_off);
 
 		/*
 		 * Due to the alignment of our single IOVA allocation, we can
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 3/6] iommu/dma: Support granule > PAGE_SIZE allocations
  2021-10-19 16:37 [PATCH v3 0/6] Support IOMMU page sizes larger than the CPU page size Sven Peter
  2021-10-19 16:37 ` [PATCH v3 1/6] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE Sven Peter
  2021-10-19 16:37 ` [PATCH v3 2/6] iommu/dma: Support granule > PAGE_SIZE in dma_map_sg Sven Peter
@ 2021-10-19 16:37 ` Sven Peter
  2021-10-19 16:37 ` [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device Sven Peter
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Sven Peter @ 2021-10-19 16:37 UTC (permalink / raw)
  To: iommu, Robin Murphy
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Marc Zyngier, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

Noncontiguous allocations must be made up of individual blocks
in a way that allows those blocks to be mapped contiguously in IOVA space.
For IOMMU page sizes larger than the CPU page size, this can be done
by allocating all individual blocks from pools with
order >= get_order(iovad->granule). Some spillover pages might be
allocated at the end, which can, however, be freed again immediately.
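
A small userspace sketch of the order/spillover arithmetic, assuming 4 KiB
CPU pages and a 16 KiB granule (get_order() is reimplemented locally just
for the example):

#include <stdio.h>

#define PAGE_SHIFT	12
#define GRANULE		16384UL

static unsigned int get_order(unsigned long size)
{
	unsigned int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

int main(void)
{
	unsigned int min_order = get_order(GRANULE);	/* 2: blocks of 4 pages */
	unsigned long min_order_size = 1UL << min_order;
	unsigned long count = 9;	/* pages actually needed */
	unsigned long allocated = 0, spill = 0;

	while (allocated < count) {
		unsigned long remaining = count - allocated;

		if (remaining >= min_order_size) {
			allocated += min_order_size;	/* full aligned block */
		} else {
			/*
			 * last block: a full 1 << min_order block is allocated
			 * anyway so the IOVA mapping stays granule-aligned,
			 * and the spillover pages are freed right away
			 */
			spill = min_order_size - remaining;
			allocated += remaining;
		}
	}
	printf("%lu pages requested, %lu spillover pages freed\n", count, spill);
	return 0;
}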

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/dma-iommu.c | 103 ++++++++++++++++++++++++++++++++++----
 1 file changed, 93 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ea799e70fc98..579a5a89d1ec 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -17,6 +17,7 @@
 #include <linux/iommu.h>
 #include <linux/iova.h>
 #include <linux/irq.h>
+#include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
 #include <linux/pfn.h>
@@ -547,6 +548,9 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 {
 	struct page **pages;
 	unsigned int i = 0, nid = dev_to_node(dev);
+	unsigned int j;
+	unsigned long min_order = __fls(order_mask);
+	unsigned int min_order_size = 1U << min_order;
 
 	order_mask &= (2U << MAX_ORDER) - 1;
 	if (!order_mask)
@@ -586,15 +590,37 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 				split_page(page, order);
 			break;
 		}
-		if (!page) {
-			__iommu_dma_free_pages(pages, i);
-			return NULL;
+
+		/*
+		 * If we have no valid page here we might be trying to allocate
+		 * the last block consisting of 1<<order pages (to guarantee
+		 * alignment) but actually need less pages than that.
+		 * In that case we just try to allocate the entire block and
+		 * directly free the spillover pages again.
+		 */
+		if (!page && !order_mask && count < min_order_size) {
+			page = alloc_pages_node(nid, gfp, min_order);
+			if (!page)
+				goto free_pages;
+			split_page(page, min_order);
+
+			for (j = count; j < min_order_size; ++j)
+				__free_page(page + j);
+
+			order_size = count;
 		}
+
+		if (!page)
+			goto free_pages;
 		count -= order_size;
 		while (order_size--)
 			pages[i++] = page++;
 	}
 	return pages;
+
+free_pages:
+	__iommu_dma_free_pages(pages, i);
+	return NULL;
 }
 
 /*
@@ -611,15 +637,27 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
 	bool coherent = dev_is_dma_coherent(dev);
 	int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
 	unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
+	struct sg_append_table sgt_append = {};
+	struct scatterlist *last_sg;
 	struct page **pages;
 	dma_addr_t iova;
+	phys_addr_t orig_s_phys;
+	size_t orig_s_len, orig_s_off, s_iova_off, iova_size;
 
 	if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
 	    iommu_deferred_attach(dev, domain))
 		return NULL;
 
 	min_size = alloc_sizes & -alloc_sizes;
-	if (min_size < PAGE_SIZE) {
+	if (iovad->granule > PAGE_SIZE) {
+		if (size < iovad->granule) {
+			/* ensure a single contiguous allocation */
+			min_size = ALIGN(size, PAGE_SIZE*(1U<<get_order(size)));
+			alloc_sizes = min_size;
+		}
+
+		size = PAGE_ALIGN(size);
+	} else if (min_size < PAGE_SIZE) {
 		min_size = PAGE_SIZE;
 		alloc_sizes |= PAGE_SIZE;
 	} else {
@@ -634,13 +672,17 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
 	if (!pages)
 		return NULL;
 
-	size = iova_align(iovad, size);
-	iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev);
+	iova_size = iova_align(iovad, size);
+	iova = iommu_dma_alloc_iova(domain, iova_size, dev->coherent_dma_mask, dev);
 	if (!iova)
 		goto out_free_pages;
 
-	if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL))
+	/* append_table is only used to get a pointer to the last entry */
+	if (sg_alloc_append_table_from_pages(&sgt_append, pages, count, 0,
+					iova_size, UINT_MAX, 0, GFP_KERNEL))
 		goto out_free_iova;
+	memcpy(sgt, &sgt_append.sgt, sizeof(*sgt));
+	last_sg = sgt_append.prv;
 
 	if (!(ioprot & IOMMU_CACHE)) {
 		struct scatterlist *sg;
@@ -650,18 +692,59 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
 			arch_dma_prep_coherent(sg_page(sg), sg->length);
 	}
 
+	if (iovad->granule > PAGE_SIZE) {
+		if (size < iovad->granule) {
+			/*
+			 * we only have a single sg list entry here that is
+			 * likely not aligned to iovad->granule. adjust the
+			 * entry to represent the encapsulating IOMMU page
+			 * and then later restore everything to its original
+			 * values, similar to the impedance matching done in
+			 * iommu_dma_map_sg.
+			 */
+			orig_s_phys = sg_phys(sgt->sgl);
+			orig_s_len = sgt->sgl->length;
+			orig_s_off = sgt->sgl->offset;
+			s_iova_off = iova_offset(iovad, orig_s_phys);
+
+			sg_set_page(sgt->sgl,
+				pfn_to_page(PHYS_PFN(orig_s_phys - s_iova_off)),
+				iova_align(iovad, orig_s_len + s_iova_off),
+				sgt->sgl->offset & ~s_iova_off);
+		} else {
+			/*
+			 * convince iommu_map_sg_atomic to map the last block
+			 * even though it may be too small.
+			 */
+			orig_s_len = last_sg->length;
+			last_sg->length = iova_align(iovad, last_sg->length);
+		}
+	}
+
 	if (iommu_map_sg_atomic(domain, iova, sgt->sgl, sgt->orig_nents, ioprot)
-			< size)
+			< iova_size)
 		goto out_free_sg;
 
+	if (iovad->granule > PAGE_SIZE) {
+		if (size < iovad->granule) {
+			sg_set_page(sgt->sgl,
+				pfn_to_page(PHYS_PFN(orig_s_phys)),
+				orig_s_len, orig_s_off);
+
+			iova += s_iova_off;
+		} else {
+			last_sg->length = orig_s_len;
+		}
+	}
+
 	sgt->sgl->dma_address = iova;
-	sgt->sgl->dma_length = size;
+	sgt->sgl->dma_length = iova_size;
 	return pages;
 
 out_free_sg:
 	sg_free_table(sgt);
 out_free_iova:
-	iommu_dma_free_iova(cookie, iova, size, NULL);
+	iommu_dma_free_iova(cookie, iova, iova_size, NULL);
 out_free_pages:
 	__iommu_dma_free_pages(pages, count);
 	return NULL;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device
  2021-10-19 16:37 [PATCH v3 0/6] Support IOMMU page sizes larger than the CPU page size Sven Peter
                   ` (2 preceding siblings ...)
  2021-10-19 16:37 ` [PATCH v3 3/6] iommu/dma: Support granule > PAGE_SIZE allocations Sven Peter
@ 2021-10-19 16:37 ` Sven Peter
  2021-10-20  5:21   ` Lu Baolu
  2021-10-19 16:37 ` [PATCH v3 5/6] iommu: Introduce __IOMMU_DOMAIN_LP Sven Peter
  2021-10-19 16:37 ` [PATCH v3 6/6] iommu/dart: Remove force_bypass logic Sven Peter
  5 siblings, 1 reply; 15+ messages in thread
From: Sven Peter @ 2021-10-19 16:37 UTC (permalink / raw)
  To: iommu, Robin Murphy
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Marc Zyngier, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

The iova allocator is capable of handling any granularity which is a power
of two. Remove the much stronger condition that the granularity must be
smaller than or equal to the CPU page size from a BUG_ON there.
Instead, check this condition during __iommu_attach_device and fail
gracefully.
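
A minimal sketch of the pgsize_bitmap test that this check boils down to,
with made-up example bitmaps and a 4 KiB PAGE_SIZE assumed:

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE	4096UL

static bool cpu_pages_representable(unsigned long pgsize_bitmap)
{
	/* PAGE_SIZE | (PAGE_SIZE - 1) covers every bit up to PAGE_SIZE */
	return pgsize_bitmap & (PAGE_SIZE | (PAGE_SIZE - 1));
}

int main(void)
{
	unsigned long dart_16k = 0x4000;	/* only 16 KiB pages */
	unsigned long smmu_4k = 0x40201000;	/* 4 KiB / 2 MiB / 1 GiB */

	printf("16K-only DART: %d\n", cpu_pages_representable(dart_16k)); /* 0 */
	printf("4K SMMU:       %d\n", cpu_pages_representable(smmu_4k));  /* 1 */
	return 0;
}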

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/iommu.c | 35 ++++++++++++++++++++++++++++++++---
 drivers/iommu/iova.c  |  7 ++++---
 include/linux/iommu.h |  5 +++++
 3 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index dd7863e453a5..28896739964b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -80,6 +80,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 						 unsigned type);
 static int __iommu_attach_device(struct iommu_domain *domain,
 				 struct device *dev);
+static void __iommu_detach_device(struct iommu_domain *domain,
+				  struct device *dev);
 static int __iommu_attach_group(struct iommu_domain *domain,
 				struct iommu_group *group);
 static void __iommu_detach_group(struct iommu_domain *domain,
@@ -1974,6 +1976,19 @@ void iommu_domain_free(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL_GPL(iommu_domain_free);
 
+static int iommu_check_page_size(struct iommu_domain *domain)
+{
+	if (!iommu_is_paging_domain(domain))
+		return 0;
+
+	if (!(domain->pgsize_bitmap & (PAGE_SIZE | (PAGE_SIZE - 1)))) {
+		pr_warn("IOMMU pages cannot exactly represent CPU pages.\n");
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
 static int __iommu_attach_device(struct iommu_domain *domain,
 				 struct device *dev)
 {
@@ -1983,9 +1998,23 @@ static int __iommu_attach_device(struct iommu_domain *domain,
 		return -ENODEV;
 
 	ret = domain->ops->attach_dev(domain, dev);
-	if (!ret)
-		trace_attach_device_to_domain(dev);
-	return ret;
+	if (ret)
+		return ret;
+
+	/*
+	 * Check that CPU pages can be represented by the IOVA granularity.
+	 * This has to be done after ops->attach_dev since many IOMMU drivers
+	 * only limit domain->pgsize_bitmap after having attached the first
+	 * device.
+	 */
+	ret = iommu_check_page_size(domain);
+	if (ret) {
+		__iommu_detach_device(domain, dev);
+		return ret;
+	}
+
+	trace_attach_device_to_domain(dev);
+	return 0;
 }
 
 int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 9e8bc802ac05..707eb0ceb29f 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -50,10 +50,11 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 {
 	/*
 	 * IOVA granularity will normally be equal to the smallest
-	 * supported IOMMU page size; both *must* be capable of
-	 * representing individual CPU pages exactly.
+	 * supported IOMMU page size; while both usually are capable of
+	 * representing individual CPU pages exactly the IOVA allocator
+	 * supports any granularities that are an exact power of two.
 	 */
-	BUG_ON((granule > PAGE_SIZE) || !is_power_of_2(granule));
+	BUG_ON(!is_power_of_2(granule));
 
 	spin_lock_init(&iovad->iova_rbtree_lock);
 	iovad->rbroot = RB_ROOT;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d2f3435e7d17..cabd25879613 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -101,6 +101,11 @@ static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
 	return domain->type & __IOMMU_DOMAIN_DMA_API;
 }
 
+static inline bool iommu_is_paging_domain(struct iommu_domain *domain)
+{
+	return domain->type & __IOMMU_DOMAIN_PAGING;
+}
+
 enum iommu_cap {
 	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache coherent DMA
 					   transactions */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 5/6] iommu: Introduce __IOMMU_DOMAIN_LP
  2021-10-19 16:37 [PATCH v3 0/6] Support IOMMU page sizes larger than the CPU page size Sven Peter
                   ` (3 preceding siblings ...)
  2021-10-19 16:37 ` [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device Sven Peter
@ 2021-10-19 16:37 ` Sven Peter
  2021-10-19 16:37 ` [PATCH v3 6/6] iommu/dart: Remove force_bypass logic Sven Peter
  5 siblings, 0 replies; 15+ messages in thread
From: Sven Peter @ 2021-10-19 16:37 UTC (permalink / raw)
  To: iommu, Robin Murphy
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Marc Zyngier, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

__IOMMU_DOMAIN_LP (large pages) indicates that a domain can handle
conditions where PAGE_SIZE might be smaller than the IOMMU page size.
Always allow attaching trusted devices to such domains and set the flag for
IOMMU_DOMAIN_DMA, which can now handle these situations.

Note that untrusted devices are not yet supported. Those require
additional changes to allow aligning swiotlb buffers to granularities
larger than PAGE_SIZE.

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/iommu.c |  9 +++++++--
 include/linux/iommu.h | 13 +++++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 28896739964b..66bba6a6bb28 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1976,10 +1976,15 @@ void iommu_domain_free(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL_GPL(iommu_domain_free);
 
-static int iommu_check_page_size(struct iommu_domain *domain)
+static int iommu_check_page_size(struct iommu_domain *domain,
+				struct device *dev)
 {
+	bool trusted = !(dev_is_pci(dev) && to_pci_dev(dev)->untrusted);
+
 	if (!iommu_is_paging_domain(domain))
 		return 0;
+	if (iommu_is_large_pages_domain(domain) && trusted)
+		return 0;
 
 	if (!(domain->pgsize_bitmap & (PAGE_SIZE | (PAGE_SIZE - 1)))) {
 		pr_warn("IOMMU pages cannot exactly represent CPU pages.\n");
@@ -2007,7 +2012,7 @@ static int __iommu_attach_device(struct iommu_domain *domain,
 	 * only limit domain->pgsize_bitmap after having attached the first
 	 * device.
 	 */
-	ret = iommu_check_page_size(domain);
+	ret = iommu_check_page_size(domain, dev);
 	if (ret) {
 		__iommu_detach_device(domain, dev);
 		return ret;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index cabd25879613..1f1af59d0522 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -62,6 +62,8 @@ struct iommu_domain_geometry {
 					      implementation              */
 #define __IOMMU_DOMAIN_PT	(1U << 2)  /* Domain is identity mapped   */
 #define __IOMMU_DOMAIN_DMA_FQ	(1U << 3)  /* DMA-API uses flush queue    */
+#define __IOMMU_DOMAIN_LP	(1U << 4)  /* Support for PAGE_SIZE smaller
+					      than IOMMU page size        */
 
 /*
  * This are the possible domain-types
@@ -81,10 +83,12 @@ struct iommu_domain_geometry {
 #define IOMMU_DOMAIN_IDENTITY	(__IOMMU_DOMAIN_PT)
 #define IOMMU_DOMAIN_UNMANAGED	(__IOMMU_DOMAIN_PAGING)
 #define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING |	\
-				 __IOMMU_DOMAIN_DMA_API)
+				 __IOMMU_DOMAIN_DMA_API |       \
+				 __IOMMU_DOMAIN_LP)
 #define IOMMU_DOMAIN_DMA_FQ	(__IOMMU_DOMAIN_PAGING |	\
 				 __IOMMU_DOMAIN_DMA_API |	\
-				 __IOMMU_DOMAIN_DMA_FQ)
+				 __IOMMU_DOMAIN_DMA_FQ |        \
+				 __IOMMU_DOMAIN_LP)
 
 struct iommu_domain {
 	unsigned type;
@@ -106,6 +110,11 @@ static inline bool iommu_is_paging_domain(struct iommu_domain *domain)
 	return domain->type & __IOMMU_DOMAIN_PAGING;
 }
 
+static inline bool iommu_is_large_pages_domain(struct iommu_domain *domain)
+{
+	return domain->type & __IOMMU_DOMAIN_LP;
+}
+
 enum iommu_cap {
 	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache coherent DMA
 					   transactions */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 6/6] iommu/dart: Remove force_bypass logic
  2021-10-19 16:37 [PATCH v3 0/6] Support IOMMU page sizes larger than the CPU page size Sven Peter
                   ` (4 preceding siblings ...)
  2021-10-19 16:37 ` [PATCH v3 5/6] iommu: Introduce __IOMMU_DOMAIN_LP Sven Peter
@ 2021-10-19 16:37 ` Sven Peter
  5 siblings, 0 replies; 15+ messages in thread
From: Sven Peter @ 2021-10-19 16:37 UTC (permalink / raw)
  To: iommu, Robin Murphy
  Cc: Sven Peter, Joerg Roedel, Will Deacon, Arnd Bergmann,
	Marc Zyngier, Mohamed Mediouni, Alexander Graf, Hector Martin,
	Alyssa Rosenzweig, linux-kernel

Now that the dma-iommu API supports IOMMU granules which are larger than
the CPU page size, and the kernel no longer runs into a BUG_ON when
devices are attached to a domain with such a granule, there's no need to
force bypass mode anymore.

Signed-off-by: Sven Peter <sven@svenpeter.dev>
---
 drivers/iommu/apple-dart.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 280ff8df728d..ce92195db638 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -90,7 +90,6 @@
  * @lock: lock for hardware operations involving this dart
  * @pgsize: pagesize supported by this DART
  * @supports_bypass: indicates if this DART supports bypass mode
- * @force_bypass: force bypass mode due to pagesize mismatch?
  * @sid2group: maps stream ids to iommu_groups
  * @iommu: iommu core device
  */
@@ -107,7 +106,6 @@ struct apple_dart {
 
 	u32 pgsize;
 	u32 supports_bypass : 1;
-	u32 force_bypass : 1;
 
 	struct iommu_group *sid2group[DART_MAX_STREAMS];
 	struct iommu_device iommu;
@@ -488,9 +486,6 @@ static int apple_dart_attach_dev(struct iommu_domain *domain,
 	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
 	struct apple_dart_domain *dart_domain = to_dart_domain(domain);
 
-	if (cfg->stream_maps[0].dart->force_bypass &&
-	    domain->type != IOMMU_DOMAIN_IDENTITY)
-		return -EINVAL;
 	if (!cfg->stream_maps[0].dart->supports_bypass &&
 	    domain->type == IOMMU_DOMAIN_IDENTITY)
 		return -EINVAL;
@@ -619,8 +614,6 @@ static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)
 	if (cfg_dart) {
 		if (cfg_dart->supports_bypass != dart->supports_bypass)
 			return -EINVAL;
-		if (cfg_dart->force_bypass != dart->force_bypass)
-			return -EINVAL;
 		if (cfg_dart->pgsize != dart->pgsize)
 			return -EINVAL;
 	}
@@ -726,8 +719,6 @@ static int apple_dart_def_domain_type(struct device *dev)
 {
 	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
 
-	if (cfg->stream_maps[0].dart->force_bypass)
-		return IOMMU_DOMAIN_IDENTITY;
 	if (!cfg->stream_maps[0].dart->supports_bypass)
 		return IOMMU_DOMAIN_DMA;
 
@@ -884,7 +875,6 @@ static int apple_dart_probe(struct platform_device *pdev)
 	dart_params[1] = readl(dart->regs + DART_PARAMS2);
 	dart->pgsize = 1 << FIELD_GET(DART_PARAMS_PAGE_SHIFT, dart_params[0]);
 	dart->supports_bypass = dart_params[1] & DART_PARAMS_BYPASS_SUPPORT;
-	dart->force_bypass = dart->pgsize > PAGE_SIZE;
 
 	ret = request_irq(dart->irq, apple_dart_irq, IRQF_SHARED,
 			  "apple-dart fault handler", dart);
@@ -908,8 +898,8 @@ static int apple_dart_probe(struct platform_device *pdev)
 
 	dev_info(
 		&pdev->dev,
-		"DART [pagesize %x, bypass support: %d, bypass forced: %d] initialized\n",
-		dart->pgsize, dart->supports_bypass, dart->force_bypass);
+		"DART [pagesize %x, bypass support: %d] initialized\n",
+		dart->pgsize, dart->supports_bypass);
 	return 0;
 
 err_sysfs_remove:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device
  2021-10-19 16:37 ` [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device Sven Peter
@ 2021-10-20  5:21   ` Lu Baolu
  2021-10-20 14:22     ` Marc Zyngier
  2021-10-21  8:31     ` Sven Peter
  0 siblings, 2 replies; 15+ messages in thread
From: Lu Baolu @ 2021-10-20  5:21 UTC (permalink / raw)
  To: Sven Peter, iommu, Robin Murphy
  Cc: baolu.lu, Arnd Bergmann, Marc Zyngier, Hector Martin,
	linux-kernel, Alexander Graf, Mohamed Mediouni, Will Deacon,
	Alyssa Rosenzweig

On 2021/10/20 0:37, Sven Peter via iommu wrote:
> The iova allocator is capable of handling any granularity which is a power
> of two. Remove the much stronger condition that the granularity must be
> smaller or equal to the CPU page size from a BUG_ON there.
> Instead, check this condition during __iommu_attach_device and fail
> gracefully.
> 
> Signed-off-by: Sven Peter<sven@svenpeter.dev>
> ---
>   drivers/iommu/iommu.c | 35 ++++++++++++++++++++++++++++++++---
>   drivers/iommu/iova.c  |  7 ++++---
>   include/linux/iommu.h |  5 +++++
>   3 files changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index dd7863e453a5..28896739964b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -80,6 +80,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>   						 unsigned type);
>   static int __iommu_attach_device(struct iommu_domain *domain,
>   				 struct device *dev);
> +static void __iommu_detach_device(struct iommu_domain *domain,
> +				  struct device *dev);
>   static int __iommu_attach_group(struct iommu_domain *domain,
>   				struct iommu_group *group);
>   static void __iommu_detach_group(struct iommu_domain *domain,
> @@ -1974,6 +1976,19 @@ void iommu_domain_free(struct iommu_domain *domain)
>   }
>   EXPORT_SYMBOL_GPL(iommu_domain_free);
>   
> +static int iommu_check_page_size(struct iommu_domain *domain)
> +{
> +	if (!iommu_is_paging_domain(domain))
> +		return 0;
> +
> +	if (!(domain->pgsize_bitmap & (PAGE_SIZE | (PAGE_SIZE - 1)))) {
> +		pr_warn("IOMMU pages cannot exactly represent CPU pages.\n");
> +		return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +
>   static int __iommu_attach_device(struct iommu_domain *domain,
>   				 struct device *dev)
>   {
> @@ -1983,9 +1998,23 @@ static int __iommu_attach_device(struct iommu_domain *domain,
>   		return -ENODEV;
>   
>   	ret = domain->ops->attach_dev(domain, dev);
> -	if (!ret)
> -		trace_attach_device_to_domain(dev);
> -	return ret;
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Check that CPU pages can be represented by the IOVA granularity.
> +	 * This has to be done after ops->attach_dev since many IOMMU drivers
> +	 * only limit domain->pgsize_bitmap after having attached the first
> +	 * device.
> +	 */
> +	ret = iommu_check_page_size(domain);
> +	if (ret) {
> +		__iommu_detach_device(domain, dev);
> +		return ret;
> +	}

It looks odd. __iommu_attach_device() attaches an I/O page table for a
device. How does it relate to CPU pages? Why is it a failure case if CPU
page size is not covered?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device
  2021-10-20  5:21   ` Lu Baolu
@ 2021-10-20 14:22     ` Marc Zyngier
  2021-10-21  2:22       ` Lu Baolu
  2021-10-21  8:31     ` Sven Peter
  1 sibling, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2021-10-20 14:22 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Sven Peter, iommu, Robin Murphy, Arnd Bergmann, Hector Martin,
	linux-kernel, Alexander Graf, Mohamed Mediouni, Will Deacon,
	Alyssa Rosenzweig

On Wed, 20 Oct 2021 06:21:44 +0100,
Lu Baolu <baolu.lu@linux.intel.com> wrote:
> 
> On 2021/10/20 0:37, Sven Peter via iommu wrote:
> > The iova allocator is capable of handling any granularity which is a power
> > of two. Remove the much stronger condition that the granularity must be
> > smaller or equal to the CPU page size from a BUG_ON there.
> > Instead, check this condition during __iommu_attach_device and fail
> > gracefully.
> > 
> > Signed-off-by: Sven Peter<sven@svenpeter.dev>
> > ---
> >   drivers/iommu/iommu.c | 35 ++++++++++++++++++++++++++++++++---
> >   drivers/iommu/iova.c  |  7 ++++---
> >   include/linux/iommu.h |  5 +++++
> >   3 files changed, 41 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dd7863e453a5..28896739964b 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -80,6 +80,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
> >   						 unsigned type);
> >   static int __iommu_attach_device(struct iommu_domain *domain,
> >   				 struct device *dev);
> > +static void __iommu_detach_device(struct iommu_domain *domain,
> > +				  struct device *dev);
> >   static int __iommu_attach_group(struct iommu_domain *domain,
> >   				struct iommu_group *group);
> >   static void __iommu_detach_group(struct iommu_domain *domain,
> > @@ -1974,6 +1976,19 @@ void iommu_domain_free(struct iommu_domain *domain)
> >   }
> >   EXPORT_SYMBOL_GPL(iommu_domain_free);
> >   +static int iommu_check_page_size(struct iommu_domain *domain)
> > +{
> > +	if (!iommu_is_paging_domain(domain))
> > +		return 0;
> > +
> > +	if (!(domain->pgsize_bitmap & (PAGE_SIZE | (PAGE_SIZE - 1)))) {
> > +		pr_warn("IOMMU pages cannot exactly represent CPU pages.\n");
> > +		return -EFAULT;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >   static int __iommu_attach_device(struct iommu_domain *domain,
> >   				 struct device *dev)
> >   {
> > @@ -1983,9 +1998,23 @@ static int __iommu_attach_device(struct iommu_domain *domain,
> >   		return -ENODEV;
> >     	ret = domain->ops->attach_dev(domain, dev);
> > -	if (!ret)
> > -		trace_attach_device_to_domain(dev);
> > -	return ret;
> > +	if (ret)
> > +		return ret;
> > +
> > +	/*
> > +	 * Check that CPU pages can be represented by the IOVA granularity.
> > +	 * This has to be done after ops->attach_dev since many IOMMU drivers
> > +	 * only limit domain->pgsize_bitmap after having attached the first
> > +	 * device.
> > +	 */
> > +	ret = iommu_check_page_size(domain);
> > +	if (ret) {
> > +		__iommu_detach_device(domain, dev);
> > +		return ret;
> > +	}
> 
> It looks odd. __iommu_attach_device() attaches an I/O page table for a
> device. How does it relate to CPU pages? Why is it a failure case if CPU
> page size is not covered?

If you allocate a CPU PAGE_SIZE'd region, and point it at a device
that now can DMA to more than what you have allocated because the
IOMMU's own page size is larger, the device now has access to data it
shouldn't see. In my book, that's a pretty bad thing.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device
  2021-10-20 14:22     ` Marc Zyngier
@ 2021-10-21  2:22       ` Lu Baolu
  2021-10-21  8:10         ` Marc Zyngier
  0 siblings, 1 reply; 15+ messages in thread
From: Lu Baolu @ 2021-10-21  2:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: baolu.lu, Sven Peter, iommu, Robin Murphy, Arnd Bergmann,
	Hector Martin, linux-kernel, Alexander Graf, Mohamed Mediouni,
	Will Deacon, Alyssa Rosenzweig

On 10/20/21 10:22 PM, Marc Zyngier wrote:
> On Wed, 20 Oct 2021 06:21:44 +0100,
> Lu Baolu <baolu.lu@linux.intel.com> wrote:
>>
>> On 2021/10/20 0:37, Sven Peter via iommu wrote:
>>> The iova allocator is capable of handling any granularity which is a power
>>> of two. Remove the much stronger condition that the granularity must be
>>> smaller or equal to the CPU page size from a BUG_ON there.
>>> Instead, check this condition during __iommu_attach_device and fail
>>> gracefully.
>>>
>>> Signed-off-by: Sven Peter<sven@svenpeter.dev>
>>> ---
>>>    drivers/iommu/iommu.c | 35 ++++++++++++++++++++++++++++++++---
>>>    drivers/iommu/iova.c  |  7 ++++---
>>>    include/linux/iommu.h |  5 +++++
>>>    3 files changed, 41 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index dd7863e453a5..28896739964b 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -80,6 +80,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>>>    						 unsigned type);
>>>    static int __iommu_attach_device(struct iommu_domain *domain,
>>>    				 struct device *dev);
>>> +static void __iommu_detach_device(struct iommu_domain *domain,
>>> +				  struct device *dev);
>>>    static int __iommu_attach_group(struct iommu_domain *domain,
>>>    				struct iommu_group *group);
>>>    static void __iommu_detach_group(struct iommu_domain *domain,
>>> @@ -1974,6 +1976,19 @@ void iommu_domain_free(struct iommu_domain *domain)
>>>    }
>>>    EXPORT_SYMBOL_GPL(iommu_domain_free);
>>>    +static int iommu_check_page_size(struct iommu_domain *domain)
>>> +{
>>> +	if (!iommu_is_paging_domain(domain))
>>> +		return 0;
>>> +
>>> +	if (!(domain->pgsize_bitmap & (PAGE_SIZE | (PAGE_SIZE - 1)))) {
>>> +		pr_warn("IOMMU pages cannot exactly represent CPU pages.\n");
>>> +		return -EFAULT;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>    static int __iommu_attach_device(struct iommu_domain *domain,
>>>    				 struct device *dev)
>>>    {
>>> @@ -1983,9 +1998,23 @@ static int __iommu_attach_device(struct iommu_domain *domain,
>>>    		return -ENODEV;
>>>      	ret = domain->ops->attach_dev(domain, dev);
>>> -	if (!ret)
>>> -		trace_attach_device_to_domain(dev);
>>> -	return ret;
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	/*
>>> +	 * Check that CPU pages can be represented by the IOVA granularity.
>>> +	 * This has to be done after ops->attach_dev since many IOMMU drivers
>>> +	 * only limit domain->pgsize_bitmap after having attached the first
>>> +	 * device.
>>> +	 */
>>> +	ret = iommu_check_page_size(domain);
>>> +	if (ret) {
>>> +		__iommu_detach_device(domain, dev);
>>> +		return ret;
>>> +	}
>>
>> It looks odd. __iommu_attach_device() attaches an I/O page table for a
>> device. How does it relate to CPU pages? Why is it a failure case if CPU
>> page size is not covered?
> 
> If you allocate a CPU PAGE_SIZE'd region, and point it at a device
> that now can DMA to more than what you have allocated because the
> IOMMU's own page size is larger, the device has now access to data it
> shouldn't see. In my book, that's a pretty bad thing.

But even if you enforce the CPU page size check here, this problem still
exists unless all DMA buffers are PAGE_SIZE aligned and sized, right?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device
  2021-10-21  2:22       ` Lu Baolu
@ 2021-10-21  8:10         ` Marc Zyngier
  2021-10-22  2:52           ` Lu Baolu
  0 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2021-10-21  8:10 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Sven Peter, iommu, Robin Murphy, Arnd Bergmann, Hector Martin,
	linux-kernel, Alexander Graf, Mohamed Mediouni, Will Deacon,
	Alyssa Rosenzweig

On Thu, 21 Oct 2021 03:22:30 +0100,
Lu Baolu <baolu.lu@linux.intel.com> wrote:
> 
> On 10/20/21 10:22 PM, Marc Zyngier wrote:
> > On Wed, 20 Oct 2021 06:21:44 +0100,
> > Lu Baolu <baolu.lu@linux.intel.com> wrote:
> >> 
> >> On 2021/10/20 0:37, Sven Peter via iommu wrote:
> >>> +	/*
> >>> +	 * Check that CPU pages can be represented by the IOVA granularity.
> >>> +	 * This has to be done after ops->attach_dev since many IOMMU drivers
> >>> +	 * only limit domain->pgsize_bitmap after having attached the first
> >>> +	 * device.
> >>> +	 */
> >>> +	ret = iommu_check_page_size(domain);
> >>> +	if (ret) {
> >>> +		__iommu_detach_device(domain, dev);
> >>> +		return ret;
> >>> +	}
> >> 
> >> It looks odd. __iommu_attach_device() attaches an I/O page table for a
> >> device. How does it relate to CPU pages? Why is it a failure case if CPU
> >> page size is not covered?
> > 
> > If you allocate a CPU PAGE_SIZE'd region, and point it at a device
> > that now can DMA to more than what you have allocated because the
> > IOMMU's own page size is larger, the device has now access to data it
> > shouldn't see. In my book, that's a pretty bad thing.
> 
> But even you enforce the CPU page size check here, this problem still
> exists unless all DMA buffers are PAGE_SIZE aligned and sized, right?

Let me take a CPU analogy: you have a page that contains some user
data *and* a kernel secret. How do you map this page into userspace
without leaking the kernel secret?

PAGE_SIZE allocations are the unit of isolation, and this applies to
both CPU and IOMMU. If you have allocated a DMA buffer that is less
than a page, you then have to resort to bounce buffering, or accept
that your data isn't safe.
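
A back-of-the-envelope illustration of that exposure, assuming 4 KiB CPU
pages and a 16 KiB IOMMU granule:

#include <stdio.h>

int main(void)
{
	unsigned long page = 4096, granule = 16384;
	unsigned long buf = page;	/* one CPU page allocated for DMA */
	unsigned long mapped = (buf + granule - 1) / granule * granule;

	printf("device can reach %lu extra bytes\n", mapped - buf); /* 12288 */
	return 0;
}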

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device
  2021-10-20  5:21   ` Lu Baolu
  2021-10-20 14:22     ` Marc Zyngier
@ 2021-10-21  8:31     ` Sven Peter
  1 sibling, 0 replies; 15+ messages in thread
From: Sven Peter @ 2021-10-21  8:31 UTC (permalink / raw)
  To: Lu Baolu, iommu, Robin Murphy
  Cc: Arnd Bergmann, Marc Zyngier, Hector Martin, linux-kernel,
	Alexander Graf, Mohamed Mediouni, Will Deacon, Alyssa Rosenzweig



On Wed, Oct 20, 2021, at 07:21, Lu Baolu wrote:
> On 2021/10/20 0:37, Sven Peter via iommu wrote:
>> The iova allocator is capable of handling any granularity which is a power
>> of two. Remove the much stronger condition that the granularity must be
>> smaller or equal to the CPU page size from a BUG_ON there.
>> Instead, check this condition during __iommu_attach_device and fail
>> gracefully.
>> 
>> Signed-off-by: Sven Peter<sven@svenpeter.dev>
>> ---
>>   drivers/iommu/iommu.c | 35 ++++++++++++++++++++++++++++++++---
>>   drivers/iommu/iova.c  |  7 ++++---
>>   include/linux/iommu.h |  5 +++++
>>   3 files changed, 41 insertions(+), 6 deletions(-)
>> 
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index dd7863e453a5..28896739964b 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -80,6 +80,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
>>   						 unsigned type);
>>   static int __iommu_attach_device(struct iommu_domain *domain,
>>   				 struct device *dev);
>> +static void __iommu_detach_device(struct iommu_domain *domain,
>> +				  struct device *dev);
>>   static int __iommu_attach_group(struct iommu_domain *domain,
>>   				struct iommu_group *group);
>>   static void __iommu_detach_group(struct iommu_domain *domain,
>> @@ -1974,6 +1976,19 @@ void iommu_domain_free(struct iommu_domain *domain)
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_domain_free);
>>   
>> +static int iommu_check_page_size(struct iommu_domain *domain)
>> +{
>> +	if (!iommu_is_paging_domain(domain))
>> +		return 0;
>> +
>> +	if (!(domain->pgsize_bitmap & (PAGE_SIZE | (PAGE_SIZE - 1)))) {
>> +		pr_warn("IOMMU pages cannot exactly represent CPU pages.\n");
>> +		return -EFAULT;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static int __iommu_attach_device(struct iommu_domain *domain,
>>   				 struct device *dev)
>>   {
>> @@ -1983,9 +1998,23 @@ static int __iommu_attach_device(struct iommu_domain *domain,
>>   		return -ENODEV;
>>   
>>   	ret = domain->ops->attach_dev(domain, dev);
>> -	if (!ret)
>> -		trace_attach_device_to_domain(dev);
>> -	return ret;
>> +	if (ret)
>> +		return ret;
>> +
>> +	/*
>> +	 * Check that CPU pages can be represented by the IOVA granularity.
>> +	 * This has to be done after ops->attach_dev since many IOMMU drivers
>> +	 * only limit domain->pgsize_bitmap after having attached the first
>> +	 * device.
>> +	 */
>> +	ret = iommu_check_page_size(domain);
>> +	if (ret) {
>> +		__iommu_detach_device(domain, dev);
>> +		return ret;
>> +	}
>
> It looks odd. __iommu_attach_device() attaches an I/O page table for a
> device. How does it relate to CPU pages? Why is it a failure case if CPU
> page size is not covered?

Ideally, I'd only allow allocating DMA domains (which are going to be able
to handle larger IOMMU page sizes) while disallowing UNMANAGED domains
(which could in theory query the granule, but I doubt any client right now
considers this situation and would just run into odd issues) when the I/O
page size is bigger than the CPU page size. There was a brief previous
discussion about this [1,2,3].

Unfortunately, Apple's DART IOMMU is hardwired to support either 4K or
16K pages but never both. And to make things worse, there was at least one
SoC used in the iPhones that mixed 4K and 16K DARTs on the same bus. Ugh.
That's why this awkward check is here: it's the earliest
place where I know which I/O page size will be used.


But I guess I could just limit the DART driver to 16K pages for now
(since every instance on the M1 is hardwired for that anyway) and then
just disallow allocating UNMANAGED domains when granule > PAGE_SIZE.

I'd still need a similar check here (at least for now) to prevent attaching
untrusted devices since I haven't changed swiotlb yet to support aligning
buffers correctly to more than PAGE_SIZE. That's possible but the interaction
with min_align_mask is a bit tricky to get right.
If there really shouldn't be any check here I can also do that for the next
version but I'd really like to keep that as a separate series - especially
since Thunderbolt support is still far away.


Thanks,


Sven

[1] https://lore.kernel.org/linux-iommu/7261df01-34a9-4e53-37cd-ae1aa15b1fb4@arm.com/
[2] https://lore.kernel.org/linux-iommu/CAK8P3a18XK2mfMGbZ+M32Mbabhbkd+=DNrnzampOah_j=rWozw@mail.gmail.com/
[3] https://lore.kernel.org/linux-iommu/YO%2FbMUAOLRgOJOvl@8bytes.org/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device
  2021-10-21  8:10         ` Marc Zyngier
@ 2021-10-22  2:52           ` Lu Baolu
  2021-10-22  8:06             ` Marc Zyngier
  0 siblings, 1 reply; 15+ messages in thread
From: Lu Baolu @ 2021-10-22  2:52 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: baolu.lu, Sven Peter, iommu, Robin Murphy, Arnd Bergmann,
	Hector Martin, linux-kernel, Alexander Graf, Mohamed Mediouni,
	Will Deacon, Alyssa Rosenzweig

On 10/21/21 4:10 PM, Marc Zyngier wrote:
> On Thu, 21 Oct 2021 03:22:30 +0100,
> Lu Baolu <baolu.lu@linux.intel.com> wrote:
>>
>> On 10/20/21 10:22 PM, Marc Zyngier wrote:
>>> On Wed, 20 Oct 2021 06:21:44 +0100,
>>> Lu Baolu <baolu.lu@linux.intel.com> wrote:
>>>>
>>>> On 2021/10/20 0:37, Sven Peter via iommu wrote:
>>>>> +	/*
>>>>> +	 * Check that CPU pages can be represented by the IOVA granularity.
>>>>> +	 * This has to be done after ops->attach_dev since many IOMMU drivers
>>>>> +	 * only limit domain->pgsize_bitmap after having attached the first
>>>>> +	 * device.
>>>>> +	 */
>>>>> +	ret = iommu_check_page_size(domain);
>>>>> +	if (ret) {
>>>>> +		__iommu_detach_device(domain, dev);
>>>>> +		return ret;
>>>>> +	}
>>>>
>>>> It looks odd. __iommu_attach_device() attaches an I/O page table for a
>>>> device. How does it relate to CPU pages? Why is it a failure case if CPU
>>>> page size is not covered?
>>>
>>> If you allocate a CPU PAGE_SIZE'd region, and point it at a device
>>> that now can DMA to more than what you have allocated because the
>>> IOMMU's own page size is larger, the device has now access to data it
>>> shouldn't see. In my book, that's a pretty bad thing.
>>
>> But even you enforce the CPU page size check here, this problem still
>> exists unless all DMA buffers are PAGE_SIZE aligned and sized, right?
> 
> Let me take a CPU analogy: you have a page that contains some user
> data *and* a kernel secret. How do you map this page into userspace
> without leaking the kernel secret?
> 
> PAGE_SIZE allocations are the unit of isolation, and this applies to
> both CPU and IOMMU. If you have allocated a DMA buffer that is less
> than a page, you then have to resort to bounce buffering, or accept
> that your data isn't safe.

I can understand the problems when the IOMMU page size is larger than the CPU
page size. But the code itself is not clean. The vendor iommu drivers
know more hardware details than the iommu core. It looks odd that the
vendor iommu driver says "okay, I can attach this I/O page table to the
device", but the iommu core says "no, you can't" and rolls everything
back.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device
  2021-10-22  2:52           ` Lu Baolu
@ 2021-10-22  8:06             ` Marc Zyngier
  2021-10-22 13:39               ` Robin Murphy
  0 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2021-10-22  8:06 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Sven Peter, iommu, Robin Murphy, Arnd Bergmann, Hector Martin,
	linux-kernel, Alexander Graf, Mohamed Mediouni, Will Deacon,
	Alyssa Rosenzweig

On Fri, 22 Oct 2021 03:52:38 +0100,
Lu Baolu <baolu.lu@linux.intel.com> wrote:
> 
> On 10/21/21 4:10 PM, Marc Zyngier wrote:
> > On Thu, 21 Oct 2021 03:22:30 +0100,
> > Lu Baolu <baolu.lu@linux.intel.com> wrote:
> >> 
> >> On 10/20/21 10:22 PM, Marc Zyngier wrote:
> >>> On Wed, 20 Oct 2021 06:21:44 +0100,
> >>> Lu Baolu <baolu.lu@linux.intel.com> wrote:
> >>>> 
> >>>> On 2021/10/20 0:37, Sven Peter via iommu wrote:
> >>>>> +	/*
> >>>>> +	 * Check that CPU pages can be represented by the IOVA granularity.
> >>>>> +	 * This has to be done after ops->attach_dev since many IOMMU drivers
> >>>>> +	 * only limit domain->pgsize_bitmap after having attached the first
> >>>>> +	 * device.
> >>>>> +	 */
> >>>>> +	ret = iommu_check_page_size(domain);
> >>>>> +	if (ret) {
> >>>>> +		__iommu_detach_device(domain, dev);
> >>>>> +		return ret;
> >>>>> +	}
> >>>> 
> >>>> It looks odd. __iommu_attach_device() attaches an I/O page table for a
> >>>> device. How does it relate to CPU pages? Why is it a failure case if CPU
> >>>> page size is not covered?
> >>> 
> >>> If you allocate a CPU PAGE_SIZE'd region, and point it at a device
> >>> that now can DMA to more than what you have allocated because the
> >>> IOMMU's own page size is larger, the device has now access to data it
> >>> shouldn't see. In my book, that's a pretty bad thing.
> >> 
> >> But even you enforce the CPU page size check here, this problem still
> >> exists unless all DMA buffers are PAGE_SIZE aligned and sized, right?
> > 
> > Let me take a CPU analogy: you have a page that contains some user
> > data *and* a kernel secret. How do you map this page into userspace
> > without leaking the kernel secret?
> > 
> > PAGE_SIZE allocations are the unit of isolation, and this applies to
> > both CPU and IOMMU. If you have allocated a DMA buffer that is less
> > than a page, you then have to resort to bounce buffering, or accept
> > that your data isn't safe.
> 
> I can understand the problems when IOMMU page sizes is larger than CPU
> page size. But the code itself is not clean. The vendor iommu drivers
> know more hardware details than the iommu core. It looks odd that the
> vendor iommu says "okay, I can attach this I/O page table to the
> device", but the iommu core says "no, you can't" and rolls everything
> back.

If your IOMMU driver can do things behind the core's back and
contradict the view that the core has, then it is probably time to fix
your IOMMU driver and make the core aware of what is going on.
Supported page sizes is one of these things.

In general, keeping the IOMMU driver as dumb as possible is a worthy
design goal, and this is why we have these abstractions.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device
  2021-10-22  8:06             ` Marc Zyngier
@ 2021-10-22 13:39               ` Robin Murphy
  0 siblings, 0 replies; 15+ messages in thread
From: Robin Murphy @ 2021-10-22 13:39 UTC (permalink / raw)
  To: Marc Zyngier, Lu Baolu
  Cc: Sven Peter, iommu, Arnd Bergmann, Hector Martin, linux-kernel,
	Alexander Graf, Mohamed Mediouni, Will Deacon, Alyssa Rosenzweig

On 2021-10-22 09:06, Marc Zyngier wrote:
> On Fri, 22 Oct 2021 03:52:38 +0100,
> Lu Baolu <baolu.lu@linux.intel.com> wrote:
>>
>> On 10/21/21 4:10 PM, Marc Zyngier wrote:
>>> On Thu, 21 Oct 2021 03:22:30 +0100,
>>> Lu Baolu <baolu.lu@linux.intel.com> wrote:
>>>>
>>>> On 10/20/21 10:22 PM, Marc Zyngier wrote:
>>>>> On Wed, 20 Oct 2021 06:21:44 +0100,
>>>>> Lu Baolu <baolu.lu@linux.intel.com> wrote:
>>>>>>
>>>>>> On 2021/10/20 0:37, Sven Peter via iommu wrote:
>>>>>>> +	/*
>>>>>>> +	 * Check that CPU pages can be represented by the IOVA granularity.
>>>>>>> +	 * This has to be done after ops->attach_dev since many IOMMU drivers
>>>>>>> +	 * only limit domain->pgsize_bitmap after having attached the first
>>>>>>> +	 * device.
>>>>>>> +	 */
>>>>>>> +	ret = iommu_check_page_size(domain);
>>>>>>> +	if (ret) {
>>>>>>> +		__iommu_detach_device(domain, dev);
>>>>>>> +		return ret;
>>>>>>> +	}
>>>>>>
>>>>>> It looks odd. __iommu_attach_device() attaches an I/O page table for a
>>>>>> device. How does it relate to CPU pages? Why is it a failure case if CPU
>>>>>> page size is not covered?
>>>>>
>>>>> If you allocate a CPU PAGE_SIZE'd region, and point it at a device
>>>>> that now can DMA to more than what you have allocated because the
>>>>> IOMMU's own page size is larger, the device has now access to data it
>>>>> shouldn't see. In my book, that's a pretty bad thing.
>>>>
>>>> But even you enforce the CPU page size check here, this problem still
>>>> exists unless all DMA buffers are PAGE_SIZE aligned and sized, right?
>>>
>>> Let me take a CPU analogy: you have a page that contains some user
>>> data *and* a kernel secret. How do you map this page into userspace
>>> without leaking the kernel secret?
>>>
>>> PAGE_SIZE allocations are the unit of isolation, and this applies to
>>> both CPU and IOMMU. If you have allocated a DMA buffer that is less
>>> than a page, you then have to resort to bounce buffering, or accept
>>> that your data isn't safe.
>>
>> I can understand the problems when IOMMU page sizes is larger than CPU
>> page size. But the code itself is not clean. The vendor iommu drivers
>> know more hardware details than the iommu core. It looks odd that the
>> vendor iommu says "okay, I can attach this I/O page table to the
>> device", but the iommu core says "no, you can't" and rolls everything
>> back.
> 
> If your IOMMU driver can do things behind the core's back and
> contradict the view that the core has, then it is probably time to fix
> your IOMMU driver and make the core aware of what is going on.
> Supported page sizes is one of these things.
> 
> In general, keeping the IOMMU driver as dumb as possible is a worthy
> design goal, and this is why we have these abstractions.

In this case it's the abstractions that are the problem, though. Any 
driver which supports heterogeneous IOMMU instances with potentially 
differing page sizes currently has no choice but to do horrible bodges 
to make the bus-based iommu_domain_alloc() paradigm work *at all*. 
Fixing that from the fundamental API level upwards has been on the to-do 
list for some time now, but won't be straightforward.

Robin.

^ permalink raw reply	[flat|nested] 15+ messages in thread

