* [PATCH] iommu: Split iommu_unmaps
@ 2013-05-24 17:14 ` Alex Williamson
  0 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2013-05-24 17:14 UTC (permalink / raw)
  To: iommu; +Cc: joro, linux-kernel

iommu_map splits requests into pages that the iommu driver reports
that it can handle.  The iommu_unmap path does not do the same.  This
can cause problems not only from callers that might expect the same
behavior as the map path, but even from the failure path of iommu_map,
should it fail at a point where it has mapped and needs to unwind a
set of pages that the iommu driver cannot handle directly.  amd_iommu,
for example, will BUG_ON if asked to unmap a non power of 2 size.

Fix this by extracting and generalizing the sizing code from the
iommu_map path and use it for both map and unmap.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/iommu/iommu.c |   63 +++++++++++++++++++++++++++----------------------
 1 file changed, 35 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d8f98b1..4b0b56b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -754,6 +754,38 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
 
+static size_t iommu_pgsize(struct iommu_domain *domain,
+			   unsigned long addr_merge, size_t size)
+{
+	unsigned int pgsize_idx;
+	size_t pgsize;
+
+	/* Max page size that still fits into 'size' */
+	pgsize_idx = __fls(size);
+
+	/* need to consider alignment requirements ? */
+	if (likely(addr_merge)) {
+		/* Max page size allowed by address */
+		unsigned int align_pgsize_idx = __ffs(addr_merge);
+		pgsize_idx = min(pgsize_idx, align_pgsize_idx);
+	}
+
+	/* build a mask of acceptable page sizes */
+	pgsize = (1UL << (pgsize_idx + 1)) - 1;
+
+	/* throw away page sizes not supported by the hardware */
+	pgsize &= domain->ops->pgsize_bitmap;
+
+	/* make sure we're still sane */
+	BUG_ON(!pgsize);
+
+	/* pick the biggest page */
+	pgsize_idx = __fls(pgsize);
+	pgsize = 1UL << pgsize_idx;
+
+	return pgsize;
+}
+
 int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	      phys_addr_t paddr, size_t size, int prot)
 {
@@ -785,32 +817,7 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 				(unsigned long)paddr, (unsigned long)size);
 
 	while (size) {
-		unsigned long pgsize, addr_merge = iova | paddr;
-		unsigned int pgsize_idx;
-
-		/* Max page size that still fits into 'size' */
-		pgsize_idx = __fls(size);
-
-		/* need to consider alignment requirements ? */
-		if (likely(addr_merge)) {
-			/* Max page size allowed by both iova and paddr */
-			unsigned int align_pgsize_idx = __ffs(addr_merge);
-
-			pgsize_idx = min(pgsize_idx, align_pgsize_idx);
-		}
-
-		/* build a mask of acceptable page sizes */
-		pgsize = (1UL << (pgsize_idx + 1)) - 1;
-
-		/* throw away page sizes not supported by the hardware */
-		pgsize &= domain->ops->pgsize_bitmap;
-
-		/* make sure we're still sane */
-		BUG_ON(!pgsize);
-
-		/* pick the biggest page */
-		pgsize_idx = __fls(pgsize);
-		pgsize = 1UL << pgsize_idx;
+		size_t pgsize = iommu_pgsize(domain, iova | paddr, size);
 
 		pr_debug("mapping: iova 0x%lx pa 0x%lx pgsize %lu\n", iova,
 					(unsigned long)paddr, pgsize);
@@ -863,9 +870,9 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 	 * or we hit an area that isn't mapped.
 	 */
 	while (unmapped < size) {
-		size_t left = size - unmapped;
+		size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
 
-		unmapped_page = domain->ops->unmap(domain, iova, left);
+		unmapped_page = domain->ops->unmap(domain, iova, pgsize);
 		if (!unmapped_page)
 			break;
 


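For illustration, a standalone sketch of the page-size selection done by
iommu_pgsize() above.  __fls()/__ffs() are replaced with compiler builtins,
a 64-bit unsigned long is assumed, and the pgsize_bitmap value
(4KiB | 2MiB | 1GiB) is hypothetical rather than taken from any particular
driver:

#include <assert.h>
#include <stdio.h>
#include <stddef.h>

static size_t pick_pgsize(unsigned long pgsize_bitmap,
			  unsigned long addr_merge, size_t size)
{
	/* Max page size that still fits into 'size' (__fls equivalent) */
	unsigned int pgsize_idx = 63 - __builtin_clzl(size);
	size_t pgsize;

	/* Max page size allowed by the address alignment (__ffs equivalent) */
	if (addr_merge) {
		unsigned int align_idx = __builtin_ctzl(addr_merge);

		if (align_idx < pgsize_idx)
			pgsize_idx = align_idx;
	}

	/* Mask of all page sizes up to and including pgsize_idx */
	pgsize = (1UL << (pgsize_idx + 1)) - 1;

	/* Keep only sizes the hardware supports, then take the biggest */
	pgsize &= pgsize_bitmap;
	assert(pgsize);
	return 1UL << (63 - __builtin_clzl(pgsize));
}

int main(void)
{
	unsigned long bitmap = (1UL << 12) | (1UL << 21) | (1UL << 30);

	/* 4MiB request, 2MiB-aligned iova|paddr: stepped in 2MiB pages */
	printf("%zx\n", pick_pgsize(bitmap, 0x200000, 0x400000)); /* 200000 */

	/* 4MiB request, only 4KiB-aligned: falls back to 4KiB steps */
	printf("%zx\n", pick_pgsize(bitmap, 0x1000, 0x400000));   /* 1000 */

	return 0;
}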

* Re: [PATCH] iommu: Split iommu_unmaps
@ 2013-06-05 16:39   ` Alex Williamson
  0 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2013-06-05 16:39 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: linux-kernel, iommu

Joerg,

Any comments on this?  I need this for vfio hugepage support; otherwise
we risk getting a map failure that results in a BUG_ON from
iommu_unmap_page in amd_iommu.  I can take it in through my vfio tree to
keep the dependencies together if you want to provide an ack.  Thanks,

Alex

On Fri, 2013-05-24 at 11:14 -0600, Alex Williamson wrote:
> iommu_map splits requests into pages that the iommu driver reports
> that it can handle.  The iommu_unmap path does not do the same.  This
> can cause problems not only from callers that might expect the same
> behavior as the map path, but even from the failure path of iommu_map,
> should it fail at a point where it has mapped and needs to unwind a
> set of pages that the iommu driver cannot handle directly.  amd_iommu,
> for example, will BUG_ON if asked to unmap a non power of 2 size.
> 
> Fix this by extracting and generalizing the sizing code from the
> iommu_map path and use it for both map and unmap.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/iommu/iommu.c |   63 +++++++++++++++++++++++++++----------------------
>  1 file changed, 35 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d8f98b1..4b0b56b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -754,6 +754,38 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
>  }
>  EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
>  
> +static size_t iommu_pgsize(struct iommu_domain *domain,
> +			   unsigned long addr_merge, size_t size)
> +{
> +	unsigned int pgsize_idx;
> +	size_t pgsize;
> +
> +	/* Max page size that still fits into 'size' */
> +	pgsize_idx = __fls(size);
> +
> +	/* need to consider alignment requirements ? */
> +	if (likely(addr_merge)) {
> +		/* Max page size allowed by address */
> +		unsigned int align_pgsize_idx = __ffs(addr_merge);
> +		pgsize_idx = min(pgsize_idx, align_pgsize_idx);
> +	}
> +
> +	/* build a mask of acceptable page sizes */
> +	pgsize = (1UL << (pgsize_idx + 1)) - 1;
> +
> +	/* throw away page sizes not supported by the hardware */
> +	pgsize &= domain->ops->pgsize_bitmap;
> +
> +	/* make sure we're still sane */
> +	BUG_ON(!pgsize);
> +
> +	/* pick the biggest page */
> +	pgsize_idx = __fls(pgsize);
> +	pgsize = 1UL << pgsize_idx;
> +
> +	return pgsize;
> +}
> +
>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
>  	      phys_addr_t paddr, size_t size, int prot)
>  {
> @@ -785,32 +817,7 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
>  				(unsigned long)paddr, (unsigned long)size);
>  
>  	while (size) {
> -		unsigned long pgsize, addr_merge = iova | paddr;
> -		unsigned int pgsize_idx;
> -
> -		/* Max page size that still fits into 'size' */
> -		pgsize_idx = __fls(size);
> -
> -		/* need to consider alignment requirements ? */
> -		if (likely(addr_merge)) {
> -			/* Max page size allowed by both iova and paddr */
> -			unsigned int align_pgsize_idx = __ffs(addr_merge);
> -
> -			pgsize_idx = min(pgsize_idx, align_pgsize_idx);
> -		}
> -
> -		/* build a mask of acceptable page sizes */
> -		pgsize = (1UL << (pgsize_idx + 1)) - 1;
> -
> -		/* throw away page sizes not supported by the hardware */
> -		pgsize &= domain->ops->pgsize_bitmap;
> -
> -		/* make sure we're still sane */
> -		BUG_ON(!pgsize);
> -
> -		/* pick the biggest page */
> -		pgsize_idx = __fls(pgsize);
> -		pgsize = 1UL << pgsize_idx;
> +		size_t pgsize = iommu_pgsize(domain, iova | paddr, size);
>  
>  		pr_debug("mapping: iova 0x%lx pa 0x%lx pgsize %lu\n", iova,
>  					(unsigned long)paddr, pgsize);
> @@ -863,9 +870,9 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  	 * or we hit an area that isn't mapped.
>  	 */
>  	while (unmapped < size) {
> -		size_t left = size - unmapped;
> +		size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
>  
> -		unmapped_page = domain->ops->unmap(domain, iova, left);
> +		unmapped_page = domain->ops->unmap(domain, iova, pgsize);
>  		if (!unmapped_page)
>  			break;
>  
> 





* Re: [PATCH] iommu: Split iommu_unmaps
@ 2013-11-07 16:37   ` David Woodhouse
  0 siblings, 0 replies; 10+ messages in thread
From: David Woodhouse @ 2013-11-07 16:37 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, joro, linux-kernel


On Fri, 2013-05-24 at 11:14 -0600, Alex Williamson wrote:
> iommu_map splits requests into pages that the iommu driver reports
> that it can handle.  The iommu_unmap path does not do the same.  This
> can cause problems not only from callers that might expect the same
> behavior as the map path, but even from the failure path of iommu_map,
> should it fail at a point where it has mapped and needs to unwind a
> set of pages that the iommu driver cannot handle directly.  amd_iommu,
> for example, will BUG_ON if asked to unmap a non power of 2 size.
> 
> Fix this by extracting and generalizing the sizing code from the
> iommu_map path and use it for both map and unmap.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Ick, this is horrid and looks like it will introduce a massive
performance hit.

Surely the answer is to fix the AMD driver so that it will just get on
with it and unmap the {address, range} that it's asked to unmap?

And fix the generic iommu_map() code while we're at it, to do it the
same way.

IOTLB flushes are *slow*, and on certain hardware with
non-cache-coherent page tables even the dcache flush for the page table
is slow. If you deliberately break it up into individual pages and do
the flush between each one, rather than just unmapping the range, you
are pessimising the performance quite hard.

A while back, I went through the Intel IOMMU code to make sure it was
doing this right — it used to have this kind of bogosity with repeated
per-page cache and IOTLB flushes *internally*, and the resulting
performance improvement was shown at http://david.woodhou.se/iommu.png

You will basically be undoing that work, by ensuring that the low-level
driver never *sees* the full range.

If the AMD driver really can't handle more than one page at a time, let
it loop for *itself* over the pages.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


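A rough sketch of the alternative argued for above, in which the driver
accepts an arbitrary {iova, size}, walks its own page tables at whatever
granularity they hold, and issues one IOTLB flush for the whole range
instead of one per page.  The my_domain, my_unmap_one() and
my_flush_iotlb() names are hypothetical, not amd_iommu internals:

#include <stddef.h>

struct my_domain;

/* Hypothetical driver-internal helpers (declarations only) */
size_t my_unmap_one(struct my_domain *dom, unsigned long iova);
void my_flush_iotlb(struct my_domain *dom, unsigned long iova, size_t size);

size_t my_iommu_unmap(struct my_domain *dom, unsigned long iova, size_t size)
{
	size_t unmapped = 0;

	while (unmapped < size) {
		/* unmap one leaf entry, whatever size the tables hold */
		size_t step = my_unmap_one(dom, iova + unmapped);

		if (!step)	/* hit a hole that was never mapped */
			break;
		unmapped += step;
	}

	/* a single flush covering everything that was torn down */
	my_flush_iotlb(dom, iova, unmapped);

	return unmapped;
}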

* Re: [PATCH] iommu: Split iommu_unmaps
@ 2013-11-11 23:09     ` Alex Williamson
  0 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2013-11-11 23:09 UTC (permalink / raw)
  To: David Woodhouse; +Cc: iommu, joro, linux-kernel

On Thu, 2013-11-07 at 16:37 +0000, David Woodhouse wrote:
> On Fri, 2013-05-24 at 11:14 -0600, Alex Williamson wrote:
> > iommu_map splits requests into pages that the iommu driver reports
> > that it can handle.  The iommu_unmap path does not do the same.  This
> > can cause problems not only from callers that might expect the same
> > behavior as the map path, but even from the failure path of iommu_map,
> > should it fail at a point where it has mapped and needs to unwind a
> > set of pages that the iommu driver cannot handle directly.  amd_iommu,
> > for example, will BUG_ON if asked to unmap a non power of 2 size.
> > 
> > Fix this by extracting and generalizing the sizing code from the
> > iommu_map path and use it for both map and unmap.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> 
> Ick, this is horrid and looks like it will introduce a massive
> performance hit.

For x86 there are essentially two users of iommu_unmap(), KVM and VFIO.
Both of them try to unmap an individual page and look at the result to
see how much was actually unmapped.  Everything else appears to be error
paths.  So where exactly is this massive performance hit?

> Surely the answer is to fix the AMD driver so that it will just get on
> with it and unmap the {address, range} that it's asked to unmap?

The IOMMU API allows iommu drivers to expose the page sizes they
support.  Mappings are done using these sizes so it only seems fair that
unmappings should as well.  At least that's what amd_iommu was
expecting.

> And fix the generic iommu_map() code while we're at it, to do it the
> same way.
> 
> IOTLB flushes are *slow*, and on certain hardware with
> non-cache-coherent page tables even the dcache flush for the page table
> is slow. If you deliberately break it up into individual pages and do
> the flush between each one, rather than just unmapping the range, you
> are pessimising the performance quite hard.
> 
> A while back, I went through the Intel IOMMU code to make sure it was
> doing this right — it used to have this kind of bogosity with repeated
> per-page cache and IOTLB flushes *internally*, and the resulting
> performance improvement was shown at http://david.woodhou.se/iommu.png
> 
> You will basically be undoing that work, by ensuring that the low-level
> driver never *sees* the full range.

That data is for dma_ops interfaces, not IOMMU API.  How is changing
iommu_unmap() in this way undoing any of your previous work?

> If the AMD driver really can't handle more than one page at a time, let
> it loop for *itself* over the pages.

Sure, but that's a change to the API where I think this fix was
correcting a bug in the implementation of the API.  Are there users of
iommu_unmap() that I don't know about?  Given the in-tree users, there's
not really a compelling argument to optimize.  Thanks,

Alex


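The caller pattern described above looks roughly like the following; the
helper is illustrative, not actual KVM or VFIO code, and iommu_unmap()
here is only a stand-in declaration for the kernel API:

#include <stddef.h>

#define PAGE_SIZE 4096UL	/* illustrative */

struct iommu_domain;
size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
		   size_t size);

/* Ask for one page at a time; the return value says how much was really
 * unmapped (possibly a whole huge page), and we advance by that amount. */
void unmap_range(struct iommu_domain *domain, unsigned long iova, size_t size)
{
	while (size) {
		size_t unmapped = iommu_unmap(domain, iova, PAGE_SIZE);

		if (!unmapped || unmapped >= size)
			break;
		iova += unmapped;
		size -= unmapped;
	}
}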

* Re: [PATCH] iommu: Split iommu_unmaps
@ 2013-11-20 14:29       ` David Woodhouse
  0 siblings, 0 replies; 10+ messages in thread
From: David Woodhouse @ 2013-11-20 14:29 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, linux-kernel


On Mon, 2013-11-11 at 16:09 -0700, Alex Williamson wrote:
> On Thu, 2013-11-07 at 16:37 +0000, David Woodhouse wrote:
> > On Fri, 2013-05-24 at 11:14 -0600, Alex Williamson wrote:
> > > iommu_map splits requests into pages that the iommu driver reports
> > > that it can handle.  The iommu_unmap path does not do the same.  This
> > > can cause problems not only from callers that might expect the same
> > > behavior as the map path, but even from the failure path of iommu_map,
> > > should it fail at a point where it has mapped and needs to unwind a
> > > set of pages that the iommu driver cannot handle directly.  amd_iommu,
> > > for example, will BUG_ON if asked to unmap a non power of 2 size.
> > > 
> > > Fix this by extracting and generalizing the sizing code from the
> > > iommu_map path and use it for both map and unmap.
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > 
> > Ick, this is horrid and looks like it will introduce a massive
> > performance hit.
> 
> For x86 there are essentially two users of iommu_unmap(), KVM and VFIO.
> Both of them try to unmap an individual page and look at the result to
> see how much was actually unmapped.  Everything else appears to be error
> paths.  So where exactly is this massive performance hit?

It's there, in the code that you describe. This patch is making that
bogus behaviour even more firmly entrenched, and harder to fix.

There are out-of-tree users of this IOMMU API too. And while it sucks
that they are out-of-tree, they are working on fixing that. And I've
been talking to them about performance issues they already see on the
map side.

> > Surely the answer is to fix the AMD driver so that it will just get on
> > with it and unmap the {address, range} that it's asked to unmap?
> 
> The IOMMU API allows iommu drivers to expose the page sizes they
> support.  Mappings are done using these sizes so it only seems fair that
> unmappings should as well.  At least that's what amd_iommu was
> expecting.

This is silly.

The generic code has (almost) no business caring about the page sizes
that the IOMMU driver will support. It should care about them *only* as
an optimisation — "hey, if you manage to give me 2MiB pages I can work
faster then". But it should *only* be an optimisation. Fundamentally,
the map and unmap routines should just do as they're bloody told,
without expecting their caller to break down the calls into individual
pages.

> That data is for dma_ops interfaces, not IOMMU API.  How is changing
> iommu_unmap() in this way undoing any of your previous work?

That data is for the core map/unmap functions, which are accessed
through both APIs. While iommu_map() has the problem as you described,
iommu_unmap() didn't, and surely it would have been seeing the same
improvements... until this patch?

> > If the AMD driver really can't handle more than one page at a time, let
> > it loop for *itself* over the pages.
> 
> Sure, but that's a change to the API where I think this fix was
> correcting a bug in the implementation of the API.  Are there users of
> iommu_unmap() that I don't know about?  Given the in-tree users, there's
> not really a compelling argument to optimize.  Thanks,

It's a fix to a stupid API, yes. The current API has us manually marking
the Intel IOMMU as supporting *all* sizes of pages, just so that this
stupid "one page at a time" nonsense doesn't bite so hard... which
should have raised alarm bells at the time we did it, really.

-- 
dwmw2



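The workaround described above amounts to advertising every power-of-two
size from 4KiB upward in the driver's pgsize_bitmap, so that the core's
splitting logic never has a reason to break a request into single pages.
A sketch with an illustrative macro name, not quoted from intel-iommu.c:

/* every power-of-two size >= 4KiB is claimed as "supported" */
#define MY_IOMMU_PGSIZES	(~0xFFFUL)

/* wired up through the driver's struct iommu_ops, roughly: */
/*	.pgsize_bitmap	= MY_IOMMU_PGSIZES,	*/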
