iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] iommu: Optimise PCI SAC address trick
@ 2022-09-21 12:53 Robin Murphy
  2022-09-21 16:24 ` Linus Torvalds
  2022-09-22 14:59 ` John Garry
  0 siblings, 2 replies; 5+ messages in thread
From: Robin Murphy @ 2022-09-21 12:53 UTC (permalink / raw)
  To: joro; +Cc: will, iommu, linux-kernel, Linus Torvalds

Per the reasoning in commit 4bf7fda4dce2 ("iommu/dma: Add config for
PCI SAC address trick") and its subsequent revert, this mechanism no
longer serves its original purpose, but now only works around broken
hardware/drivers in a way that is unfortunately too impactful to remove.

This does not, however prevent us from solving the performance impact
which that workaround has on large-scale systems that don't need it.
That kicks in once the 32-bit IOVA space fills up and we keep
unsuccessfully trying to allocate from it. However, if we get to that
point then in fact it's already the endgame. The nature of the allocator
is such that the first IOVA we give to a device after the 32-bit space
runs out will be the highest possible address for that device, ever.
If that works, then great, we know we can optimise for speed by always
allocating from the full range. And if it doesn't, then the worst has
already happened and any brokenness is now showing, so there's no point
continuing to try to hide it.

To that end, implement a flag to refine this into a per-device policy
that can automatically get itself out of the way if and when it stops
being useful.

CC: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

v2: Refactor to avoid CONFIG_IOMMU_DMA=n breakage (oops)

 drivers/iommu/dma-iommu.c | 5 ++++-
 drivers/iommu/dma-iommu.h | 8 ++++++++
 drivers/iommu/iommu.c     | 3 +++
 include/linux/iommu.h     | 2 ++
 4 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9297b741f5e8..1cebb16faa33 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -643,9 +643,12 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 		dma_limit = min(dma_limit, (u64)domain->geometry.aperture_end);
 
 	/* Try to get PCI devices a SAC address */
-	if (dma_limit > DMA_BIT_MASK(32) && !iommu_dma_forcedac && dev_is_pci(dev))
+	if (dma_limit > DMA_BIT_MASK(32) && dev->iommu->pci_workaround) {
 		iova = alloc_iova_fast(iovad, iova_len,
 				       DMA_BIT_MASK(32) >> shift, false);
+		if (!iova)
+			dev->iommu->pci_workaround = false;
+	}
 
 	if (!iova)
 		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
diff --git a/drivers/iommu/dma-iommu.h b/drivers/iommu/dma-iommu.h
index 942790009292..c7be42d4f0cf 100644
--- a/drivers/iommu/dma-iommu.h
+++ b/drivers/iommu/dma-iommu.h
@@ -17,6 +17,10 @@ int iommu_dma_init_fq(struct iommu_domain *domain);
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
 
 extern bool iommu_dma_forcedac;
+static inline void iommu_dma_set_pci_workaround(struct device *dev)
+{
+	dev->iommu->pci_workaround = !iommu_dma_forcedac;
+}
 
 #else /* CONFIG_IOMMU_DMA */
 
@@ -38,5 +42,9 @@ static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_he
 {
 }
 
+static inline void iommu_dma_set_pci_workaround(struct device *dev)
+{
+}
+
 #endif	/* CONFIG_IOMMU_DMA */
 #endif	/* __DMA_IOMMU_H */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index edc768bf8976..ba8afea63ef1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -323,6 +323,9 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 
 	iommu_device_link(iommu_dev, dev);
 
+	if (dev_is_pci(dev))
+		iommu_dma_set_pci_workaround(dev);
+
 	return 0;
 
 out_release:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 79cb6eb560a8..0eb0f808109c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -368,6 +368,7 @@ struct iommu_fault_param {
  * @fwspec:	 IOMMU fwspec data
  * @iommu_dev:	 IOMMU device this device is linked to
  * @priv:	 IOMMU Driver private data
+ * @pci_workaround: Limit DMA allocations to 32-bit IOVAs
  *
  * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
  *	struct iommu_group	*iommu_group;
@@ -379,6 +380,7 @@ struct dev_iommu {
 	struct iommu_fwspec		*fwspec;
 	struct iommu_device		*iommu_dev;
 	void				*priv;
+	bool				pci_workaround;
 };
 
 int iommu_device_register(struct iommu_device *iommu,
-- 
2.36.1.dirty


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] iommu: Optimise PCI SAC address trick
  2022-09-21 12:53 [PATCH v2] iommu: Optimise PCI SAC address trick Robin Murphy
@ 2022-09-21 16:24 ` Linus Torvalds
  2022-09-21 17:00   ` Robin Murphy
  2022-09-22 14:59 ` John Garry
  1 sibling, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2022-09-21 16:24 UTC (permalink / raw)
  To: Robin Murphy; +Cc: joro, will, iommu, linux-kernel

On Wed, Sep 21, 2022 at 5:53 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> Per the reasoning in commit 4bf7fda4dce2 ("iommu/dma: Add config for
> PCI SAC address trick") and its subsequent revert, this mechanism no
> longer serves its original purpose, but now only works around broken
> hardware/drivers in a way that is unfortunately too impactful to remove.

I was going to test this, since the previous version failed for me.
But it's based on linux-next, and I didn't want to fight the conflicts
(including - but not limited to - the header file being moved) so I
dropped that plan.

If you think it's worth testing on the setup that used to fail, and
you send me a version that applies on my current tree, I can do so.

               Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] iommu: Optimise PCI SAC address trick
  2022-09-21 16:24 ` Linus Torvalds
@ 2022-09-21 17:00   ` Robin Murphy
  2022-09-21 17:27     ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Robin Murphy @ 2022-09-21 17:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: joro, will, iommu, linux-kernel

On 21/09/2022 5:24 pm, Linus Torvalds wrote:
> On Wed, Sep 21, 2022 at 5:53 AM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> Per the reasoning in commit 4bf7fda4dce2 ("iommu/dma: Add config for
>> PCI SAC address trick") and its subsequent revert, this mechanism no
>> longer serves its original purpose, but now only works around broken
>> hardware/drivers in a way that is unfortunately too impactful to remove.
> 
> I was going to test this, since the previous version failed for me.
> But it's based on linux-next, and I didn't want to fight the conflicts
> (including - but not limited to - the header file being moved) so I
> dropped that plan.
> 
> If you think it's worth testing on the setup that used to fail, and
> you send me a version that applies on my current tree, I can do so.

No great rush - in principle this one *should* be entirely safe, but I'm
not suggesting we should hurry it into 6.0 or 6.1. I mostly just wanted
to make sure you got first dibs on any objection to this new approach :)

If you do want to have a play though, compile-tested cherry-pick below.

Thanks,
Robin.

----->8-----
Per the reasoning in commit 4bf7fda4dce2 ("iommu/dma: Add config for
PCI SAC address trick") and its subsequent revert, this mechanism no
longer serves its original purpose, but now only works around broken
hardware/drivers in a way that is unfortunately too impactful to remove.

This does not, however prevent us from solving the performance impact
which that workaround has on large-scale systems that don't need it.
That kicks in once the 32-bit IOVA space fills up and we keep
unsuccessfully trying to allocate from it. However, if we get to that
point then in fact it's already the endgame. The nature of the allocator
is such that the first IOVA we give to a device after the 32-bit space
runs out will be the highest possible address for that device, ever.
If that works, then great, we know we can optimise for speed by always
allocating from the full range. And if it doesn't, then the worst has
already happened and any brokenness is now showing, so there's no point
continuing to try to hide it.

To that end, implement a flag to refine this into a per-device policy
that can automatically get itself out of the way if and when it stops
being useful.

CC: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
  drivers/iommu/dma-iommu.c | 5 ++++-
  drivers/iommu/iommu.c     | 3 +++
  include/linux/dma-iommu.h | 8 ++++++++
  include/linux/iommu.h     | 2 ++
  4 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 17dd683b2fce..61c057aba3ad 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -642,9 +642,12 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
  		dma_limit = min(dma_limit, (u64)domain->geometry.aperture_end);
  
  	/* Try to get PCI devices a SAC address */
-	if (dma_limit > DMA_BIT_MASK(32) && !iommu_dma_forcedac && dev_is_pci(dev))
+	if (dma_limit > DMA_BIT_MASK(32) && dev->iommu->pci_workaround) {
  		iova = alloc_iova_fast(iovad, iova_len,
  				       DMA_BIT_MASK(32) >> shift, false);
+		if (!iova)
+			dev->iommu->pci_workaround = false;
+	}
  
  	if (!iova)
  		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3a808146b50f..3348a67f603d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -256,6 +256,9 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
  
  	iommu_device_link(iommu_dev, dev);
  
+	if (dev_is_pci(dev))
+		iommu_dma_set_pci_workaround(dev);
+
  	return 0;
  
  out_release:
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 24607dc3c2ac..f2968b8e9f7d 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -42,6 +42,10 @@ void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
  		struct iommu_domain *domain);
  
  extern bool iommu_dma_forcedac;
+static inline void iommu_dma_set_pci_workaround(struct device *dev)
+{
+	dev->iommu->pci_workaround = !iommu_dma_forcedac;
+}
  
  #else /* CONFIG_IOMMU_DMA */
  
@@ -89,5 +93,9 @@ static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_he
  {
  }
  
+static inline void iommu_dma_set_pci_workaround(struct device *dev)
+{
+}
+
  #endif	/* CONFIG_IOMMU_DMA */
  #endif	/* __DMA_IOMMU_H */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ea30f00dc145..eff7a53850af 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -366,6 +366,7 @@ struct iommu_fault_param {
   * @fwspec:	 IOMMU fwspec data
   * @iommu_dev:	 IOMMU device this device is linked to
   * @priv:	 IOMMU Driver private data
+ * @pci_workaround: Limit DMA allocations to 32-bit IOVAs
   *
   * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
   *	struct iommu_group	*iommu_group;
@@ -377,6 +378,7 @@ struct dev_iommu {
  	struct iommu_fwspec		*fwspec;
  	struct iommu_device		*iommu_dev;
  	void				*priv;
+	bool				pci_workaround;
  };
  
  int iommu_device_register(struct iommu_device *iommu,
-- 
2.36.1.dirty

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] iommu: Optimise PCI SAC address trick
  2022-09-21 17:00   ` Robin Murphy
@ 2022-09-21 17:27     ` Linus Torvalds
  0 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2022-09-21 17:27 UTC (permalink / raw)
  To: Robin Murphy; +Cc: joro, will, iommu, linux-kernel

On Wed, Sep 21, 2022 at 10:01 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> If you do want to have a play though, compile-tested cherry-pick below.

Works for me.

Except for you having "format=flowed" and some whitespace damage. But
I fixed that up manually.

                Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] iommu: Optimise PCI SAC address trick
  2022-09-21 12:53 [PATCH v2] iommu: Optimise PCI SAC address trick Robin Murphy
  2022-09-21 16:24 ` Linus Torvalds
@ 2022-09-22 14:59 ` John Garry
  1 sibling, 0 replies; 5+ messages in thread
From: John Garry @ 2022-09-22 14:59 UTC (permalink / raw)
  To: Robin Murphy, joro; +Cc: will, iommu, linux-kernel, Linus Torvalds

On 21/09/2022 13:53, Robin Murphy wrote:
> Per the reasoning in commit 4bf7fda4dce2 ("iommu/dma: Add config for
> PCI SAC address trick") and its subsequent revert, this mechanism no
> longer serves its original purpose, but now only works around broken
> hardware/drivers in a way that is unfortunately too impactful to remove.
> 
> This does not, however prevent us from solving the performance impact
> which that workaround has on large-scale systems that don't need it.
> That kicks in once the 32-bit IOVA space fills up and we keep
> unsuccessfully trying to allocate from it. However, if we get to that
> point then in fact it's already the endgame. The nature of the allocator
> is such that the first IOVA we give to a device after the 32-bit space
> runs out will be the highest possible address for that device, ever.
> If that works, then great, we know we can optimise for speed by always
> allocating from the full range. And if it doesn't, then the worst has
> already happened and any brokenness is now showing, so there's no point
> continuing to try to hide it.
> 
> To that end, implement a flag to refine this into a per-device policy
> that can automatically get itself out of the way if and when it stops
> being useful.
> 
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

FWIW,

Reviewed-by: John Garry <john.garry@huawei.com>

2x minor comments below.

Thanks,
John

> ---
> 
> v2: Refactor to avoid CONFIG_IOMMU_DMA=n breakage (oops)
> 
>   drivers/iommu/dma-iommu.c | 5 ++++-
>   drivers/iommu/dma-iommu.h | 8 ++++++++
>   drivers/iommu/iommu.c     | 3 +++
>   include/linux/iommu.h     | 2 ++
>   4 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 9297b741f5e8..1cebb16faa33 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -643,9 +643,12 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
>   		dma_limit = min(dma_limit, (u64)domain->geometry.aperture_end);
>   
>   	/* Try to get PCI devices a SAC address */
> -	if (dma_limit > DMA_BIT_MASK(32) && !iommu_dma_forcedac && dev_is_pci(dev))
> +	if (dma_limit > DMA_BIT_MASK(32) && dev->iommu->pci_workaround) {
>   		iova = alloc_iova_fast(iovad, iova_len,
>   				       DMA_BIT_MASK(32) >> shift, false);
> +		if (!iova)
> +			dev->iommu->pci_workaround = false;

maybe a warn/notice log would be useful for broken systems to give a 
hint what could have gone wrong when it breaks (if we ever get here).

> +	}
>   
>   	if (!iova)
>   		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
> diff --git a/drivers/iommu/dma-iommu.h b/drivers/iommu/dma-iommu.h
> index 942790009292..c7be42d4f0cf 100644
> --- a/drivers/iommu/dma-iommu.h
> +++ b/drivers/iommu/dma-iommu.h
> @@ -17,6 +17,10 @@ int iommu_dma_init_fq(struct iommu_domain *domain);
>   void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
>   
>   extern bool iommu_dma_forcedac;
> +static inline void iommu_dma_set_pci_workaround(struct device *dev)

"pci_workaround" is quite a vague name

> +{
> +	dev->iommu->pci_workaround = !iommu_dma_forcedac;
> +}
>   
>   #else /* CONFIG_IOMMU_DMA */
>   
> @@ -38,5 +42,9 @@ static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_he
>   {
>   }
>   
> +static inline void iommu_dma_set_pci_workaround(struct device *dev)
> +{
> +}
> +
>   #endif	/* CONFIG_IOMMU_DMA */
>   #endif	/* __DMA_IOMMU_H */
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index edc768bf8976..ba8afea63ef1 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -323,6 +323,9 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
>   
>   	iommu_device_link(iommu_dev, dev);
>   
> +	if (dev_is_pci(dev))
> +		iommu_dma_set_pci_workaround(dev);
> +
>   	return 0;
>   
>   out_release:
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 79cb6eb560a8..0eb0f808109c 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -368,6 +368,7 @@ struct iommu_fault_param {
>    * @fwspec:	 IOMMU fwspec data
>    * @iommu_dev:	 IOMMU device this device is linked to
>    * @priv:	 IOMMU Driver private data
> + * @pci_workaround: Limit DMA allocations to 32-bit IOVA >    *
>    * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
>    *	struct iommu_group	*iommu_group;
> @@ -379,6 +380,7 @@ struct dev_iommu {
>   	struct iommu_fwspec		*fwspec;
>   	struct iommu_device		*iommu_dev;
>   	void				*priv;
> +	bool				pci_workaround;
>   };
>   
>   int iommu_device_register(struct iommu_device *iommu,


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-09-22 14:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-21 12:53 [PATCH v2] iommu: Optimise PCI SAC address trick Robin Murphy
2022-09-21 16:24 ` Linus Torvalds
2022-09-21 17:00   ` Robin Murphy
2022-09-21 17:27     ` Linus Torvalds
2022-09-22 14:59 ` John Garry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).