linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 0/6] iommu/s390: Fixes related to attach and aperture handling
@ 2022-10-06 14:46 Niklas Schnelle
  2022-10-06 14:46 ` [PATCH v5 1/6] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-06 14:46 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

Hi All,

This is v5 of a follow-up to Matt's recent series[0], where he tackled
a race that turned out to be outside of the s390 IOMMU driver itself, as
well as duplicate device attachments. After an internal discussion we came
up with what I believe is a cleaner fix. Instead of actively checking for
duplicates, we detach from any previous domain on attach. From my cursory
reading of the code this seems to be what the Intel IOMMU driver is doing
as well.
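
For readers skimming the thread, the detach-on-attach idea can be sketched
in plain userspace C. This is only an illustration under stated
assumptions: struct toy_domain, struct toy_dev, toy_attach() and
toy_detach() are hypothetical stand-ins and not the driver's real types or
functions, and the per-domain counter replaces the real device list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a domain; only counts attached devices. */
struct toy_domain {
	int nr_attached;
};

/* Hypothetical stand-in for a device; remembers its current domain. */
struct toy_dev {
	struct toy_domain *domain;	/* NULL while not attached */
};

/* Detaching an unattached device is a no-op. */
static void toy_detach(struct toy_dev *dev)
{
	if (!dev->domain)
		return;

	assert(dev->domain->nr_attached > 0);
	dev->domain->nr_attached--;
	dev->domain = NULL;
}

/*
 * Always detach from the previous domain first. A repeated attach to the
 * same domain therefore cannot leave a duplicate entry behind.
 */
static void toy_attach(struct toy_domain *domain, struct toy_dev *dev)
{
	toy_detach(dev);
	dev->domain = domain;
	domain->nr_attached++;
}
```

Attaching the same toy device twice to one domain leaves the count at one,
and attaching it to a second domain moves it there without any duplicate
check.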

Moreover we drop the attempt to re-attach the device to its previous IOMMU
domain on failure. This was fragile, unlikely to help and unexpected for
calling code. Thanks Jason for the suggestion.

We can also get rid of struct s390_domain_device entirely if we instead
thread the list through the attached struct zpci_devs. This saves us from
having to allocate during attach and gets rid of one level of indirection
during IOMMU operations.

Additionally, 3 more fixes have been added in v3 that weren't in v2 of this
series. One is for a potential situation where the aperture of a domain
could shrink and leave invalid translations. The next one fixes an
off-by-one error in checking the validity of an IOVA, and the last one
fixes a wrong value for pgsize_bitmap.

In v4 we also add a patch changing to the map_pages()/unmap_pages()
interface in order to prevent a performance regression due to the
pgsize_bitmap change.

*Note*:
This series is against the s390 features branch[1] which already contains
the bus_next field removal that was part of v2.

It is also available as a branch with the GPG signed tag
s390_iommu_fixes_v5 on my niks/linux.git on git.kernel.org[2].

*Open Question*:
Which tree should this go via?

Best regards,
Niklas

Changes since v4:
- Add patch to change to the map_pages()/unmap_pages() API to prevent
  a performance regression from the pgsize_bitmap change (Robin)
- In patch 1 unregister IOAT on error (Matt)
- Turn the aperture check in attach into a WARN_ON() in patch 3 (Jason)

Changes since v3:
- Drop s390_domain from __s390_iommu_detach_device() (Jason)
- WARN_ON() mismatched domain in s390_iommu_detach_device() (Jason)
- Use __s390_iommu_detach_device() in s390_iommu_release_device() (Jason)
- Make aperture check resistant against overflow (Jason)

Changes since v2:
- The patch removing the unused bus_next field has been spun out and
  already made it into the s390 feature branch on git.kernel.org
- Make __s390_iommu_detach_device() return void (Jason)
- Remove the re-attach on failure dance as it is unlikely to help
  and complicates debug and recovery (Jason)
- Ignore attempts to detach from domain that is not the active one
- Add patch to fix potential shrinking of the aperture and use
  reserved ranges per device instead of the aperture to respect
  IOVA range restrictions (Jason)
- Add a fix for an off-by-one error on checking an IOVA against
  the aperture
- Add a fix for wrong pgsize_bitmap

Changes since v1:
- After patch 3 we don't have to search in the devices list on detach as
  we already have hold of the zpci_dev (Jason)
- Add a WARN_ON() if we somehow end up detaching a device from a domain
  that isn't the device's current domain.
- Removed the iteration and list delete from s390_domain_free(); instead
  just WARN_ON() when we're freeing without having detached
- The last two points should help catch sequencing errors much more
  quickly in the future.

[0] https://lore.kernel.org/linux-iommu/20220831201236.77595-1-mjrosato@linux.ibm.com/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git

Niklas Schnelle (6):
  iommu/s390: Fix duplicate domain attachments
  iommu/s390: Get rid of s390_domain_device
  iommu/s390: Fix potential s390_domain aperture shrinking
  iommu/s390: Fix incorrect aperture check
  iommu/s390: Fix incorrect pgsize_bitmap
  iommu/s390: Implement map_pages()/unmap_pages() instead of
    map()/unmap()

 arch/s390/include/asm/pci.h |   1 +
 drivers/iommu/s390-iommu.c  | 221 +++++++++++++++++-------------------
 2 files changed, 107 insertions(+), 115 deletions(-)

-- 
2.34.1



* [PATCH v5 1/6] iommu/s390: Fix duplicate domain attachments
  2022-10-06 14:46 [PATCH v5 0/6] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
@ 2022-10-06 14:46 ` Niklas Schnelle
  2022-10-06 21:02   ` Matthew Rosato
  2022-10-06 14:46 ` [PATCH v5 2/6] iommu/s390: Get rid of s390_domain_device Niklas Schnelle
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-06 14:46 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

Since commit fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev
calls") we can end up with duplicates in the list of devices attached to
a domain. This is inefficient and confusing since only one domain can
actually be in control of the IOMMU translations for a device. Fix this
by detaching the device from the previous domain, if any, on attach.
Add a WARN_ON() in case we still have attached devices on freeing the
domain. While here remove the re-attach on failure dance as it was
determined to be unlikely to help and may confuse debug and recovery.

Fixes: fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev calls")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
v4->v5:
- Unregister IOAT and set zdev->dma_table to NULL on error (Matt)

 drivers/iommu/s390-iommu.c | 102 ++++++++++++++++---------------------
 1 file changed, 43 insertions(+), 59 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index c898bcbbce11..938998c46bd3 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -79,10 +79,36 @@ static void s390_domain_free(struct iommu_domain *domain)
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 
+	WARN_ON(!list_empty(&s390_domain->devices));
 	dma_cleanup_tables(s390_domain->dma_table);
 	kfree(s390_domain);
 }
 
+static void __s390_iommu_detach_device(struct zpci_dev *zdev)
+{
+	struct s390_domain *s390_domain = zdev->s390_domain;
+	struct s390_domain_device *domain_device, *tmp;
+	unsigned long flags;
+
+	if (!s390_domain)
+		return;
+
+	spin_lock_irqsave(&s390_domain->list_lock, flags);
+	list_for_each_entry_safe(domain_device, tmp, &s390_domain->devices,
+				 list) {
+		if (domain_device->zdev == zdev) {
+			list_del(&domain_device->list);
+			kfree(domain_device);
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+
+	zpci_unregister_ioat(zdev, 0);
+	zdev->s390_domain = NULL;
+	zdev->dma_table = NULL;
+}
+
 static int s390_iommu_attach_device(struct iommu_domain *domain,
 				    struct device *dev)
 {
@@ -90,7 +116,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	struct zpci_dev *zdev = to_zpci_dev(dev);
 	struct s390_domain_device *domain_device;
 	unsigned long flags;
-	int cc, rc;
+	int cc, rc = 0;
 
 	if (!zdev)
 		return -ENODEV;
@@ -99,23 +125,17 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	if (!domain_device)
 		return -ENOMEM;
 
-	if (zdev->dma_table && !zdev->s390_domain) {
-		cc = zpci_dma_exit_device(zdev);
-		if (cc) {
-			rc = -EIO;
-			goto out_free;
-		}
-	}
-
 	if (zdev->s390_domain)
-		zpci_unregister_ioat(zdev, 0);
+		__s390_iommu_detach_device(zdev);
+	else if (zdev->dma_table)
+		zpci_dma_exit_device(zdev);
 
 	zdev->dma_table = s390_domain->dma_table;
 	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
 				virt_to_phys(zdev->dma_table));
 	if (cc) {
 		rc = -EIO;
-		goto out_restore;
+		goto out_free;
 	}
 
 	spin_lock_irqsave(&s390_domain->list_lock, flags);
@@ -127,9 +147,9 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	/* Allow only devices with identical DMA range limits */
 	} else if (domain->geometry.aperture_start != zdev->start_dma ||
 		   domain->geometry.aperture_end != zdev->end_dma) {
-		rc = -EINVAL;
 		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
-		goto out_restore;
+		rc = -EINVAL;
+		goto out_unregister;
 	}
 	domain_device->zdev = zdev;
 	zdev->s390_domain = s390_domain;
@@ -138,14 +158,9 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 
 	return 0;
 
-out_restore:
-	if (!zdev->s390_domain) {
-		zpci_dma_init_device(zdev);
-	} else {
-		zdev->dma_table = zdev->s390_domain->dma_table;
-		zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
-				   virt_to_phys(zdev->dma_table));
-	}
+out_unregister:
+	zpci_unregister_ioat(zdev, 0);
+	zdev->dma_table = NULL;
 out_free:
 	kfree(domain_device);
 
@@ -155,32 +170,12 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 static void s390_iommu_detach_device(struct iommu_domain *domain,
 				     struct device *dev)
 {
-	struct s390_domain *s390_domain = to_s390_domain(domain);
 	struct zpci_dev *zdev = to_zpci_dev(dev);
-	struct s390_domain_device *domain_device, *tmp;
-	unsigned long flags;
-	int found = 0;
 
-	if (!zdev)
-		return;
-
-	spin_lock_irqsave(&s390_domain->list_lock, flags);
-	list_for_each_entry_safe(domain_device, tmp, &s390_domain->devices,
-				 list) {
-		if (domain_device->zdev == zdev) {
-			list_del(&domain_device->list);
-			kfree(domain_device);
-			found = 1;
-			break;
-		}
-	}
-	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+	WARN_ON(zdev->s390_domain != to_s390_domain(domain));
 
-	if (found && (zdev->s390_domain == s390_domain)) {
-		zdev->s390_domain = NULL;
-		zpci_unregister_ioat(zdev, 0);
-		zpci_dma_init_device(zdev);
-	}
+	__s390_iommu_detach_device(zdev);
+	zpci_dma_init_device(zdev);
 }
 
 static struct iommu_device *s390_iommu_probe_device(struct device *dev)
@@ -193,24 +188,13 @@ static struct iommu_device *s390_iommu_probe_device(struct device *dev)
 static void s390_iommu_release_device(struct device *dev)
 {
 	struct zpci_dev *zdev = to_zpci_dev(dev);
-	struct iommu_domain *domain;
 
 	/*
-	 * This is a workaround for a scenario where the IOMMU API common code
-	 * "forgets" to call the detach_dev callback: After binding a device
-	 * to vfio-pci and completing the VFIO_SET_IOMMU ioctl (which triggers
-	 * the attach_dev), removing the device via
-	 * "echo 1 > /sys/bus/pci/devices/.../remove" won't trigger detach_dev,
-	 * only release_device will be called via the BUS_NOTIFY_REMOVED_DEVICE
-	 * notifier.
-	 *
-	 * So let's call detach_dev from here if it hasn't been called before.
+	 * release_device is expected to detach any domain currently attached
+	 * to the device, but keep it attached to other devices in the group.
 	 */
-	if (zdev && zdev->s390_domain) {
-		domain = iommu_get_domain_for_dev(dev);
-		if (domain)
-			s390_iommu_detach_device(domain, dev);
-	}
+	if (zdev)
+		__s390_iommu_detach_device(zdev);
 }
 
 static int s390_iommu_update_trans(struct s390_domain *s390_domain,
-- 
2.34.1



* [PATCH v5 2/6] iommu/s390: Get rid of s390_domain_device
  2022-10-06 14:46 [PATCH v5 0/6] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
  2022-10-06 14:46 ` [PATCH v5 1/6] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
@ 2022-10-06 14:46 ` Niklas Schnelle
  2022-10-06 15:19   ` Niklas Schnelle
  2022-10-06 14:46 ` [PATCH v5 3/6] iommu/s390: Fix potential s390_domain aperture shrinking Niklas Schnelle
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-06 14:46 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

The struct s390_domain_device serves solely as a list entry for
the devices list of a struct s390_domain. As it contains no additional
information besides a list_head and a pointer to the struct zpci_dev we
can simplify things and just thread the device list through struct
zpci_dev directly. This removes the need to allocate during domain
attach and gets rid of one level of indirection during mapping
operations.
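
As a rough userspace illustration of threading the list through the device
structure itself, consider the sketch below. It is not the kernel's
<linux/list.h>; struct list_node, list_insert_head(), list_remove_init(),
struct toy_dev and node_to_dev() are hypothetical names that only mimic
the idea of list_add(), list_del_init() and container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (hypothetical). */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
	n->prev = n;
	n->next = n;
}

static void list_insert_head(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink and re-initialize, so a second removal is harmless. */
static void list_remove_init(struct list_node *n)
{
	assert(n->prev && n->next);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* The device embeds its own list node; no separate entry is allocated. */
struct toy_dev {
	unsigned int fh;		/* hypothetical function handle */
	struct list_node iommu_list;
};

/* Recover the device from its embedded node. */
#define node_to_dev(ptr) \
	((struct toy_dev *)((char *)(ptr) - offsetof(struct toy_dev, iommu_list)))

/* Walk the domain's list and touch each device directly. */
static unsigned int sum_handles(struct list_node *head)
{
	unsigned int sum = 0;
	struct list_node *n;

	for (n = head->next; n != head; n = n->next)
		sum += node_to_dev(n)->fh;
	return sum;
}
```

Because the node lives inside the device, attach needs no allocation and
the walk reaches each device without an extra pointer dereference.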

Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 arch/s390/include/asm/pci.h |  1 +
 drivers/iommu/s390-iommu.c  | 37 +++++++------------------------------
 2 files changed, 8 insertions(+), 30 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 108e732d7b14..15f8714ca9b7 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -117,6 +117,7 @@ struct zpci_bus {
 struct zpci_dev {
 	struct zpci_bus *zbus;
 	struct list_head entry;		/* list of all zpci_devices, needed for hotplug, etc. */
+	struct list_head iommu_list;
 	struct kref kref;
 	struct hotplug_slot hotplug_slot;
 
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 938998c46bd3..9b3ae4b14636 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -29,11 +29,6 @@ struct s390_domain {
 	spinlock_t		list_lock;
 };
 
-struct s390_domain_device {
-	struct list_head	list;
-	struct zpci_dev		*zdev;
-};
-
 static struct s390_domain *to_s390_domain(struct iommu_domain *dom)
 {
 	return container_of(dom, struct s390_domain, domain);
@@ -87,21 +82,13 @@ static void s390_domain_free(struct iommu_domain *domain)
 static void __s390_iommu_detach_device(struct zpci_dev *zdev)
 {
 	struct s390_domain *s390_domain = zdev->s390_domain;
-	struct s390_domain_device *domain_device, *tmp;
 	unsigned long flags;
 
 	if (!s390_domain)
 		return;
 
 	spin_lock_irqsave(&s390_domain->list_lock, flags);
-	list_for_each_entry_safe(domain_device, tmp, &s390_domain->devices,
-				 list) {
-		if (domain_device->zdev == zdev) {
-			list_del(&domain_device->list);
-			kfree(domain_device);
-			break;
-		}
-	}
+	list_del_init(&zdev->iommu_list);
 	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
 
 	zpci_unregister_ioat(zdev, 0);
@@ -114,17 +101,12 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 	struct zpci_dev *zdev = to_zpci_dev(dev);
-	struct s390_domain_device *domain_device;
 	unsigned long flags;
 	int cc, rc = 0;
 
 	if (!zdev)
 		return -ENODEV;
 
-	domain_device = kzalloc(sizeof(*domain_device), GFP_KERNEL);
-	if (!domain_device)
-		return -ENOMEM;
-
 	if (zdev->s390_domain)
 		__s390_iommu_detach_device(zdev);
 	else if (zdev->dma_table)
@@ -133,10 +115,8 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	zdev->dma_table = s390_domain->dma_table;
 	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
 				virt_to_phys(zdev->dma_table));
-	if (cc) {
-		rc = -EIO;
-		goto out_free;
-	}
+	if (cc)
+		return -EIO;
 
 	spin_lock_irqsave(&s390_domain->list_lock, flags);
 	/* First device defines the DMA range limits */
@@ -151,9 +131,8 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 		rc = -EINVAL;
 		goto out_unregister;
 	}
-	domain_device->zdev = zdev;
 	zdev->s390_domain = s390_domain;
-	list_add(&domain_device->list, &s390_domain->devices);
+	list_add(&zdev->iommu_list, &s390_domain->devices);
 	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
 
 	return 0;
@@ -161,8 +140,6 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 out_unregister:
 	zpci_unregister_ioat(zdev, 0);
 	zdev->dma_table = NULL;
-out_free:
-	kfree(domain_device);
 
 	return rc;
 }
@@ -201,10 +178,10 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 				   phys_addr_t pa, dma_addr_t dma_addr,
 				   size_t size, int flags)
 {
-	struct s390_domain_device *domain_device;
 	phys_addr_t page_addr = pa & PAGE_MASK;
 	dma_addr_t start_dma_addr = dma_addr;
 	unsigned long irq_flags, nr_pages, i;
+	struct zpci_dev *zdev;
 	unsigned long *entry;
 	int rc = 0;
 
@@ -229,8 +206,8 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 	}
 
 	spin_lock(&s390_domain->list_lock);
-	list_for_each_entry(domain_device, &s390_domain->devices, list) {
-		rc = zpci_refresh_trans((u64) domain_device->zdev->fh << 32,
+	list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
+		rc = zpci_refresh_trans((u64)zdev->fh << 32,
 					start_dma_addr, nr_pages * PAGE_SIZE);
 		if (rc)
 			break;
-- 
2.34.1



* [PATCH v5 3/6] iommu/s390: Fix potential s390_domain aperture shrinking
  2022-10-06 14:46 [PATCH v5 0/6] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
  2022-10-06 14:46 ` [PATCH v5 1/6] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
  2022-10-06 14:46 ` [PATCH v5 2/6] iommu/s390: Get rid of s390_domain_device Niklas Schnelle
@ 2022-10-06 14:46 ` Niklas Schnelle
  2022-10-06 15:21   ` Niklas Schnelle
  2022-10-06 21:02   ` Matthew Rosato
  2022-10-06 14:46 ` [PATCH v5 4/6] iommu/s390: Fix incorrect aperture check Niklas Schnelle
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-06 14:46 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

The s390 IOMMU driver currently sets the IOMMU domain's aperture to
match the device specific DMA address range of the device that is first
attached. This is not ideal. For one, if the domain has no device
attached in the meantime, the aperture could be shrunk, allowing
translations outside the aperture to exist in the translation tables.
Also, this is a bit of a misuse of the aperture, which really should
describe what addresses can be translated and not some device specific
limitations.

Instead of misusing the aperture like this, we can create reserved
ranges for the ranges inaccessible to the attached devices, allowing
devices with overlapping ranges to still share an IOMMU domain.
This also significantly simplifies s390_iommu_attach_device(), allowing
us to move the aperture check to the beginning of the function and
removing the need to hold the device list's lock to check the aperture.

As we then use the same aperture for all domains and it only depends on
the table properties, we can already check zdev->start_dma/end_dma at
probe time and turn the check on attach into a WARN_ON().
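
The reserved range arithmetic can be sketched in userspace as follows.
This is a simplified model under assumptions: struct toy_region and
toy_resv_regions() are hypothetical, and aperture_end stands in for
ZPCI_TABLE_SIZE_RT - 1. It only computes the ranges below start_dma and
above end_dma and does not call any IOMMU core function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one reserved range. */
struct toy_region {
	uint64_t start;
	uint64_t length;
};

/*
 * Fill out[] with the ranges of [0, aperture_end] that the device cannot
 * reach and return how many were written (0, 1 or 2).
 */
static size_t toy_resv_regions(uint64_t start_dma, uint64_t end_dma,
			       uint64_t aperture_end,
			       struct toy_region out[2])
{
	size_t n = 0;

	/* Mirrors the sanity checks done at probe time. */
	assert(start_dma <= end_dma && end_dma <= aperture_end);

	if (start_dma) {
		out[n].start = 0;
		out[n].length = start_dma;
		n++;
	}

	if (end_dma < aperture_end) {
		out[n].start = end_dma + 1;
		out[n].length = aperture_end - end_dma;
		n++;
	}

	return n;
}
```

A device that covers the whole aperture gets no reserved ranges, while a
device with a narrower window gets one range on each uncovered side.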

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
v4->v5:
- Make aperture check in attach a WARN_ON() and fail in probe if
  zdev->start_dma/end_dma doesn't fit in aperture (Jason)

 drivers/iommu/s390-iommu.c | 65 +++++++++++++++++++++++++-------------
 1 file changed, 43 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 9b3ae4b14636..1f6c9bee9a80 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -62,6 +62,9 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
 		kfree(s390_domain);
 		return NULL;
 	}
+	s390_domain->domain.geometry.force_aperture = true;
+	s390_domain->domain.geometry.aperture_start = 0;
+	s390_domain->domain.geometry.aperture_end = ZPCI_TABLE_SIZE_RT - 1;
 
 	spin_lock_init(&s390_domain->dma_table_lock);
 	spin_lock_init(&s390_domain->list_lock);
@@ -102,46 +105,32 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 	struct zpci_dev *zdev = to_zpci_dev(dev);
 	unsigned long flags;
-	int cc, rc = 0;
+	int cc;
 
 	if (!zdev)
 		return -ENODEV;
 
+	WARN_ON(domain->geometry.aperture_start > zdev->end_dma ||
+		domain->geometry.aperture_end < zdev->start_dma);
+
 	if (zdev->s390_domain)
 		__s390_iommu_detach_device(zdev);
 	else if (zdev->dma_table)
 		zpci_dma_exit_device(zdev);
 
-	zdev->dma_table = s390_domain->dma_table;
 	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
-				virt_to_phys(zdev->dma_table));
+				virt_to_phys(s390_domain->dma_table));
 	if (cc)
 		return -EIO;
 
-	spin_lock_irqsave(&s390_domain->list_lock, flags);
-	/* First device defines the DMA range limits */
-	if (list_empty(&s390_domain->devices)) {
-		domain->geometry.aperture_start = zdev->start_dma;
-		domain->geometry.aperture_end = zdev->end_dma;
-		domain->geometry.force_aperture = true;
-	/* Allow only devices with identical DMA range limits */
-	} else if (domain->geometry.aperture_start != zdev->start_dma ||
-		   domain->geometry.aperture_end != zdev->end_dma) {
-		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
-		rc = -EINVAL;
-		goto out_unregister;
-	}
+	zdev->dma_table = s390_domain->dma_table;
 	zdev->s390_domain = s390_domain;
+
+	spin_lock_irqsave(&s390_domain->list_lock, flags);
 	list_add(&zdev->iommu_list, &s390_domain->devices);
 	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
 
 	return 0;
-
-out_unregister:
-	zpci_unregister_ioat(zdev, 0);
-	zdev->dma_table = NULL;
-
-	return rc;
 }
 
 static void s390_iommu_detach_device(struct iommu_domain *domain,
@@ -155,10 +144,41 @@ static void s390_iommu_detach_device(struct iommu_domain *domain,
 	zpci_dma_init_device(zdev);
 }
 
+static void s390_iommu_get_resv_regions(struct device *dev,
+					struct list_head *list)
+{
+	struct zpci_dev *zdev = to_zpci_dev(dev);
+	struct iommu_resv_region *region;
+
+	if (zdev->start_dma) {
+		region = iommu_alloc_resv_region(0, zdev->start_dma, 0,
+						 IOMMU_RESV_RESERVED);
+		if (!region)
+			return;
+		list_add_tail(&region->list, list);
+	}
+
+	if (zdev->end_dma < ZPCI_TABLE_SIZE_RT - 1) {
+		region = iommu_alloc_resv_region(zdev->end_dma + 1,
+						 ZPCI_TABLE_SIZE_RT - zdev->end_dma - 1,
+						 0, IOMMU_RESV_RESERVED);
+		if (!region)
+			return;
+		list_add_tail(&region->list, list);
+	}
+}
+
 static struct iommu_device *s390_iommu_probe_device(struct device *dev)
 {
 	struct zpci_dev *zdev = to_zpci_dev(dev);
 
+	if (zdev->start_dma > zdev->end_dma ||
+	    zdev->start_dma > ZPCI_TABLE_SIZE_RT - 1)
+		return ERR_PTR(-EINVAL);
+
+	if (zdev->end_dma > ZPCI_TABLE_SIZE_RT - 1)
+		zdev->end_dma = ZPCI_TABLE_SIZE_RT - 1;
+
 	return &zdev->iommu_dev;
 }
 
@@ -337,6 +357,7 @@ static const struct iommu_ops s390_iommu_ops = {
 	.release_device = s390_iommu_release_device,
 	.device_group = generic_device_group,
 	.pgsize_bitmap = S390_IOMMU_PGSIZES,
+	.get_resv_regions = s390_iommu_get_resv_regions,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev	= s390_iommu_attach_device,
 		.detach_dev	= s390_iommu_detach_device,
-- 
2.34.1



* [PATCH v5 4/6] iommu/s390: Fix incorrect aperture check
  2022-10-06 14:46 [PATCH v5 0/6] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
                   ` (2 preceding siblings ...)
  2022-10-06 14:46 ` [PATCH v5 3/6] iommu/s390: Fix potential s390_domain aperture shrinking Niklas Schnelle
@ 2022-10-06 14:46 ` Niklas Schnelle
  2022-10-06 14:46 ` [PATCH v5 5/6] iommu/s390: Fix incorrect pgsize_bitmap Niklas Schnelle
  2022-10-06 14:47 ` [PATCH v5 6/6] iommu/s390: Implement map_pages()/unmap_pages() instead of map()/unmap() Niklas Schnelle
  5 siblings, 0 replies; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-06 14:46 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

The domain->geometry.aperture_end specifies the last valid address, so
treat it as such when checking if a DMA address is valid.
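
A small userspace sketch of the inclusive-end check, for illustration
only: toy_range_in_aperture() is a hypothetical helper, not a driver
function, and it assumes iova + size - 1 does not wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * aperture_end is the last valid address, so the last byte of the range,
 * iova + size - 1, is what has to be compared against it.
 */
static bool toy_range_in_aperture(uint64_t iova, uint64_t size,
				  uint64_t aperture_start,
				  uint64_t aperture_end)
{
	assert(size > 0);
	return iova >= aperture_start &&
	       (iova + size - 1) <= aperture_end;
}
```

With an aperture of [0, 0xFFFF], a 0x1000 byte range starting at 0xF000
ends exactly at 0xFFFF and is accepted, whereas comparing iova + size
against aperture_end would have rejected it.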

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 drivers/iommu/s390-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 1f6c9bee9a80..a89fd0256f99 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -206,7 +206,7 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 	int rc = 0;
 
 	if (dma_addr < s390_domain->domain.geometry.aperture_start ||
-	    dma_addr + size > s390_domain->domain.geometry.aperture_end)
+	    (dma_addr + size - 1) > s390_domain->domain.geometry.aperture_end)
 		return -EINVAL;
 
 	nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-- 
2.34.1



* [PATCH v5 5/6] iommu/s390: Fix incorrect pgsize_bitmap
  2022-10-06 14:46 [PATCH v5 0/6] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
                   ` (3 preceding siblings ...)
  2022-10-06 14:46 ` [PATCH v5 4/6] iommu/s390: Fix incorrect aperture check Niklas Schnelle
@ 2022-10-06 14:46 ` Niklas Schnelle
  2022-10-06 14:47 ` [PATCH v5 6/6] iommu/s390: Implement map_pages()/unmap_pages() instead of map()/unmap() Niklas Schnelle
  5 siblings, 0 replies; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-06 14:46 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

The .pgsize_bitmap property of struct iommu_ops is not a page mask but
rather has a bit set for each size of pages the IOMMU supports. As the
comment correctly pointed out, at this moment the code only supports 4K
pages, so simply use SZ_4K here.
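
To illustrate the difference between a page mask and a page size bitmap,
here is a userspace sketch. TOY_SZ_4K and toy_pgsize_supported() are
hypothetical names; the helper only models "one bit per supported page
size" and is not how the IOMMU core itself selects page sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SZ_4K 0x1000ULL	/* stand-in for the kernel's SZ_4K */

/* A page size is supported if its (single) bit is set in the bitmap. */
static bool toy_pgsize_supported(uint64_t pgsize_bitmap, uint64_t pgsize)
{
	/* A page size must be a non-zero power of two. */
	assert(pgsize && !(pgsize & (pgsize - 1)));
	return (pgsize_bitmap & pgsize) != 0;
}
```

Read as a bitmap, the old value ~0xFFFUL advertises every power of two
from 4K upwards, for example 8K and 1M, while a bitmap of just 4K only
advertises 4K pages.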

Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 drivers/iommu/s390-iommu.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index a89fd0256f99..ac200f0b81fa 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -12,13 +12,6 @@
 #include <linux/sizes.h>
 #include <asm/pci_dma.h>
 
-/*
- * Physically contiguous memory regions can be mapped with 4 KiB alignment,
- * we allow all page sizes that are an order of 4KiB (no special large page
- * support so far).
- */
-#define S390_IOMMU_PGSIZES	(~0xFFFUL)
-
 static const struct iommu_ops s390_iommu_ops;
 
 struct s390_domain {
@@ -356,7 +349,7 @@ static const struct iommu_ops s390_iommu_ops = {
 	.probe_device = s390_iommu_probe_device,
 	.release_device = s390_iommu_release_device,
 	.device_group = generic_device_group,
-	.pgsize_bitmap = S390_IOMMU_PGSIZES,
+	.pgsize_bitmap = SZ_4K,
 	.get_resv_regions = s390_iommu_get_resv_regions,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev	= s390_iommu_attach_device,
-- 
2.34.1



* [PATCH v5 6/6] iommu/s390: Implement map_pages()/unmap_pages() instead of map()/unmap()
  2022-10-06 14:46 [PATCH v5 0/6] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
                   ` (4 preceding siblings ...)
  2022-10-06 14:46 ` [PATCH v5 5/6] iommu/s390: Fix incorrect pgsize_bitmap Niklas Schnelle
@ 2022-10-06 14:47 ` Niklas Schnelle
  2022-10-06 21:03   ` Matthew Rosato
  5 siblings, 1 reply; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-06 14:47 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

While s390-iommu currently implements the map()/unmap() operations,
which only map/unmap a single page at a time, the internal
s390_iommu_update_trans() API already supports mapping/unmapping a range
of pages at once. Take advantage of this by implementing the
map_pages()/unmap_pages() operations instead, thus allowing users of the
IOMMU drivers to map multiple pages in a single call followed by
a single I/O TLB flush if needed.
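
How a (pgsize, pgcount) pair translates into a byte size can be sketched
in userspace like this. toy_ffs() is a hypothetical portable stand-in for
the kernel's __ffs(), and toy_mapping_size() is likewise only an
illustration of the pgcount << __ffs(pgsize) expression.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit; the word must not be zero. */
static unsigned int toy_ffs(uint64_t word)
{
	unsigned int bit = 0;

	assert(word != 0);
	while (!(word & 1)) {
		word >>= 1;
		bit++;
	}
	return bit;
}

/* Total size in bytes of pgcount pages of size pgsize (a power of two). */
static uint64_t toy_mapping_size(uint64_t pgsize, uint64_t pgcount)
{
	return pgcount << toy_ffs(pgsize);
}
```

For 4K pages the shift is 12, so three pages give a size of 0x3000 bytes,
which is the amount a successful map would report back as mapped.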

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 drivers/iommu/s390-iommu.c | 48 +++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index ac200f0b81fa..7b92855135ac 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -189,20 +189,15 @@ static void s390_iommu_release_device(struct device *dev)
 
 static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 				   phys_addr_t pa, dma_addr_t dma_addr,
-				   size_t size, int flags)
+				   unsigned long nr_pages, int flags)
 {
 	phys_addr_t page_addr = pa & PAGE_MASK;
 	dma_addr_t start_dma_addr = dma_addr;
-	unsigned long irq_flags, nr_pages, i;
+	unsigned long irq_flags, i;
 	struct zpci_dev *zdev;
 	unsigned long *entry;
 	int rc = 0;
 
-	if (dma_addr < s390_domain->domain.geometry.aperture_start ||
-	    (dma_addr + size - 1) > s390_domain->domain.geometry.aperture_end)
-		return -EINVAL;
-
-	nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
 	if (!nr_pages)
 		return 0;
 
@@ -245,11 +240,24 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 	return rc;
 }
 
-static int s390_iommu_map(struct iommu_domain *domain, unsigned long iova,
-			  phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+static int s390_iommu_map_pages(struct iommu_domain *domain,
+				unsigned long iova, phys_addr_t paddr,
+				size_t pgsize, size_t pgcount,
+				int prot, gfp_t gfp, size_t *mapped)
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 	int flags = ZPCI_PTE_VALID, rc = 0;
+	size_t size = pgcount << __ffs(pgsize);
+
+	if (pgsize != SZ_4K)
+		return -EINVAL;
+
+	if (iova < s390_domain->domain.geometry.aperture_start ||
+	    (iova + size - 1) > s390_domain->domain.geometry.aperture_end)
+		return -EINVAL;
+
+	if (!IS_ALIGNED(iova | paddr, pgsize))
+		return -EINVAL;
 
 	if (!(prot & IOMMU_READ))
 		return -EINVAL;
@@ -258,7 +266,9 @@ static int s390_iommu_map(struct iommu_domain *domain, unsigned long iova,
 		flags |= ZPCI_TABLE_PROTECTED;
 
 	rc = s390_iommu_update_trans(s390_domain, paddr, iova,
-				     size, flags);
+				     pgcount, flags);
+	if (!rc)
+		*mapped = size;
 
 	return rc;
 }
@@ -294,21 +304,27 @@ static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
 	return phys;
 }
 
-static size_t s390_iommu_unmap(struct iommu_domain *domain,
-			       unsigned long iova, size_t size,
-			       struct iommu_iotlb_gather *gather)
+static size_t s390_iommu_unmap_pages(struct iommu_domain *domain,
+				     unsigned long iova,
+				     size_t pgsize, size_t pgcount,
+				     struct iommu_iotlb_gather *gather)
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
+	size_t size = pgcount << __ffs(pgsize);
 	int flags = ZPCI_PTE_INVALID;
 	phys_addr_t paddr;
 	int rc;
 
+	if (iova < s390_domain->domain.geometry.aperture_start ||
+	    (iova + size - 1) > s390_domain->domain.geometry.aperture_end)
+		return 0;
+
 	paddr = s390_iommu_iova_to_phys(domain, iova);
 	if (!paddr)
 		return 0;
 
 	rc = s390_iommu_update_trans(s390_domain, paddr, iova,
-				     size, flags);
+				     pgcount, flags);
 	if (rc)
 		return 0;
 
@@ -354,8 +370,8 @@ static const struct iommu_ops s390_iommu_ops = {
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev	= s390_iommu_attach_device,
 		.detach_dev	= s390_iommu_detach_device,
-		.map		= s390_iommu_map,
-		.unmap		= s390_iommu_unmap,
+		.map_pages	= s390_iommu_map_pages,
+		.unmap_pages	= s390_iommu_unmap_pages,
 		.iova_to_phys	= s390_iommu_iova_to_phys,
 		.free		= s390_domain_free,
 	}
-- 
2.34.1
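
For readers following the diff, the argument checks done by s390_iommu_map_pages() before it touches the translation tables can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct aperture, check_map_request() and SKETCH_SZ_4K are hypothetical names that do not exist in the driver, and __builtin_ctzl() (a GCC/Clang builtin) stands in for the kernel's __ffs(). Unlike the patch, the sketch computes the byte size only after the page-size check so that the builtin is never called with 0.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SZ_4K 0x1000UL /* stand-in for the kernel's SZ_4K */

/* Hypothetical stand-in for the geometry fields of struct iommu_domain. */
struct aperture {
	uint64_t start;
	uint64_t end; /* inclusive, like geometry.aperture_end */
};

/*
 * Performs the page size, aperture and alignment checks from the diff.
 * On success stores the byte size of the request in *size and returns 0.
 */
static int check_map_request(const struct aperture *ap, uint64_t iova,
			     uint64_t paddr, size_t pgsize, size_t pgcount,
			     size_t *size)
{
	size_t bytes;

	/* Only 4K pages are accepted. */
	if (pgsize != SKETCH_SZ_4K)
		return -EINVAL;

	/* pgsize is a power of two, so this shift equals pgcount * pgsize. */
	bytes = pgcount << __builtin_ctzl(pgsize);

	/* The whole range [iova, iova + bytes - 1] must lie in the aperture. */
	if (iova < ap->start || (iova + bytes - 1) > ap->end)
		return -EINVAL;

	/* Both addresses must be aligned to the page size. */
	if ((iova | paddr) & (pgsize - 1))
		return -EINVAL;

	*size = bytes;
	return 0;
}
```

A request for four 4K pages therefore reports 0x4000 bytes, while a request whose last byte lies beyond the inclusive aperture end is rejected before any table update.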


* Re: [PATCH v5 2/6] iommu/s390: Get rid of s390_domain_device
  2022-10-06 14:46 ` [PATCH v5 2/6] iommu/s390: Get rid of s390_domain_device Niklas Schnelle
@ 2022-10-06 15:19   ` Niklas Schnelle
  0 siblings, 0 replies; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-06 15:19 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On Thu, 2022-10-06 at 16:46 +0200, Niklas Schnelle wrote:
> The struct s390_domain_device serves the sole purpose as list entry for
> the devices list of a struct s390_domain. As it contains no additional
> information besides a list_head and a pointer to the struct zpci_dev we
> can simplify things and just thread the device list through struct
> zpci_dev directly. This removes the need to allocate during domain
> attach and gets rid of one level of indirection during mapping
> operations.
> 
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

@Jason, on second thought I'm not sure your R-b still holds, as you only
implied that the zpci_unregister_ioat() plus zdev->dma_table = NULL is
okay given the plan of ignoring zpci_register_ioat() failures on error in
the future (I already have the patch).

> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
>  arch/s390/include/asm/pci.h |  1 +
>  drivers/iommu/s390-iommu.c  | 37 +++++++------------------------------
>  2 files changed, 8 insertions(+), 30 deletions(-)
---8<---


* Re: [PATCH v5 3/6] iommu/s390: Fix potential s390_domain aperture shrinking
  2022-10-06 14:46 ` [PATCH v5 3/6] iommu/s390: Fix potential s390_domain aperture shrinking Niklas Schnelle
@ 2022-10-06 15:21   ` Niklas Schnelle
  2022-10-06 21:02   ` Matthew Rosato
  1 sibling, 0 replies; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-06 15:21 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On Thu, 2022-10-06 at 16:46 +0200, Niklas Schnelle wrote:
> The s390 IOMMU driver currently sets the IOMMU domain's aperture to
> match the device specific DMA address range of the device that is first
> attached. This is not ideal. For one if the domain has no device
> attached in the meantime the aperture could be shrunk allowing
> translations outside the aperture to exist in the translation tables.
> Also this is a bit of a misuse of the aperture which really should
> describe what addresses can be translated and not some device specific
> limitations.
> 
> Instead of misusing the aperture like this we can instead create
> reserved ranges for the ranges inaccessible to the attached devices
> allowing devices with overlapping ranges to still share an IOMMU domain.
> This also significantly simplifies s390_iommu_attach_device() allowing
> us to move the aperture check to the beginning of the function and
> removing the need to hold the device list's lock to check the aperture.
> 
> As we then use the same aperture for all domains and it only depends on
> the table properties we can already check zdev->start_dma/end_dma at
> probe time and turn the check on attach into a WARN_ON().
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>

@Matt, @Jason I did drop the R-b's here because the change Jason
suggested of changing the aperture check on attach to a WARN_ON() and
checking zdev->start_dma/end_dma on probe is a behavioral change.
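
To make the behavioral change concrete, here is a minimal userspace sketch of the probe-time range check. The names struct dma_range and probe_check_dma_range() are hypothetical, and SKETCH_TABLE_SIZE is an assumed value used only for illustration; the driver uses ZPCI_TABLE_SIZE_RT from the s390 headers.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed address space covered by the translation table (illustrative). */
#define SKETCH_TABLE_SIZE (1ULL << 42)

/* Hypothetical stand-in for the start_dma/end_dma fields of struct zpci_dev. */
struct dma_range {
	uint64_t start_dma;
	uint64_t end_dma; /* inclusive */
};

/*
 * Reject a device whose DMA range is inverted or starts beyond the table,
 * and clamp a range that merely extends past the end of the table.
 */
static int probe_check_dma_range(struct dma_range *r)
{
	if (r->start_dma > r->end_dma ||
	    r->start_dma > SKETCH_TABLE_SIZE - 1)
		return -EINVAL;

	if (r->end_dma > SKETCH_TABLE_SIZE - 1)
		r->end_dma = SKETCH_TABLE_SIZE - 1;

	return 0;
}
```

With the range validated once at probe time, a mismatch seen later on attach would indicate a driver bug rather than a property of the device, which is the reasoning behind turning the attach-time check into a WARN_ON().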

> ---
> v4->v5:
> - Make aperture check in attach a WARN_ON() and fail in probe if
>   zdev->start_dma/end_dma doesn't fit in aperture  (Jason)
> 
>  drivers/iommu/s390-iommu.c | 65 +++++++++++++++++++++++++-------------
>  1 file changed, 43 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index 9b3ae4b14636..1f6c9bee9a80 100644
> --- a/drivers/iommu/s390-iommu.c
---8<---


* Re: [PATCH v5 1/6] iommu/s390: Fix duplicate domain attachments
  2022-10-06 14:46 ` [PATCH v5 1/6] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
@ 2022-10-06 21:02   ` Matthew Rosato
  2022-10-07  6:55     ` Niklas Schnelle
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Rosato @ 2022-10-06 21:02 UTC (permalink / raw)
  To: Niklas Schnelle, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On 10/6/22 10:46 AM, Niklas Schnelle wrote:
> Since commit fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev
> calls") we can end up with duplicates in the list of devices attached to
> a domain. This is inefficient and confusing since only one domain can
> actually be in control of the IOMMU translations for a device. Fix this
> by detaching the device from the previous domain, if any, on attach.
> Add a WARN_ON() in case we still have attached devices on freeing the
> domain. While here remove the re-attach on failure dance as it was
> determined to be unlikely to help and may confuse debug and recovery.
> 
> Fixes: fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev calls")
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
> v4->v5:
> - Unregister IOAT and set zdev->dma_table to NULL on error (Matt)
>
...

>  static int s390_iommu_attach_device(struct iommu_domain *domain,
>  				    struct device *dev)
>  {
> @@ -90,7 +116,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  	struct zpci_dev *zdev = to_zpci_dev(dev);
>  	struct s390_domain_device *domain_device;
>  	unsigned long flags;
> -	int cc, rc;
> +	int cc, rc = 0;
>  
>  	if (!zdev)
>  		return -ENODEV;
> @@ -99,23 +125,17 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  	if (!domain_device)
>  		return -ENOMEM;
>  
> -	if (zdev->dma_table && !zdev->s390_domain) {
> -		cc = zpci_dma_exit_device(zdev);
> -		if (cc) {
> -			rc = -EIO;
> -			goto out_free;
> -		}
> -	}
> -
>  	if (zdev->s390_domain)
> -		zpci_unregister_ioat(zdev, 0);
> +		__s390_iommu_detach_device(zdev);
> +	else if (zdev->dma_table)
> +		zpci_dma_exit_device(zdev);
>  
>  	zdev->dma_table = s390_domain->dma_table;
>  	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
>  				virt_to_phys(zdev->dma_table));
>  	if (cc) {
>  		rc = -EIO;
> -		goto out_restore;
> +		goto out_free;
>  	}

Hmm, with this we will leave attach_dev with a zdev->dma_table associated with this domain (not one generated via zpci_dma_init_device) and zdev->s390_domain == 0.  Won't this cause both s390_domain_free and zpci_dma_exit_device() to try and free the same dma table?

I think we have to leave with a NULL zdev->dma_table in this case too (technically you could skip the zpci_unregister_ioat()).

>  
>  	spin_lock_irqsave(&s390_domain->list_lock, flags);
> @@ -127,9 +147,9 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  	/* Allow only devices with identical DMA range limits */
>  	} else if (domain->geometry.aperture_start != zdev->start_dma ||
>  		   domain->geometry.aperture_end != zdev->end_dma) {
> -		rc = -EINVAL;
>  		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> -		goto out_restore;
> +		rc = -EINVAL;
> +		goto out_unregister;
>  	}
>  	domain_device->zdev = zdev;
>  	zdev->s390_domain = s390_domain;
> @@ -138,14 +158,9 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  
>  	return 0;
>  
> -out_restore:
> -	if (!zdev->s390_domain) {
> -		zpci_dma_init_device(zdev);
> -	} else {
> -		zdev->dma_table = zdev->s390_domain->dma_table;
> -		zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> -				   virt_to_phys(zdev->dma_table));
> -	}
> +out_unregister:
> +	zpci_unregister_ioat(zdev, 0);
> +	zdev->dma_table = NULL;
>  out_free:
>  	kfree(domain_device);
>  


* Re: [PATCH v5 3/6] iommu/s390: Fix potential s390_domain aperture shrinking
  2022-10-06 14:46 ` [PATCH v5 3/6] iommu/s390: Fix potential s390_domain aperture shrinking Niklas Schnelle
  2022-10-06 15:21   ` Niklas Schnelle
@ 2022-10-06 21:02   ` Matthew Rosato
  2022-10-07  7:37     ` Niklas Schnelle
  1 sibling, 1 reply; 16+ messages in thread
From: Matthew Rosato @ 2022-10-06 21:02 UTC (permalink / raw)
  To: Niklas Schnelle, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On 10/6/22 10:46 AM, Niklas Schnelle wrote:
> The s390 IOMMU driver currently sets the IOMMU domain's aperture to
> match the device specific DMA address range of the device that is first
> attached. This is not ideal. For one if the domain has no device
> attached in the meantime the aperture could be shrunk allowing
> translations outside the aperture to exist in the translation tables.
> Also this is a bit of a misuse of the aperture which really should
> describe what addresses can be translated and not some device specific
> limitations.
> 
> Instead of misusing the aperture like this we can instead create
> reserved ranges for the ranges inaccessible to the attached devices
> allowing devices with overlapping ranges to still share an IOMMU domain.
> This also significantly simplifies s390_iommu_attach_device() allowing
> us to move the aperture check to the beginning of the function and
> removing the need to hold the device list's lock to check the aperture.
> 
> As we then use the same aperture for all domains and it only depends on
> the table properties we can already check zdev->start_dma/end_dma at
> probe time and turn the check on attach into a WARN_ON().
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>

Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
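
As an illustration of the reserved-range idea from the commit message, the following userspace sketch computes the ranges a device cannot reach. It is not the driver code: struct resv_range and resv_ranges() are hypothetical, and SKETCH_TABLE_SIZE is an assumed stand-in for ZPCI_TABLE_SIZE_RT.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed address space covered by the translation table (illustrative). */
#define SKETCH_TABLE_SIZE (1ULL << 42)

/* Hypothetical description of one reserved range. */
struct resv_range {
	uint64_t start;
	uint64_t length;
};

/*
 * Fill out[] with the ranges below start_dma and above end_dma, i.e. the
 * parts of the shared aperture this particular device cannot address.
 * Returns the number of ranges written (0, 1 or 2).
 */
static size_t resv_ranges(uint64_t start_dma, uint64_t end_dma,
			  struct resv_range out[2])
{
	size_t n = 0;

	if (start_dma) {
		out[n].start = 0;
		out[n].length = start_dma;
		n++;
	}

	if (end_dma < SKETCH_TABLE_SIZE - 1) {
		out[n].start = end_dma + 1;
		out[n].length = SKETCH_TABLE_SIZE - end_dma - 1;
		n++;
	}

	return n;
}
```

Because the restriction is expressed per device rather than through the domain's aperture, two devices with different usable ranges can still share a single domain.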

> ---
> v4->v5:
> - Make aperture check in attach a WARN_ON() and fail in probe if
>   zdev->start_dma/end_dma doesn't fit in aperture  (Jason)
> 
>  drivers/iommu/s390-iommu.c | 65 +++++++++++++++++++++++++-------------
>  1 file changed, 43 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index 9b3ae4b14636..1f6c9bee9a80 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -62,6 +62,9 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
>  		kfree(s390_domain);
>  		return NULL;
>  	}
> +	s390_domain->domain.geometry.force_aperture = true;
> +	s390_domain->domain.geometry.aperture_start = 0;
> +	s390_domain->domain.geometry.aperture_end = ZPCI_TABLE_SIZE_RT - 1;
>  
>  	spin_lock_init(&s390_domain->dma_table_lock);
>  	spin_lock_init(&s390_domain->list_lock);
> @@ -102,46 +105,32 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  	struct s390_domain *s390_domain = to_s390_domain(domain);
>  	struct zpci_dev *zdev = to_zpci_dev(dev);
>  	unsigned long flags;
> -	int cc, rc = 0;
> +	int cc;
>  
>  	if (!zdev)
>  		return -ENODEV;
>  
> +	WARN_ON(domain->geometry.aperture_start > zdev->end_dma ||
> +		domain->geometry.aperture_end < zdev->start_dma);
> +
>  	if (zdev->s390_domain)
>  		__s390_iommu_detach_device(zdev);
>  	else if (zdev->dma_table)
>  		zpci_dma_exit_device(zdev);
>  
> -	zdev->dma_table = s390_domain->dma_table;
>  	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> -				virt_to_phys(zdev->dma_table));
> +				virt_to_phys(s390_domain->dma_table));
>  	if (cc)
>  		return -EIO;
>  
> -	spin_lock_irqsave(&s390_domain->list_lock, flags);
> -	/* First device defines the DMA range limits */
> -	if (list_empty(&s390_domain->devices)) {
> -		domain->geometry.aperture_start = zdev->start_dma;
> -		domain->geometry.aperture_end = zdev->end_dma;
> -		domain->geometry.force_aperture = true;
> -	/* Allow only devices with identical DMA range limits */
> -	} else if (domain->geometry.aperture_start != zdev->start_dma ||
> -		   domain->geometry.aperture_end != zdev->end_dma) {
> -		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> -		rc = -EINVAL;
> -		goto out_unregister;
> -	}
> +	zdev->dma_table = s390_domain->dma_table;
>  	zdev->s390_domain = s390_domain;
> +
> +	spin_lock_irqsave(&s390_domain->list_lock, flags);
>  	list_add(&zdev->iommu_list, &s390_domain->devices);
>  	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
>  
>  	return 0;
> -
> -out_unregister:
> -	zpci_unregister_ioat(zdev, 0);
> -	zdev->dma_table = NULL;
> -
> -	return rc;
>  }
>  
>  static void s390_iommu_detach_device(struct iommu_domain *domain,
> @@ -155,10 +144,41 @@ static void s390_iommu_detach_device(struct iommu_domain *domain,
>  	zpci_dma_init_device(zdev);
>  }
>  
> +static void s390_iommu_get_resv_regions(struct device *dev,
> +					struct list_head *list)
> +{
> +	struct zpci_dev *zdev = to_zpci_dev(dev);
> +	struct iommu_resv_region *region;
> +
> +	if (zdev->start_dma) {
> +		region = iommu_alloc_resv_region(0, zdev->start_dma, 0,
> +						 IOMMU_RESV_RESERVED);
> +		if (!region)
> +			return;
> +		list_add_tail(&region->list, list);
> +	}
> +
> +	if (zdev->end_dma < ZPCI_TABLE_SIZE_RT - 1) {
> +		region = iommu_alloc_resv_region(zdev->end_dma + 1,
> +						 ZPCI_TABLE_SIZE_RT - zdev->end_dma - 1,
> +						 0, IOMMU_RESV_RESERVED);
> +		if (!region)
> +			return;
> +		list_add_tail(&region->list, list);
> +	}
> +}
> +
>  static struct iommu_device *s390_iommu_probe_device(struct device *dev)
>  {
>  	struct zpci_dev *zdev = to_zpci_dev(dev);
>  
> +	if (zdev->start_dma > zdev->end_dma ||
> +	    zdev->start_dma > ZPCI_TABLE_SIZE_RT - 1)
> +		return ERR_PTR(-EINVAL);
> +
> +	if (zdev->end_dma > ZPCI_TABLE_SIZE_RT - 1)
> +		zdev->end_dma = ZPCI_TABLE_SIZE_RT - 1;
> +
>  	return &zdev->iommu_dev;
>  }
>  
> @@ -337,6 +357,7 @@ static const struct iommu_ops s390_iommu_ops = {
>  	.release_device = s390_iommu_release_device,
>  	.device_group = generic_device_group,
>  	.pgsize_bitmap = S390_IOMMU_PGSIZES,
> +	.get_resv_regions = s390_iommu_get_resv_regions,
>  	.default_domain_ops = &(const struct iommu_domain_ops) {
>  		.attach_dev	= s390_iommu_attach_device,
>  		.detach_dev	= s390_iommu_detach_device,


* Re: [PATCH v5 6/6] iommu/s390: Implement map_pages()/unmap_pages() instead of map()/unmap()
  2022-10-06 14:47 ` [PATCH v5 6/6] iommu/s390: Implement map_pages()/unmap_pages() instead of map()/unmap() Niklas Schnelle
@ 2022-10-06 21:03   ` Matthew Rosato
  2022-10-07  6:59     ` Niklas Schnelle
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Rosato @ 2022-10-06 21:03 UTC (permalink / raw)
  To: Niklas Schnelle, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On 10/6/22 10:47 AM, Niklas Schnelle wrote:
> While s390-iommu currently implements the map()/unmap()
> operations which only map/unmap a single page at a time the internal
> s390_iommu_update_trans() API already supports mapping/unmapping a range
> of pages at once. Take advantage of this by implementing the
> map_pages()/unmap_pages() operations instead thus allowing users of the
> IOMMU drivers to map multiple pages in a single call followed by
> a single I/O TLB flush if needed.
> 
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
>  drivers/iommu/s390-iommu.c | 48 +++++++++++++++++++++++++-------------
>  1 file changed, 32 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index ac200f0b81fa..7b92855135ac 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -189,20 +189,15 @@ static void s390_iommu_release_device(struct device *dev)
>  
>  static int s390_iommu_update_trans(struct s390_domain *s390_domain,
>  				   phys_addr_t pa, dma_addr_t dma_addr,
> -				   size_t size, int flags)
> +				   unsigned long nr_pages, int flags)
>  {
>  	phys_addr_t page_addr = pa & PAGE_MASK;
>  	dma_addr_t start_dma_addr = dma_addr;
> -	unsigned long irq_flags, nr_pages, i;
> +	unsigned long irq_flags, i;
>  	struct zpci_dev *zdev;
>  	unsigned long *entry;
>  	int rc = 0;
>  
> -	if (dma_addr < s390_domain->domain.geometry.aperture_start ||
> -	    (dma_addr + size - 1) > s390_domain->domain.geometry.aperture_end)
> -		return -EINVAL;
> -
> -	nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
>  	if (!nr_pages)
>  		return 0;
>  
> @@ -245,11 +240,24 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
>  	return rc;
>  }
>  
> -static int s390_iommu_map(struct iommu_domain *domain, unsigned long iova,
> -			  phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
> +static int s390_iommu_map_pages(struct iommu_domain *domain,
> +				unsigned long iova, phys_addr_t paddr,
> +				size_t pgsize, size_t pgcount,
> +				int prot, gfp_t gfp, size_t *mapped)
>  {
>  	struct s390_domain *s390_domain = to_s390_domain(domain);
>  	int flags = ZPCI_PTE_VALID, rc = 0;
> +	size_t size = pgcount << __ffs(pgsize);
> +
> +	if (pgsize != SZ_4K)
> +		return -EINVAL;
> +
> +	if (iova < s390_domain->domain.geometry.aperture_start ||
> +	    (iova + size - 1) > s390_domain->domain.geometry.aperture_end)
> +		return -EINVAL;
> +
> +	if (!IS_ALIGNED(iova | paddr, pgsize))
> +		return -EINVAL;
>  
>  	if (!(prot & IOMMU_READ))
>  		return -EINVAL;
> @@ -258,7 +266,9 @@ static int s390_iommu_map(struct iommu_domain *domain, unsigned long iova,
>  		flags |= ZPCI_TABLE_PROTECTED;
>  
>  	rc = s390_iommu_update_trans(s390_domain, paddr, iova,
> -				     size, flags);
> +				     pgcount, flags);
> +	if (!rc)
> +		*mapped = size;
>  
>  	return rc;
>  }
> @@ -294,21 +304,27 @@ static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
>  	return phys;
>  }
>  
> -static size_t s390_iommu_unmap(struct iommu_domain *domain,
> -			       unsigned long iova, size_t size,
> -			       struct iommu_iotlb_gather *gather)
> +static size_t s390_iommu_unmap_pages(struct iommu_domain *domain,
> +				     unsigned long iova,
> +				     size_t pgsize, size_t pgcount,
> +				     struct iommu_iotlb_gather *gather)
>  {
>  	struct s390_domain *s390_domain = to_s390_domain(domain);
> +	size_t size = pgcount << __ffs(pgsize);
>  	int flags = ZPCI_PTE_INVALID;
>  	phys_addr_t paddr;
>  	int rc;
>  
> +	if (iova < s390_domain->domain.geometry.aperture_start ||
> +	    (iova + size - 1) > s390_domain->domain.geometry.aperture_end)
> +		return 0;
> +

Overall this LGTM and runs well with my testing.  But I'm curious why we silently ignore an egregiously bad unmap request here.  We already return -EINVAL for an attempt to map_pages() something outside of the aperture.  If something still tries to unmap_pages() outside of the aperture, that seems like a bug.  Maybe this should be wrapped in an if (WARN_ON(... || ...)) to signal the unexpected behavior and then still return 0?

Otherwise:
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>

>  	paddr = s390_iommu_iova_to_phys(domain, iova);
>  	if (!paddr)
>  		return 0;
>  
>  	rc = s390_iommu_update_trans(s390_domain, paddr, iova,
> -				     size, flags);
> +				     pgcount, flags);
>  	if (rc)
>  		return 0;
>  
> @@ -354,8 +370,8 @@ static const struct iommu_ops s390_iommu_ops = {
>  	.default_domain_ops = &(const struct iommu_domain_ops) {
>  		.attach_dev	= s390_iommu_attach_device,
>  		.detach_dev	= s390_iommu_detach_device,
> -		.map		= s390_iommu_map,
> -		.unmap		= s390_iommu_unmap,
> +		.map_pages	= s390_iommu_map_pages,
> +		.unmap_pages	= s390_iommu_unmap_pages,
>  		.iova_to_phys	= s390_iommu_iova_to_phys,
>  		.free		= s390_domain_free,
>  	}


* Re: [PATCH v5 1/6] iommu/s390: Fix duplicate domain attachments
  2022-10-06 21:02   ` Matthew Rosato
@ 2022-10-07  6:55     ` Niklas Schnelle
  2022-10-07 11:20       ` Niklas Schnelle
  0 siblings, 1 reply; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-07  6:55 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On Thu, 2022-10-06 at 17:02 -0400, Matthew Rosato wrote:
> On 10/6/22 10:46 AM, Niklas Schnelle wrote:
> > Since commit fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev
> > calls") we can end up with duplicates in the list of devices attached to
> > a domain. This is inefficient and confusing since only one domain can
> > actually be in control of the IOMMU translations for a device. Fix this
> > by detaching the device from the previous domain, if any, on attach.
> > Add a WARN_ON() in case we still have attached devices on freeing the
> > domain. While here remove the re-attach on failure dance as it was
> > determined to be unlikely to help and may confuse debug and recovery.
> > 
> > Fixes: fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev calls")
> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > ---
> > v4->v5:
> > - Unregister IOAT and set zdev->dma_table to NULL on error (Matt)
> > 
> ...
> 
> >  static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  				    struct device *dev)
> >  {
> > @@ -90,7 +116,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  	struct zpci_dev *zdev = to_zpci_dev(dev);
> >  	struct s390_domain_device *domain_device;
> >  	unsigned long flags;
> > -	int cc, rc;
> > +	int cc, rc = 0;
> >  
> >  	if (!zdev)
> >  		return -ENODEV;
> > @@ -99,23 +125,17 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  	if (!domain_device)
> >  		return -ENOMEM;
> >  
> > -	if (zdev->dma_table && !zdev->s390_domain) {
> > -		cc = zpci_dma_exit_device(zdev);
> > -		if (cc) {
> > -			rc = -EIO;
> > -			goto out_free;
> > -		}
> > -	}
> > -
> >  	if (zdev->s390_domain)
> > -		zpci_unregister_ioat(zdev, 0);
> > +		__s390_iommu_detach_device(zdev);
> > +	else if (zdev->dma_table)
> > +		zpci_dma_exit_device(zdev);
> >  
> >  	zdev->dma_table = s390_domain->dma_table;
> >  	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> >  				virt_to_phys(zdev->dma_table));
> >  	if (cc) {
> >  		rc = -EIO;
> > -		goto out_restore;
> > +		goto out_free;
> >  	}
> 
> Hmm, with this we will leave attach_dev with a zdev->dma_table associated with this domain (not one generated via zpci_dma_init_device) and zdev->s390_domain == 0.  Won't this cause both s390_domain_free and zpci_dma_exit_device() to try and free the same dma table?
> 
> I think we also have to leave with a NULL zdev->dma_table in this case too (you technically could skip the zpci_unregister_ioat)


Argh, you're right. I think this is a bad rebase; in v4 I had
zpci_register_ioat() use s390_domain->dma_table and only set
zdev->dma_table after that succeeded. I seem to have lost that part
somewhere along the way. With that, zdev->dma_table would be NULL and
all would be good.
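
The ordering described above (register with the domain's table first, store the pointer in the device only on success) can be sketched in userspace C roughly as follows. All names here (struct sketch_dev, struct sketch_domain, sketch_register(), sketch_attach()) are hypothetical, the fail flag only simulates a non-zero condition code, and the device is assumed to start with a NULL table, as it would after the preceding detach.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for zpci_dev and s390_domain. */
struct sketch_dev {
	void *dma_table;
	void *domain;
};

struct sketch_domain {
	void *dma_table;
};

/* Stand-in for the registration call; returns non-zero on failure. */
static int sketch_register(struct sketch_dev *dev, void *table, bool fail)
{
	(void)dev;
	(void)table;
	return fail ? 1 : 0;
}

static int sketch_attach(struct sketch_domain *dom, struct sketch_dev *dev,
			 bool fail)
{
	/* Register using the domain's table, not a copy stored in the device. */
	int cc = sketch_register(dev, dom->dma_table, fail);

	if (cc)
		return -EIO; /* dev->dma_table was never set, so it stays NULL */

	/* Commit the device state only after registration succeeded. */
	dev->dma_table = dom->dma_table;
	dev->domain = dom;
	return 0;
}
```

On the failure path the device never points at the domain's table, so only the domain's own teardown would free it, which avoids the double free raised in the review.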

> 
> >  
> >  	spin_lock_irqsave(&s390_domain->list_lock, flags);
> > @@ -127,9 +147,9 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  	/* Allow only devices with identical DMA range limits */
> >  	} else if (domain->geometry.aperture_start != zdev->start_dma ||
> >  		   domain->geometry.aperture_end != zdev->end_dma) {
> > -		rc = -EINVAL;
> >  		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> > -		goto out_restore;
> > +		rc = -EINVAL;
> > +		goto out_unregister;
> >  	}
> >  	domain_device->zdev = zdev;
> >  	zdev->s390_domain = s390_domain;
> > @@ -138,14 +158,9 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  
> >  	return 0;
> >  
> > -out_restore:
> > -	if (!zdev->s390_domain) {
> > -		zpci_dma_init_device(zdev);
> > -	} else {
> > -		zdev->dma_table = zdev->s390_domain->dma_table;
> > -		zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> > -				   virt_to_phys(zdev->dma_table));
> > -	}
> > +out_unregister:
> > +	zpci_unregister_ioat(zdev, 0);
> > +	zdev->dma_table = NULL;
> >  out_free:
> >  	kfree(domain_device);
> >  



* Re: [PATCH v5 6/6] iommu/s390: Implement map_pages()/unmap_pages() instead of map()/unmap()
  2022-10-06 21:03   ` Matthew Rosato
@ 2022-10-07  6:59     ` Niklas Schnelle
  0 siblings, 0 replies; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-07  6:59 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On Thu, 2022-10-06 at 17:03 -0400, Matthew Rosato wrote:
> On 10/6/22 10:47 AM, Niklas Schnelle wrote:
> > While s390-iommu currently implements the map()/unmap()
> > operations which only map/unmap a single page at a time the internal
> > s390_iommu_update_trans() API already supports mapping/unmapping a range
> > of pages at once. Take advantage of this by implementing the
> > map_pages()/unmap_pages() operations instead thus allowing users of the
> > IOMMU drivers to map multiple pages in a single call followed by
> > a single I/O TLB flush if needed.
> > 
> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > ---
> >  drivers/iommu/s390-iommu.c | 48 +++++++++++++++++++++++++-------------
> >  1 file changed, 32 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> > index ac200f0b81fa..7b92855135ac 100644
> > --- a/drivers/iommu/s390-iommu.c
> > +++ b/drivers/iommu/s390-iommu.c
> > @@ -189,20 +189,15 @@ static void s390_iommu_release_device(struct device *dev)
> >  
> >  static int s390_iommu_update_trans(struct s390_domain *s390_domain,
> >  				   phys_addr_t pa, dma_addr_t dma_addr,
> > -				   size_t size, int flags)
> > +				   unsigned long nr_pages, int flags)
> >  {
> >  	phys_addr_t page_addr = pa & PAGE_MASK;
> >  	dma_addr_t start_dma_addr = dma_addr;
> > -	unsigned long irq_flags, nr_pages, i;
> > +	unsigned long irq_flags, i;
> >  	struct zpci_dev *zdev;
> >  	unsigned long *entry;
> >  	int rc = 0;
> >  
> > -	if (dma_addr < s390_domain->domain.geometry.aperture_start ||
> > -	    (dma_addr + size - 1) > s390_domain->domain.geometry.aperture_end)
> > -		return -EINVAL;
> > -
> > -	nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
> >  	if (!nr_pages)
> >  		return 0;
> >  
> > @@ -245,11 +240,24 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
> >  	return rc;
> >  }
> >  
> > -static int s390_iommu_map(struct iommu_domain *domain, unsigned long iova,
> > -			  phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
> > +static int s390_iommu_map_pages(struct iommu_domain *domain,
> > +				unsigned long iova, phys_addr_t paddr,
> > +				size_t pgsize, size_t pgcount,
> > +				int prot, gfp_t gfp, size_t *mapped)
> >  {
> >  	struct s390_domain *s390_domain = to_s390_domain(domain);
> >  	int flags = ZPCI_PTE_VALID, rc = 0;
> > +	size_t size = pgcount << __ffs(pgsize);
> > +
> > +	if (pgsize != SZ_4K)
> > +		return -EINVAL;
> > +
> > +	if (iova < s390_domain->domain.geometry.aperture_start ||
> > +	    (iova + size - 1) > s390_domain->domain.geometry.aperture_end)
> > +		return -EINVAL;
> > +
> > +	if (!IS_ALIGNED(iova | paddr, pgsize))
> > +		return -EINVAL;
> >  
> >  	if (!(prot & IOMMU_READ))
> >  		return -EINVAL;
> > @@ -258,7 +266,9 @@ static int s390_iommu_map(struct iommu_domain *domain, unsigned long iova,
> >  		flags |= ZPCI_TABLE_PROTECTED;
> >  
> >  	rc = s390_iommu_update_trans(s390_domain, paddr, iova,
> > -				     size, flags);
> > +				     pgcount, flags);
> > +	if (!rc)
> > +		*mapped = size;
> >  
> >  	return rc;
> >  }
> > @@ -294,21 +304,27 @@ static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
> >  	return phys;
> >  }
> >  
> > -static size_t s390_iommu_unmap(struct iommu_domain *domain,
> > -			       unsigned long iova, size_t size,
> > -			       struct iommu_iotlb_gather *gather)
> > +static size_t s390_iommu_unmap_pages(struct iommu_domain *domain,
> > +				     unsigned long iova,
> > +				     size_t pgsize, size_t pgcount,
> > +				     struct iommu_iotlb_gather *gather)
> >  {
> >  	struct s390_domain *s390_domain = to_s390_domain(domain);
> > +	size_t size = pgcount << __ffs(pgsize);
> >  	int flags = ZPCI_PTE_INVALID;
> >  	phys_addr_t paddr;
> >  	int rc;
> >  
> > +	if (iova < s390_domain->domain.geometry.aperture_start ||
> > +	    (iova + size - 1) > s390_domain->domain.geometry.aperture_end)
> > +		return 0;
> > +
> 
> Overall this LGTM and runs well with my testing.  But I'm curious why we silently ignore an egregiously bad unmap request here.  We already return -EINVAL for an attempt to map_pages() something outside of the aperture.  If something still tries to unmap_pages() outside of the aperture, that seems like a bug.  Maybe this should be wrapped in an if (WARN_ON(... || ...)) to signal the unexpected behavior and then still return 0?
> 

Well, the problem here is that .unmap_pages() returns size_t so
0 is kind of the only invalid value. But yes, a WARN_ON() seems
warranted.
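
A rough userspace sketch of that idea, warn loudly but still report 0 bytes unmapped, might look like this. sketch_warn_on() is a hypothetical stand-in for the kernel's WARN_ON() that only prints to stderr, sketch_unmap_pages() is not the driver function, __builtin_ctzl() (a GCC/Clang builtin) replaces __ffs(), and pgsize is assumed to be a non-zero power of two.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): log the condition and pass it through. */
static int sketch_warn_on(int cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/*
 * Returns the number of bytes "unmapped", or 0 for a request outside the
 * aperture. A size_t return value has no room for a negative error code,
 * so 0 is the only way to signal failure to the caller.
 */
static size_t sketch_unmap_pages(uint64_t ap_start, uint64_t ap_end,
				 uint64_t iova, size_t pgsize, size_t pgcount)
{
	size_t size = pgcount << __builtin_ctzl(pgsize);

	if (sketch_warn_on(iova < ap_start || (iova + size - 1) > ap_end,
			   "unmap request outside aperture"))
		return 0;

	/* The driver would invalidate the translations here. */
	return size;
}
```

The caller still sees the same return value as before; the only difference is that an out-of-aperture request now leaves a visible trace instead of being dropped silently.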

> Otherwise:
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
> 
> >  	paddr = s390_iommu_iova_to_phys(domain, iova);
> >  	if (!paddr)
> >  		return 0;
> >  
> >  	rc = s390_iommu_update_trans(s390_domain, paddr, iova,
> > -				     size, flags);
> > +				     pgcount, flags);
> >  	if (rc)
> >  		return 0;
> >  
> > @@ -354,8 +370,8 @@ static const struct iommu_ops s390_iommu_ops = {
> >  	.default_domain_ops = &(const struct iommu_domain_ops) {
> >  		.attach_dev	= s390_iommu_attach_device,
> >  		.detach_dev	= s390_iommu_detach_device,
> > -		.map		= s390_iommu_map,
> > -		.unmap		= s390_iommu_unmap,
> > +		.map_pages	= s390_iommu_map_pages,
> > +		.unmap_pages	= s390_iommu_unmap_pages,
> >  		.iova_to_phys	= s390_iommu_iova_to_phys,
> >  		.free		= s390_domain_free,
> >  	}



* Re: [PATCH v5 3/6] iommu/s390: Fix potential s390_domain aperture shrinking
  2022-10-06 21:02   ` Matthew Rosato
@ 2022-10-07  7:37     ` Niklas Schnelle
  0 siblings, 0 replies; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-07  7:37 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On Thu, 2022-10-06 at 17:02 -0400, Matthew Rosato wrote:
> On 10/6/22 10:46 AM, Niklas Schnelle wrote:
> > The s390 IOMMU driver currently sets the IOMMU domain's aperture to
> > match the device specific DMA address range of the device that is first
> > attached. This is not ideal. For one if the domain has no device
> > attached in the meantime the aperture could be shrunk allowing
> > translations outside the aperture to exist in the translation tables.
> > Also this is a bit of a misuse of the aperture which really should
> > describe what addresses can be translated and not some device specific
> > limitations.
> > 
> > Instead of misusing the aperture like this we can instead create
> > reserved ranges for the ranges inaccessible to the attached devices
> > allowing devices with overlapping ranges to still share an IOMMU domain.
> > This also significantly simplifies s390_iommu_attach_device() allowing
> > us to move the aperture check to the beginning of the function and
> > removing the need to hold the device list's lock to check the aperture.
> > 
> > As we then use the same aperture for all domains and it only depends on
> > the table properties we can already check zdev->start_dma/end_dma at
> > probe time and turn the check on attach into a WARN_ON().
> > 
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> 
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
> 
> > ---
> > v4->v5:
> > - Make aperture check in attach a WARN_ON() and fail in probe if
> >   zdev->start_dma/end_dma doesn't fit in aperture (Jason)
> > 
> >  drivers/iommu/s390-iommu.c | 65 +++++++++++++++++++++++++-------------
> >  1 file changed, 43 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> > index 9b3ae4b14636..1f6c9bee9a80 100644
> > --- a/drivers/iommu/s390-iommu.c
> > +++ b/drivers/iommu/s390-iommu.c
> > @@ -62,6 +62,9 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
> >  		kfree(s390_domain);
> >  		return NULL;
> >  	}
> > +	s390_domain->domain.geometry.force_aperture = true;
> > +	s390_domain->domain.geometry.aperture_start = 0;
> > +	s390_domain->domain.geometry.aperture_end = ZPCI_TABLE_SIZE_RT - 1;
> >  
> >  	spin_lock_init(&s390_domain->dma_table_lock);
> >  	spin_lock_init(&s390_domain->list_lock);
> > @@ -102,46 +105,32 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  	struct s390_domain *s390_domain = to_s390_domain(domain);
> >  	struct zpci_dev *zdev = to_zpci_dev(dev);
> >  	unsigned long flags;
> > -	int cc, rc = 0;
> > +	int cc;
> >  
> >  	if (!zdev)
> >  		return -ENODEV;
> >  
> > +	WARN_ON(domain->geometry.aperture_start > zdev->end_dma ||
> > +		domain->geometry.aperture_end < zdev->start_dma);
> > +

I think this one should still return with -EINVAL.
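
As a rough user-space sketch of what that could look like (not the actual
v6 code): in the kernel, WARN_ON() evaluates to the truth value of its
condition, so the warning and the error return can share one if statement.
The warn_on() helper, struct range and attach_check() below are made-up
stand-ins for illustration only; the overlap test mirrors the one in the
hunk above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* User-space stand-in for the kernel's WARN_ON(): log and return the condition. */
static bool warn_on(bool cond, const char *expr)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", expr);
	return cond;
}
#define WARN_ON(cond) warn_on(!!(cond), #cond)

/* Hypothetical inclusive address range, used for both aperture and DMA window. */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Warn and fail with -EINVAL if the device's DMA window does not overlap
 * the domain aperture at all; otherwise report success.
 */
static int attach_check(const struct range *aperture, const struct range *dma)
{
	if (WARN_ON(aperture->start > dma->end || aperture->end < dma->start))
		return -EINVAL;

	return 0;
}
```

With this shape a device that passed the probe-time check never triggers
the warning, while an unexpected mismatch is both logged and rejected.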

> >  	if (zdev->s390_domain)
> >  		__s390_iommu_detach_device(zdev);
> >  	else if (zdev->dma_table)
> >  		zpci_dma_exit_device(zdev);
> >  
> > -	zdev->dma_table = s390_domain->dma_table;
> >  	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> > -				virt_to_phys(zdev->dma_table));
> > +				virt_to_phys(s390_domain->dma_table));
> >  	if (cc)
> >  		return -EIO;
> >  
> > -	spin_lock_irqsave(&s390_domain->list_lock, flags);
> > -	/* First device defines the DMA range limits */
> > -	if (list_empty(&s390_domain->devices)) {
> > -		domain->geometry.aperture_start = zdev->start_dma;
> > -		domain->geometry.aperture_end = zdev->end_dma;
> > -		domain->geometry.force_aperture = true;
> > -	/* Allow only devices with identical DMA range limits */
> > -	} else if (domain->geometry.aperture_start != zdev->start_dma ||
> > -		   domain->geometry.aperture_end != zdev->end_dma) {
> > -		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> > -		rc = -EINVAL;
> > -		goto out_unregister;
> > -	}
> > +	zdev->dma_table = s390_domain->dma_table;
> >  	zdev->s390_domain = s390_domain;
> > +
> > +	spin_lock_irqsave(&s390_domain->list_lock, flags);
> >  	list_add(&zdev->iommu_list, &s390_domain->devices);
> >  	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> >  
> >  	return 0;
> > -
> > -out_unregister:
> > -	zpci_unregister_ioat(zdev, 0);
> > -	zdev->dma_table = NULL;
> > -
> > -	return rc;
> >  }
> >  
> > 
---8<---




* Re: [PATCH v5 1/6] iommu/s390: Fix duplicate domain attachments
  2022-10-07  6:55     ` Niklas Schnelle
@ 2022-10-07 11:20       ` Niklas Schnelle
  0 siblings, 0 replies; 16+ messages in thread
From: Niklas Schnelle @ 2022-10-07 11:20 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On Fri, 2022-10-07 at 08:55 +0200, Niklas Schnelle wrote:
> On Thu, 2022-10-06 at 17:02 -0400, Matthew Rosato wrote:
> > On 10/6/22 10:46 AM, Niklas Schnelle wrote:
> > > Since commit fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev
> > > calls") we can end up with duplicates in the list of devices attached to
> > > a domain. This is inefficient and confusing since only one domain can
> > > actually be in control of the IOMMU translations for a device. Fix this
> > > by detaching the device from the previous domain, if any, on attach.
> > > Add a WARN_ON() in case we still have attached devices on freeing the
> > > domain. While here remove the re-attach on failure dance as it was
> > > determined to be unlikely to help and may confuse debug and recovery.
> > > 
> > > Fixes: fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev calls")
> > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > > ---
> > > v4->v5:
> > > - Unregister IOAT and set zdev->dma_table on error (Matt)
> > > 
> > ...
> > 
> > >  static int s390_iommu_attach_device(struct iommu_domain *domain,
> > >  				    struct device *dev)
> > >  {
> > > @@ -90,7 +116,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> > >  	struct zpci_dev *zdev = to_zpci_dev(dev);
> > >  	struct s390_domain_device *domain_device;
> > >  	unsigned long flags;
> > > -	int cc, rc;
> > > +	int cc, rc = 0;
> > >  
> > >  	if (!zdev)
> > >  		return -ENODEV;
> > > @@ -99,23 +125,17 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> > >  	if (!domain_device)
> > >  		return -ENOMEM;
> > >  
> > > -	if (zdev->dma_table && !zdev->s390_domain) {
> > > -		cc = zpci_dma_exit_device(zdev);
> > > -		if (cc) {
> > > -			rc = -EIO;
> > > -			goto out_free;
> > > -		}
> > > -	}
> > > -
> > >  	if (zdev->s390_domain)
> > > -		zpci_unregister_ioat(zdev, 0);
> > > +		__s390_iommu_detach_device(zdev);
> > > +	else if (zdev->dma_table)
> > > +		zpci_dma_exit_device(zdev);
> > >  
> > >  	zdev->dma_table = s390_domain->dma_table;
> > >  	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> > >  				virt_to_phys(zdev->dma_table));
> > >  	if (cc) {
> > >  		rc = -EIO;
> > > -		goto out_restore;
> > > +		goto out_free;
> > >  	}
> > 
> > Hmm, with this we will leave attach_dev with a zdev->dma_table associated with this domain (not one generated via zpci_dma_init_device) and zdev->s390_domain == 0.  Won't this cause both s390_domain_free and zpci_dma_exit_device() to try and free the same dma table?
> > 
> > > I think we have to leave with a NULL zdev->dma_table in this case too (you could technically skip the zpci_unregister_ioat())
> 
> > Argh, you're right. I think this is a bad rebase; in v4 I had
> > zpci_register_ioat() use s390_domain->dma_table and only set
> > zdev->dma_table after that succeeded. I seem to have lost that part
> > somewhere along the way. With that, zdev->dma_table would be NULL and
> > all would be good.
> 

I went back to the way I did it in v4 for v6. I think I was simply an
idiot: when comparing to the state prior to the commit, I forgot why I
did it this way and thought it was an unneeded change.
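
To illustrate the ordering, here is a small user-space model (a sketch
only, not the driver code). The model_* types and functions are invented
for this example, and model_register_ioat() merely simulates success or
failure of zpci_register_ioat(). Detaching from a previous domain is left
out. The point is that the device's dma_table pointer is only set once
registration has succeeded, so a failed attach leaves it NULL and there is
only one owner that will free the domain's table.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a domain owning a translation table. */
struct model_domain {
	unsigned long *dma_table;
};

/* Hypothetical model of a device; fail_register simulates a firmware error. */
struct model_zdev {
	unsigned long *dma_table;
	struct model_domain *domain;
	int fail_register;
};

/* Stand-in for zpci_register_ioat(): nonzero condition code means failure. */
static int model_register_ioat(struct model_zdev *zdev, unsigned long *table)
{
	(void)table;
	return zdev->fail_register ? 1 : 0;
}

static int model_attach(struct model_domain *domain, struct model_zdev *zdev)
{
	int cc;

	/* Register using the domain's table directly ... */
	cc = model_register_ioat(zdev, domain->dma_table);
	if (cc)
		return -EIO;

	/* ... and only publish it in the device once that succeeded. */
	zdev->dma_table = domain->dma_table;
	zdev->domain = domain;

	return 0;
}
```

On the failure path nothing needs to be undone, which is why no
out_unregister label is required in this variant.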

> > >  
> > >  	spin_lock_irqsave(&s390_domain->list_lock, flags);
> > > @@ -127,9 +147,9 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> > >  	/* Allow only devices with identical DMA range limits */
> > >  	} else if (domain->geometry.aperture_start != zdev->start_dma ||
> > >  		   domain->geometry.aperture_end != zdev->end_dma) {
> > > -		rc = -EINVAL;
> > >  		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> > > -		goto out_restore;
> > > +		rc = -EINVAL;
> > > +		goto out_unregister;
> > >  	}
> > >  	domain_device->zdev = zdev;
> > >  	zdev->s390_domain = s390_domain;
> > > @@ -138,14 +158,9 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> > >  
> > >  	return 0;
> > >  
> > > -out_restore:
> > > -	if (!zdev->s390_domain) {
> > > -		zpci_dma_init_device(zdev);
> > > -	} else {
> > > -		zdev->dma_table = zdev->s390_domain->dma_table;
> > > -		zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> > > -				   virt_to_phys(zdev->dma_table));
> > > -	}
> > > +out_unregister:
> > > +	zpci_unregister_ioat(zdev, 0);
> > > +	zdev->dma_table = NULL;
> > >  out_free:
> > >  	kfree(domain_device);
> > >  
> 
> 




end of thread, other threads:[~2022-10-07 11:20 UTC | newest]

Thread overview: 16+ messages
2022-10-06 14:46 [PATCH v5 0/6] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
2022-10-06 14:46 ` [PATCH v5 1/6] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
2022-10-06 21:02   ` Matthew Rosato
2022-10-07  6:55     ` Niklas Schnelle
2022-10-07 11:20       ` Niklas Schnelle
2022-10-06 14:46 ` [PATCH v5 2/6] iommu/s390: Get rid of s390_domain_device Niklas Schnelle
2022-10-06 15:19   ` Niklas Schnelle
2022-10-06 14:46 ` [PATCH v5 3/6] iommu/s390: Fix potential s390_domain aperture shrinking Niklas Schnelle
2022-10-06 15:21   ` Niklas Schnelle
2022-10-06 21:02   ` Matthew Rosato
2022-10-07  7:37     ` Niklas Schnelle
2022-10-06 14:46 ` [PATCH v5 4/6] iommu/s390: Fix incorrect aperture check Niklas Schnelle
2022-10-06 14:46 ` [PATCH v5 5/6] iommu/s390: Fix incorrect pgsize_bitmap Niklas Schnelle
2022-10-06 14:47 ` [PATCH v5 6/6] iommu/s390: Implement map_pages()/unmap_pages() instead of map()/unmap() Niklas Schnelle
2022-10-06 21:03   ` Matthew Rosato
2022-10-07  6:59     ` Niklas Schnelle
