linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/5] iommu/s390: Fixes related to attach and aperture handling
@ 2022-10-04 12:07 Niklas Schnelle
  2022-10-04 12:07 ` [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Niklas Schnelle @ 2022-10-04 12:07 UTC
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

Hi All,

This is v4 of a follow up to Matt's recent series[0] where he tackled
a race that turned out to be outside of the s390 IOMMU driver itself as
well as duplicate device attachments. After an internal discussion we came
up with what I believe is a cleaner fix. Instead of actively checking for
duplicates we now detach from any previous domain on attach. From my
cursory reading of the code this seems to be what the Intel IOMMU driver is
doing as well.

Moreover we drop the attempt to re-attach the device to its previous IOMMU
domain on failure. This was fragile, unlikely to help and unexpected for
calling code. Thanks Jason for the suggestion.

During development of this fix we realized that we can get rid of struct
s390_domain_device entirely if we instead thread the list through the
attached struct zpci_devs. This saves us from having to allocate during
attach and gets rid of one level of indirection during IOMMU operations.

Additionally 3 more fixes have been added in v3 that weren't in v2 of this
series. One is for a potential situation where the aperture of a domain
could shrink and leave invalid translations. The next one fixes an
off-by-one error in checking the validity of an IOVA and the last one
fixes a wrong value for pgsize_bitmap.

*Note*:
This series is against the s390 features branch[1] which already contains
the bus_next field removal that was part of v2.

Best regards,
Niklas

Changes since v3:
- Drop s390_domain from __s390_iommu_detach_device() (Jason)
- WARN_ON() mismatched domain in s390_iommu_detach_device() (Jason)
- Use __s390_iommu_detach_device() in s390_iommu_release_device() (Jason)
- Make aperture check resistant against overflow (Jason)

Changes since v2:
- The patch removing the unused bus_next field has been spun out and
  already made it into the s390 feature branch on git.kernel.org
- Make __s390_iommu_detach_device() return void (Jason)
- Remove the re-attach on failure dance as it is unlikely to help
  and complicates debug and recovery (Jason)
- Ignore attempts to detach from domain that is not the active one
- Add patch to fix potential shrinking of the aperture and use
  reserved ranges per device instead of the aperture to respect
  IOVA range restrictions (Jason)
- Add a fix for an off by one error on checking an IOVA against
  the aperture
- Add a fix for wrong pgsize_bitmap

Changes since v1:
- After patch 3 we don't have to search in the devices list on detach as
  we already have hold of the zpci_dev (Jason)
- Add a WARN_ON() if we somehow end up detaching a device from a domain
  that isn't the device's current domain.
- Removed the iteration and list delete from s390_domain_free(); instead
  just WARN_ON() when we're freeing without having detached
- The last two points should help catch sequencing errors much more
  quickly in the future.

[0] https://lore.kernel.org/linux-iommu/20220831201236.77595-1-mjrosato@linux.ibm.com/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/

Niklas Schnelle (5):
  iommu/s390: Fix duplicate domain attachments
  iommu/s390: Get rid of s390_domain_device
  iommu/s390: Fix potential s390_domain aperture shrinking
  iommu/s390: Fix incorrect aperture check
  iommu/s390: Fix incorrect pgsize_bitmap

 arch/s390/include/asm/pci.h |   1 +
 drivers/iommu/s390-iommu.c  | 169 +++++++++++++++---------------------
 2 files changed, 70 insertions(+), 100 deletions(-)

-- 
2.34.1



* [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments
  2022-10-04 12:07 [PATCH v4 0/5] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
@ 2022-10-04 12:07 ` Niklas Schnelle
  2022-10-04 12:43   ` Jason Gunthorpe
  2022-10-04 16:18   ` Matthew Rosato
  2022-10-04 12:07 ` [PATCH v4 2/5] iommu/s390: Get rid of s390_domain_device Niklas Schnelle
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 22+ messages in thread
From: Niklas Schnelle @ 2022-10-04 12:07 UTC
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

Since commit fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev
calls") we can end up with duplicates in the list of devices attached to
a domain. This is inefficient and confusing since only one domain can
actually be in control of the IOMMU translations for a device. Fix this
by detaching the device from the previous domain, if any, on attach.
Add a WARN_ON() in case we still have attached devices on freeing the
domain. While here remove the re-attach on failure dance as it was
determined to be unlikely to help and may confuse debug and recovery.

Fixes: fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev calls")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
v3 -> v4:
- Drop s390_domain from __s390_iommu_detach_device() (Jason)
- WARN_ON() mismatched domain in s390_iommu_detach_device() (Jason)
- Use __s390_iommu_detach_device() in s390_iommu_release_device() (Jason)
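
For illustration only, the detach-on-attach rule described in the commit
message can be modelled in a few lines of userspace C. This is a sketch,
not driver code: struct model_domain, struct model_dev, model_attach()
and model_detach() are hypothetical stand-ins, and a plain counter takes
the place of the devices list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct s390_domain and struct zpci_dev. */
struct model_domain {
	int nr_attached;		/* devices currently attached */
};

struct model_dev {
	struct model_domain *domain;	/* NULL while detached */
};

/* Modelled on __s390_iommu_detach_device(): a no-op when detached. */
static void model_detach(struct model_dev *dev)
{
	assert(dev);
	if (!dev->domain)
		return;
	dev->domain->nr_attached--;
	dev->domain = NULL;
}

/* Attach always drops the previous domain first, so a device can
 * never be counted twice, not even on a repeated attach. */
static void model_attach(struct model_domain *domain, struct model_dev *dev)
{
	assert(domain && dev);
	model_detach(dev);
	dev->domain = domain;
	domain->nr_attached++;
}
```

Attaching the same device twice to one domain leaves nr_attached at 1 in
this model, which is the property the patch is after.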

 drivers/iommu/s390-iommu.c | 97 +++++++++++++++-----------------------
 1 file changed, 39 insertions(+), 58 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index c898bcbbce11..0f58e897bc95 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -79,10 +79,36 @@ static void s390_domain_free(struct iommu_domain *domain)
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 
+	WARN_ON(!list_empty(&s390_domain->devices));
 	dma_cleanup_tables(s390_domain->dma_table);
 	kfree(s390_domain);
 }
 
+static void __s390_iommu_detach_device(struct zpci_dev *zdev)
+{
+	struct s390_domain *s390_domain = zdev->s390_domain;
+	struct s390_domain_device *domain_device, *tmp;
+	unsigned long flags;
+
+	if (!s390_domain)
+		return;
+
+	spin_lock_irqsave(&s390_domain->list_lock, flags);
+	list_for_each_entry_safe(domain_device, tmp, &s390_domain->devices,
+				 list) {
+		if (domain_device->zdev == zdev) {
+			list_del(&domain_device->list);
+			kfree(domain_device);
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+
+	zpci_unregister_ioat(zdev, 0);
+	zdev->s390_domain = NULL;
+	zdev->dma_table = NULL;
+}
+
 static int s390_iommu_attach_device(struct iommu_domain *domain,
 				    struct device *dev)
 {
@@ -90,7 +116,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	struct zpci_dev *zdev = to_zpci_dev(dev);
 	struct s390_domain_device *domain_device;
 	unsigned long flags;
-	int cc, rc;
+	int cc, rc = 0;
 
 	if (!zdev)
 		return -ENODEV;
@@ -99,23 +125,17 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	if (!domain_device)
 		return -ENOMEM;
 
-	if (zdev->dma_table && !zdev->s390_domain) {
-		cc = zpci_dma_exit_device(zdev);
-		if (cc) {
-			rc = -EIO;
-			goto out_free;
-		}
-	}
-
 	if (zdev->s390_domain)
-		zpci_unregister_ioat(zdev, 0);
+		__s390_iommu_detach_device(zdev);
+	else if (zdev->dma_table)
+		zpci_dma_exit_device(zdev);
 
 	zdev->dma_table = s390_domain->dma_table;
 	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
 				virt_to_phys(zdev->dma_table));
 	if (cc) {
 		rc = -EIO;
-		goto out_restore;
+		goto out_free;
 	}
 
 	spin_lock_irqsave(&s390_domain->list_lock, flags);
@@ -129,7 +149,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 		   domain->geometry.aperture_end != zdev->end_dma) {
 		rc = -EINVAL;
 		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
-		goto out_restore;
+		goto out_free;
 	}
 	domain_device->zdev = zdev;
 	zdev->s390_domain = s390_domain;
@@ -138,14 +158,6 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 
 	return 0;
 
-out_restore:
-	if (!zdev->s390_domain) {
-		zpci_dma_init_device(zdev);
-	} else {
-		zdev->dma_table = zdev->s390_domain->dma_table;
-		zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
-				   virt_to_phys(zdev->dma_table));
-	}
 out_free:
 	kfree(domain_device);
 
@@ -155,32 +167,12 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 static void s390_iommu_detach_device(struct iommu_domain *domain,
 				     struct device *dev)
 {
-	struct s390_domain *s390_domain = to_s390_domain(domain);
 	struct zpci_dev *zdev = to_zpci_dev(dev);
-	struct s390_domain_device *domain_device, *tmp;
-	unsigned long flags;
-	int found = 0;
 
-	if (!zdev)
-		return;
+	WARN_ON(zdev->s390_domain != to_s390_domain(domain));
 
-	spin_lock_irqsave(&s390_domain->list_lock, flags);
-	list_for_each_entry_safe(domain_device, tmp, &s390_domain->devices,
-				 list) {
-		if (domain_device->zdev == zdev) {
-			list_del(&domain_device->list);
-			kfree(domain_device);
-			found = 1;
-			break;
-		}
-	}
-	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
-
-	if (found && (zdev->s390_domain == s390_domain)) {
-		zdev->s390_domain = NULL;
-		zpci_unregister_ioat(zdev, 0);
-		zpci_dma_init_device(zdev);
-	}
+	__s390_iommu_detach_device(zdev);
+	zpci_dma_init_device(zdev);
 }
 
 static struct iommu_device *s390_iommu_probe_device(struct device *dev)
@@ -193,24 +185,13 @@ static struct iommu_device *s390_iommu_probe_device(struct device *dev)
 static void s390_iommu_release_device(struct device *dev)
 {
 	struct zpci_dev *zdev = to_zpci_dev(dev);
-	struct iommu_domain *domain;
 
 	/*
-	 * This is a workaround for a scenario where the IOMMU API common code
-	 * "forgets" to call the detach_dev callback: After binding a device
-	 * to vfio-pci and completing the VFIO_SET_IOMMU ioctl (which triggers
-	 * the attach_dev), removing the device via
-	 * "echo 1 > /sys/bus/pci/devices/.../remove" won't trigger detach_dev,
-	 * only release_device will be called via the BUS_NOTIFY_REMOVED_DEVICE
-	 * notifier.
-	 *
-	 * So let's call detach_dev from here if it hasn't been called before.
+	 * release_device is expected to detach any domain currently attached
+	 * to the device, but keep it attached to other devices in the group.
 	 */
-	if (zdev && zdev->s390_domain) {
-		domain = iommu_get_domain_for_dev(dev);
-		if (domain)
-			s390_iommu_detach_device(domain, dev);
-	}
+	if (zdev)
+		__s390_iommu_detach_device(zdev);
 }
 
 static int s390_iommu_update_trans(struct s390_domain *s390_domain,
-- 
2.34.1



* [PATCH v4 2/5] iommu/s390: Get rid of s390_domain_device
  2022-10-04 12:07 [PATCH v4 0/5] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
  2022-10-04 12:07 ` [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
@ 2022-10-04 12:07 ` Niklas Schnelle
  2022-10-04 16:20   ` Matthew Rosato
  2022-10-04 12:07 ` [PATCH v4 3/5] iommu/s390: Fix potential s390_domain aperture shrinking Niklas Schnelle
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Niklas Schnelle @ 2022-10-04 12:07 UTC
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

The struct s390_domain_device serves solely as the list entry for
the devices list of a struct s390_domain. As it contains no additional
information besides a list_head and a pointer to the struct zpci_dev we
can simplify things and just thread the device list through struct
zpci_dev directly. This removes the need to allocate during domain
attach and gets rid of one level of indirection during mapping
operations.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
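For readers unfamiliar with threading a list through the object itself,
here is a minimal userspace sketch of the idea. It imitates the kernel's
struct list_head with a few hand-written helpers; struct model_dev and
sum_handles() are hypothetical and only mirror the shape of the change
(an embedded iommu_list node instead of a separately allocated entry).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace imitation of the kernel's circular struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	assert(entry && head);
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink and re-initialise, so a second call on the entry is harmless. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device: the list node lives inside the device itself. */
struct model_dev {
	unsigned int fh;
	struct list_head iommu_list;
};

/* Walk the list and reach each device without an extra pointer hop. */
static unsigned int sum_handles(struct list_head *devices)
{
	unsigned int sum = 0;
	struct list_head *pos;

	for (pos = devices->next; pos != devices; pos = pos->next)
		sum += container_of(pos, struct model_dev, iommu_list)->fh;
	return sum;
}
```

Nothing is allocated when adding a device in this sketch, and removal is
a constant-time unlink rather than a search.
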
 arch/s390/include/asm/pci.h |  1 +
 drivers/iommu/s390-iommu.c  | 45 ++++++++-----------------------------
 2 files changed, 10 insertions(+), 36 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 108e732d7b14..15f8714ca9b7 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -117,6 +117,7 @@ struct zpci_bus {
 struct zpci_dev {
 	struct zpci_bus *zbus;
 	struct list_head entry;		/* list of all zpci_devices, needed for hotplug, etc. */
+	struct list_head iommu_list;
 	struct kref kref;
 	struct hotplug_slot hotplug_slot;
 
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 0f58e897bc95..6f87dd4b85af 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -29,11 +29,6 @@ struct s390_domain {
 	spinlock_t		list_lock;
 };
 
-struct s390_domain_device {
-	struct list_head	list;
-	struct zpci_dev		*zdev;
-};
-
 static struct s390_domain *to_s390_domain(struct iommu_domain *dom)
 {
 	return container_of(dom, struct s390_domain, domain);
@@ -87,21 +82,13 @@ static void s390_domain_free(struct iommu_domain *domain)
 static void __s390_iommu_detach_device(struct zpci_dev *zdev)
 {
 	struct s390_domain *s390_domain = zdev->s390_domain;
-	struct s390_domain_device *domain_device, *tmp;
 	unsigned long flags;
 
 	if (!s390_domain)
 		return;
 
 	spin_lock_irqsave(&s390_domain->list_lock, flags);
-	list_for_each_entry_safe(domain_device, tmp, &s390_domain->devices,
-				 list) {
-		if (domain_device->zdev == zdev) {
-			list_del(&domain_device->list);
-			kfree(domain_device);
-			break;
-		}
-	}
+	list_del_init(&zdev->iommu_list);
 	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
 
 	zpci_unregister_ioat(zdev, 0);
@@ -114,17 +101,12 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 	struct zpci_dev *zdev = to_zpci_dev(dev);
-	struct s390_domain_device *domain_device;
 	unsigned long flags;
-	int cc, rc = 0;
+	int cc;
 
 	if (!zdev)
 		return -ENODEV;
 
-	domain_device = kzalloc(sizeof(*domain_device), GFP_KERNEL);
-	if (!domain_device)
-		return -ENOMEM;
-
 	if (zdev->s390_domain)
 		__s390_iommu_detach_device(zdev);
 	else if (zdev->dma_table)
@@ -133,10 +115,8 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	zdev->dma_table = s390_domain->dma_table;
 	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
 				virt_to_phys(zdev->dma_table));
-	if (cc) {
-		rc = -EIO;
-		goto out_free;
-	}
+	if (cc)
+		return -EIO;
 
 	spin_lock_irqsave(&s390_domain->list_lock, flags);
 	/* First device defines the DMA range limits */
@@ -147,21 +127,14 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	/* Allow only devices with identical DMA range limits */
 	} else if (domain->geometry.aperture_start != zdev->start_dma ||
 		   domain->geometry.aperture_end != zdev->end_dma) {
-		rc = -EINVAL;
 		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
-		goto out_free;
+		return -EINVAL;
 	}
-	domain_device->zdev = zdev;
 	zdev->s390_domain = s390_domain;
-	list_add(&domain_device->list, &s390_domain->devices);
+	list_add(&zdev->iommu_list, &s390_domain->devices);
 	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
 
 	return 0;
-
-out_free:
-	kfree(domain_device);
-
-	return rc;
 }
 
 static void s390_iommu_detach_device(struct iommu_domain *domain,
@@ -198,10 +171,10 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 				   phys_addr_t pa, dma_addr_t dma_addr,
 				   size_t size, int flags)
 {
-	struct s390_domain_device *domain_device;
 	phys_addr_t page_addr = pa & PAGE_MASK;
 	dma_addr_t start_dma_addr = dma_addr;
 	unsigned long irq_flags, nr_pages, i;
+	struct zpci_dev *zdev;
 	unsigned long *entry;
 	int rc = 0;
 
@@ -226,8 +199,8 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 	}
 
 	spin_lock(&s390_domain->list_lock);
-	list_for_each_entry(domain_device, &s390_domain->devices, list) {
-		rc = zpci_refresh_trans((u64) domain_device->zdev->fh << 32,
+	list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
+		rc = zpci_refresh_trans((u64)zdev->fh << 32,
 					start_dma_addr, nr_pages * PAGE_SIZE);
 		if (rc)
 			break;
-- 
2.34.1



* [PATCH v4 3/5] iommu/s390: Fix potential s390_domain aperture shrinking
  2022-10-04 12:07 [PATCH v4 0/5] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
  2022-10-04 12:07 ` [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
  2022-10-04 12:07 ` [PATCH v4 2/5] iommu/s390: Get rid of s390_domain_device Niklas Schnelle
@ 2022-10-04 12:07 ` Niklas Schnelle
  2022-10-04 21:12   ` Matthew Rosato
  2022-10-04 12:07 ` [PATCH v4 4/5] iommu/s390: Fix incorrect aperture check Niklas Schnelle
  2022-10-04 12:07 ` [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap Niklas Schnelle
  4 siblings, 1 reply; 22+ messages in thread
From: Niklas Schnelle @ 2022-10-04 12:07 UTC
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

The s390 IOMMU driver currently sets the IOMMU domain's aperture to
match the device specific DMA address range of the device that is first
attached. This is not ideal. For one, if the domain has no device
attached in the meantime, the aperture could be shrunk, allowing
translations outside the aperture to exist in the translation tables.
Also this is a bit of a misuse of the aperture which really should
describe what addresses can be translated and not some device specific
limitations.

Instead of misusing the aperture like this we can create
reserved ranges for the ranges inaccessible to the attached devices
allowing devices with overlapping ranges to still share an IOMMU domain.
This also significantly simplifies s390_iommu_attach_device() allowing
us to move the aperture check to the beginning of the function and
removing the need to hold the device list's lock to check the aperture.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
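As a rough illustration of the reserved range arithmetic, the following
userspace sketch computes the (at most two) ranges outside a device's
DMA window. The names model_resv_ranges(), struct resv_range and the
space_size parameter are hypothetical; space_size stands in for
ZPCI_TABLE_SIZE_RT and no kernel API is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct resv_range {
	uint64_t start;
	uint64_t length;
};

/*
 * Fill out[] with the parts of [0, space_size) that lie outside the
 * inclusive window [start_dma, end_dma]. Returns the number of ranges
 * written, which is 0, 1 or 2.
 */
static size_t model_resv_ranges(uint64_t start_dma, uint64_t end_dma,
				uint64_t space_size,
				struct resv_range out[2])
{
	size_t n = 0;

	assert(start_dma <= end_dma && end_dma < space_size);

	/* Everything below the window is inaccessible to the device. */
	if (start_dma) {
		out[n].start = 0;
		out[n].length = start_dma;
		n++;
	}
	/* So is everything above it, up to the end of the address space. */
	if (end_dma < space_size - 1) {
		out[n].start = end_dma + 1;
		out[n].length = space_size - end_dma - 1;
		n++;
	}
	return n;
}
```

For example, a window of 0x1000..0xEFFF in a 0x10000 byte space yields
two ranges of 0x1000 bytes each, one at 0 and one at 0xF000.
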
 drivers/iommu/s390-iommu.c | 50 +++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 6f87dd4b85af..762dc55aea1e 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -62,6 +62,9 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
 		kfree(s390_domain);
 		return NULL;
 	}
+	s390_domain->domain.geometry.force_aperture = true;
+	s390_domain->domain.geometry.aperture_start = 0;
+	s390_domain->domain.geometry.aperture_end = ZPCI_TABLE_SIZE_RT - 1;
 
 	spin_lock_init(&s390_domain->dma_table_lock);
 	spin_lock_init(&s390_domain->list_lock);
@@ -107,30 +110,24 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	if (!zdev)
 		return -ENODEV;
 
+	if (domain->geometry.aperture_start > zdev->end_dma ||
+	    domain->geometry.aperture_end < zdev->start_dma)
+		return -EINVAL;
+
 	if (zdev->s390_domain)
 		__s390_iommu_detach_device(zdev);
 	else if (zdev->dma_table)
 		zpci_dma_exit_device(zdev);
 
-	zdev->dma_table = s390_domain->dma_table;
 	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
-				virt_to_phys(zdev->dma_table));
+				virt_to_phys(s390_domain->dma_table));
 	if (cc)
 		return -EIO;
 
-	spin_lock_irqsave(&s390_domain->list_lock, flags);
-	/* First device defines the DMA range limits */
-	if (list_empty(&s390_domain->devices)) {
-		domain->geometry.aperture_start = zdev->start_dma;
-		domain->geometry.aperture_end = zdev->end_dma;
-		domain->geometry.force_aperture = true;
-	/* Allow only devices with identical DMA range limits */
-	} else if (domain->geometry.aperture_start != zdev->start_dma ||
-		   domain->geometry.aperture_end != zdev->end_dma) {
-		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
-		return -EINVAL;
-	}
+	zdev->dma_table = s390_domain->dma_table;
 	zdev->s390_domain = s390_domain;
+
+	spin_lock_irqsave(&s390_domain->list_lock, flags);
 	list_add(&zdev->iommu_list, &s390_domain->devices);
 	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
 
@@ -148,6 +145,30 @@ static void s390_iommu_detach_device(struct iommu_domain *domain,
 	zpci_dma_init_device(zdev);
 }
 
+static void s390_iommu_get_resv_regions(struct device *dev,
+					struct list_head *list)
+{
+	struct zpci_dev *zdev = to_zpci_dev(dev);
+	struct iommu_resv_region *region;
+
+	if (zdev->start_dma) {
+		region = iommu_alloc_resv_region(0, zdev->start_dma, 0,
+						 IOMMU_RESV_RESERVED);
+		if (!region)
+			return;
+		list_add_tail(&region->list, list);
+	}
+
+	if (zdev->end_dma < ZPCI_TABLE_SIZE_RT - 1) {
+		region = iommu_alloc_resv_region(zdev->end_dma + 1,
+						 ZPCI_TABLE_SIZE_RT - zdev->end_dma - 1,
+						 0, IOMMU_RESV_RESERVED);
+		if (!region)
+			return;
+		list_add_tail(&region->list, list);
+	}
+}
+
 static struct iommu_device *s390_iommu_probe_device(struct device *dev)
 {
 	struct zpci_dev *zdev = to_zpci_dev(dev);
@@ -330,6 +351,7 @@ static const struct iommu_ops s390_iommu_ops = {
 	.release_device = s390_iommu_release_device,
 	.device_group = generic_device_group,
 	.pgsize_bitmap = S390_IOMMU_PGSIZES,
+	.get_resv_regions = s390_iommu_get_resv_regions,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev	= s390_iommu_attach_device,
 		.detach_dev	= s390_iommu_detach_device,
-- 
2.34.1



* [PATCH v4 4/5] iommu/s390: Fix incorrect aperture check
  2022-10-04 12:07 [PATCH v4 0/5] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
                   ` (2 preceding siblings ...)
  2022-10-04 12:07 ` [PATCH v4 3/5] iommu/s390: Fix potential s390_domain aperture shrinking Niklas Schnelle
@ 2022-10-04 12:07 ` Niklas Schnelle
  2022-10-04 12:07 ` [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap Niklas Schnelle
  4 siblings, 0 replies; 22+ messages in thread
From: Niklas Schnelle @ 2022-10-04 12:07 UTC
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

The domain->geometry.aperture_end specifies the last valid address, so
treat it as such when checking if a DMA address is valid.

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
v3 -> v4:
- Make aperture check resistant against overflow (Jason)
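
To make the off-by-one concrete, here is a small userspace sketch that
compares the two forms of the check from the diff below. The helper
names in_aperture_old() and in_aperture_new() are hypothetical, and
uint64_t is assumed as a stand-in for dma_addr_t and size_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check before this patch: compares the first address after the
 * mapping against the inclusive aperture end. */
static bool in_aperture_old(uint64_t start, uint64_t end,
			    uint64_t dma_addr, uint64_t size)
{
	return !(dma_addr < start || dma_addr + size > end);
}

/* Check after this patch: compares the last address of the mapping. */
static bool in_aperture_new(uint64_t start, uint64_t end,
			    uint64_t dma_addr, uint64_t size)
{
	assert(size > 0);
	return !(dma_addr < start || dma_addr + size - 1 > end);
}
```

With an aperture of 0..0xFFFF, a 4K mapping at 0xF000 ends exactly at
aperture_end. The old form rejects it while the new form accepts it. The
new form also gives the expected answer for a mapping that ends at the
very top of the 64-bit space, where dma_addr + size alone wraps to 0.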

 drivers/iommu/s390-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 762dc55aea1e..94c444b909bd 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -200,7 +200,7 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 	int rc = 0;
 
 	if (dma_addr < s390_domain->domain.geometry.aperture_start ||
-	    dma_addr + size > s390_domain->domain.geometry.aperture_end)
+	    (dma_addr + size - 1) > s390_domain->domain.geometry.aperture_end)
 		return -EINVAL;
 
 	nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-- 
2.34.1



* [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap
  2022-10-04 12:07 [PATCH v4 0/5] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
                   ` (3 preceding siblings ...)
  2022-10-04 12:07 ` [PATCH v4 4/5] iommu/s390: Fix incorrect aperture check Niklas Schnelle
@ 2022-10-04 12:07 ` Niklas Schnelle
  2022-10-04 14:38   ` Matthew Rosato
  2022-10-04 15:02   ` Robin Murphy
  4 siblings, 2 replies; 22+ messages in thread
From: Niklas Schnelle @ 2022-10-04 12:07 UTC
  To: Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

The .pgsize_bitmap property of struct iommu_ops is not a page mask but
rather has a bit set for each size of pages the IOMMU supports. As the
comment correctly pointed out, at the moment the code only supports 4K
pages, so simply use SZ_4K here.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
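The difference between a page mask and a page size bitmap can be shown
with a short userspace sketch. pgsize_supported() and MODEL_SZ_4K are
hypothetical names for this illustration; the sketch only assumes the
"one bit per supported size" reading described above and a 64-bit
unsigned long.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_SZ_4K	0x1000UL	/* same value as the kernel's SZ_4K */

/* True if pgsize is a power of two whose bit is set in the bitmap. */
static bool pgsize_supported(unsigned long bitmap, unsigned long pgsize)
{
	assert(pgsize != 0);
	bool pow2 = (pgsize & (pgsize - 1)) == 0;

	return pow2 && (bitmap & pgsize);
}
```

Read as a bitmap, the old value ~0xFFFUL has every bit from bit 12
upwards set and so advertises every power-of-two size of 4K and above,
for example 1M. With MODEL_SZ_4K only the 4K bit is set.
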
 drivers/iommu/s390-iommu.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 94c444b909bd..6bf23e7830a2 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -12,13 +12,6 @@
 #include <linux/sizes.h>
 #include <asm/pci_dma.h>
 
-/*
- * Physically contiguous memory regions can be mapped with 4 KiB alignment,
- * we allow all page sizes that are an order of 4KiB (no special large page
- * support so far).
- */
-#define S390_IOMMU_PGSIZES	(~0xFFFUL)
-
 static const struct iommu_ops s390_iommu_ops;
 
 struct s390_domain {
@@ -350,7 +343,7 @@ static const struct iommu_ops s390_iommu_ops = {
 	.probe_device = s390_iommu_probe_device,
 	.release_device = s390_iommu_release_device,
 	.device_group = generic_device_group,
-	.pgsize_bitmap = S390_IOMMU_PGSIZES,
+	.pgsize_bitmap = SZ_4K,
 	.get_resv_regions = s390_iommu_get_resv_regions,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev	= s390_iommu_attach_device,
-- 
2.34.1



* Re: [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments
  2022-10-04 12:07 ` [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
@ 2022-10-04 12:43   ` Jason Gunthorpe
  2022-10-04 16:18   ` Matthew Rosato
  1 sibling, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2022-10-04 12:43 UTC
  To: Niklas Schnelle
  Cc: Matthew Rosato, Pierre Morel, iommu, linux-s390, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, joro, will,
	robin.murphy, linux-kernel

On Tue, Oct 04, 2022 at 02:07:02PM +0200, Niklas Schnelle wrote:
> Since commit fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev
> calls") we can end up with duplicates in the list of devices attached to
> a domain. This is inefficient and confusing since only one domain can
> actually be in control of the IOMMU translations for a device. Fix this
> by detaching the device from the previous domain, if any, on attach.
> Add a WARN_ON() in case we still have attached devices on freeing the
> domain. While here remove the re-attach on failure dance as it was
> determined to be unlikely to help and may confuse debug and recovery.
> 
> Fixes: fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev calls")
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
> v3 -> v4:
> - Drop s390_domain from __s390_iommu_detach_device() (Jason)
> - WARN_ON() mismatched domain in s390_iommu_detach_device() (Jason)
> - Use __s390_iommu_detach_device() in s390_iommu_release_device() (Jason)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* Re: [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap
  2022-10-04 12:07 ` [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap Niklas Schnelle
@ 2022-10-04 14:38   ` Matthew Rosato
  2022-10-04 15:02   ` Robin Murphy
  1 sibling, 0 replies; 22+ messages in thread
From: Matthew Rosato @ 2022-10-04 14:38 UTC
  To: Niklas Schnelle, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On 10/4/22 8:07 AM, Niklas Schnelle wrote:
> The .pgsize_bitmap property of struct iommu_ops is not a page mask but
> rather has a bit set for each size of pages the IOMMU supports. As the
> comment correctly pointed out at this moment the code only support 4K
> pages so simply use SZ_4K here.
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>

Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>

> ---
>  drivers/iommu/s390-iommu.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index 94c444b909bd..6bf23e7830a2 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -12,13 +12,6 @@
>  #include <linux/sizes.h>
>  #include <asm/pci_dma.h>
>  
> -/*
> - * Physically contiguous memory regions can be mapped with 4 KiB alignment,
> - * we allow all page sizes that are an order of 4KiB (no special large page
> - * support so far).
> - */
> -#define S390_IOMMU_PGSIZES	(~0xFFFUL)
> -
>  static const struct iommu_ops s390_iommu_ops;
>  
>  struct s390_domain {
> @@ -350,7 +343,7 @@ static const struct iommu_ops s390_iommu_ops = {
>  	.probe_device = s390_iommu_probe_device,
>  	.release_device = s390_iommu_release_device,
>  	.device_group = generic_device_group,
> -	.pgsize_bitmap = S390_IOMMU_PGSIZES,
> +	.pgsize_bitmap = SZ_4K,
>  	.get_resv_regions = s390_iommu_get_resv_regions,
>  	.default_domain_ops = &(const struct iommu_domain_ops) {
>  		.attach_dev	= s390_iommu_attach_device,



* Re: [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap
  2022-10-04 12:07 ` [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap Niklas Schnelle
  2022-10-04 14:38   ` Matthew Rosato
@ 2022-10-04 15:02   ` Robin Murphy
  2022-10-04 15:12     ` Matthew Rosato
  1 sibling, 1 reply; 22+ messages in thread
From: Robin Murphy @ 2022-10-04 15:02 UTC
  To: Niklas Schnelle, Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, jgg, linux-kernel

On 2022-10-04 13:07, Niklas Schnelle wrote:
> The .pgsize_bitmap property of struct iommu_ops is not a page mask but
> rather has a bit set for each size of pages the IOMMU supports. As the
> comment correctly pointed out at this moment the code only support 4K
> pages so simply use SZ_4K here.

Unless it's already been done somewhere else, you'll want to switch over 
to the {map,unmap}_pages() interfaces as well to avoid taking a hit on 
efficiency here. The "page mask" thing was an old hack to trick the core 
API into making fewer map/unmap calls where the driver could map 
arbitrary numbers of pages at once anyway. The multi-page interfaces now 
do that more honestly and generally better (since they work for 
non-power-of-two sizes as well).

Robin.

> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
>   drivers/iommu/s390-iommu.c | 9 +--------
>   1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index 94c444b909bd..6bf23e7830a2 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -12,13 +12,6 @@
>   #include <linux/sizes.h>
>   #include <asm/pci_dma.h>
>   
> -/*
> - * Physically contiguous memory regions can be mapped with 4 KiB alignment,
> - * we allow all page sizes that are an order of 4KiB (no special large page
> - * support so far).
> - */
> -#define S390_IOMMU_PGSIZES	(~0xFFFUL)
> -
>   static const struct iommu_ops s390_iommu_ops;
>   
>   struct s390_domain {
> @@ -350,7 +343,7 @@ static const struct iommu_ops s390_iommu_ops = {
>   	.probe_device = s390_iommu_probe_device,
>   	.release_device = s390_iommu_release_device,
>   	.device_group = generic_device_group,
> -	.pgsize_bitmap = S390_IOMMU_PGSIZES,
> +	.pgsize_bitmap = SZ_4K,
>   	.get_resv_regions = s390_iommu_get_resv_regions,
>   	.default_domain_ops = &(const struct iommu_domain_ops) {
>   		.attach_dev	= s390_iommu_attach_device,

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap
  2022-10-04 15:02   ` Robin Murphy
@ 2022-10-04 15:12     ` Matthew Rosato
  2022-10-04 15:31       ` Robin Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Matthew Rosato @ 2022-10-04 15:12 UTC (permalink / raw)
  To: Robin Murphy, Niklas Schnelle, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, jgg, linux-kernel

On 10/4/22 11:02 AM, Robin Murphy wrote:
> On 2022-10-04 13:07, Niklas Schnelle wrote:
>> The .pgsize_bitmap property of struct iommu_ops is not a page mask but
>> rather has a bit set for each size of pages the IOMMU supports. As the
>> comment correctly pointed out, at this moment the code only supports 4K
>> pages, so simply use SZ_4K here.
> 
> Unless it's already been done somewhere else, you'll want to switch over to the {map,unmap}_pages() interfaces as well to avoid taking a hit on efficiency here. The "page mask" thing was an old hack to trick the core API into making fewer map/unmap calls where the driver could map arbitrary numbers of pages at once anyway. The multi-page interfaces now do that more honestly and generally better (since they work for non-power-of-two sizes as well).

Thanks for the heads up -- Niklas has some additional series coming soon as described here:

https://lore.kernel.org/linux-iommu/a10424adbe01a0fd40372cbd0736d11e517951a1.camel@linux.ibm.com/

So implementing the _pages() interfaces is coming up soon on the roadmap.  But given what you say, I wonder if this patch should just wait for the series that implements {map,unmap}_pages().
> 
> Robin.
> 
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> ---
>>   drivers/iommu/s390-iommu.c | 9 +--------
>>   1 file changed, 1 insertion(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
>> index 94c444b909bd..6bf23e7830a2 100644
>> --- a/drivers/iommu/s390-iommu.c
>> +++ b/drivers/iommu/s390-iommu.c
>> @@ -12,13 +12,6 @@
>>   #include <linux/sizes.h>
>>   #include <asm/pci_dma.h>
>>  
>> -/*
>> - * Physically contiguous memory regions can be mapped with 4 KiB alignment,
>> - * we allow all page sizes that are an order of 4KiB (no special large page
>> - * support so far).
>> - */
>> -#define S390_IOMMU_PGSIZES    (~0xFFFUL)
>> -
>>   static const struct iommu_ops s390_iommu_ops;
>>  
>>   struct s390_domain {
>> @@ -350,7 +343,7 @@ static const struct iommu_ops s390_iommu_ops = {
>>       .probe_device = s390_iommu_probe_device,
>>       .release_device = s390_iommu_release_device,
>>       .device_group = generic_device_group,
>> -    .pgsize_bitmap = S390_IOMMU_PGSIZES,
>> +    .pgsize_bitmap = SZ_4K,
>>       .get_resv_regions = s390_iommu_get_resv_regions,
>>       .default_domain_ops = &(const struct iommu_domain_ops) {
>>           .attach_dev    = s390_iommu_attach_device,


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap
  2022-10-04 15:12     ` Matthew Rosato
@ 2022-10-04 15:31       ` Robin Murphy
  2022-10-04 16:13         ` Niklas Schnelle
  0 siblings, 1 reply; 22+ messages in thread
From: Robin Murphy @ 2022-10-04 15:31 UTC (permalink / raw)
  To: Matthew Rosato, Niklas Schnelle, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, jgg, linux-kernel

On 2022-10-04 16:12, Matthew Rosato wrote:
> On 10/4/22 11:02 AM, Robin Murphy wrote:
>> On 2022-10-04 13:07, Niklas Schnelle wrote:
>>> The .pgsize_bitmap property of struct iommu_ops is not a page mask but
>>> rather has a bit set for each size of pages the IOMMU supports. As the
>>> comment correctly pointed out, at this moment the code only supports 4K
>>> pages, so simply use SZ_4K here.
>>
>> Unless it's already been done somewhere else, you'll want to switch over to the {map,unmap}_pages() interfaces as well to avoid taking a hit on efficiency here. The "page mask" thing was an old hack to trick the core API into making fewer map/unmap calls where the driver could map arbitrary numbers of pages at once anyway. The multi-page interfaces now do that more honestly and generally better (since they work for non-power-of-two sizes as well).
> 
> Thanks for the heads up -- Niklas has some additional series coming soon as described here:
> 
> https://lore.kernel.org/linux-iommu/a10424adbe01a0fd40372cbd0736d11e517951a1.camel@linux.ibm.com/
> 
> So implementing the _pages() interfaces is coming up soon on the roadmap.  But given what you say, I wonder if this patch should just wait for the series that implements {map,unmap}_pages().

Perhaps, although the full change should be trivial enough that there's 
probably just as much argument for doing the whole thing in its own 
right for the sake of this cleanup. The main point is that 
S390_IOMMU_PGSIZES is not incorrect as such, it's just not spelling out 
the deliberate trick that it's achieving - everyone copied it from 
intel-iommu, but since that got converted to the new interfaces the 
original explanation is now gone. The only effect of "fixing" it in 
isolation right now will be to make large VFIO mappings slower.

Robin.

>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
>>> ---
>>>    drivers/iommu/s390-iommu.c | 9 +--------
>>>    1 file changed, 1 insertion(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
>>> index 94c444b909bd..6bf23e7830a2 100644
>>> --- a/drivers/iommu/s390-iommu.c
>>> +++ b/drivers/iommu/s390-iommu.c
>>> @@ -12,13 +12,6 @@
>>>    #include <linux/sizes.h>
>>>    #include <asm/pci_dma.h>
>>>   
>>> -/*
>>> - * Physically contiguous memory regions can be mapped with 4 KiB alignment,
>>> - * we allow all page sizes that are an order of 4KiB (no special large page
>>> - * support so far).
>>> - */
>>> -#define S390_IOMMU_PGSIZES    (~0xFFFUL)
>>> -
>>>    static const struct iommu_ops s390_iommu_ops;
>>>   
>>>    struct s390_domain {
>>> @@ -350,7 +343,7 @@ static const struct iommu_ops s390_iommu_ops = {
>>>        .probe_device = s390_iommu_probe_device,
>>>        .release_device = s390_iommu_release_device,
>>>        .device_group = generic_device_group,
>>> -    .pgsize_bitmap = S390_IOMMU_PGSIZES,
>>> +    .pgsize_bitmap = SZ_4K,
>>>        .get_resv_regions = s390_iommu_get_resv_regions,
>>>        .default_domain_ops = &(const struct iommu_domain_ops) {
>>>            .attach_dev    = s390_iommu_attach_device,
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap
  2022-10-04 15:31       ` Robin Murphy
@ 2022-10-04 16:13         ` Niklas Schnelle
  2022-10-05  9:53           ` Robin Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Niklas Schnelle @ 2022-10-04 16:13 UTC (permalink / raw)
  To: Robin Murphy, Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, jgg, linux-kernel

On Tue, 2022-10-04 at 16:31 +0100, Robin Murphy wrote:
> On 2022-10-04 16:12, Matthew Rosato wrote:
> > On 10/4/22 11:02 AM, Robin Murphy wrote:
> > > On 2022-10-04 13:07, Niklas Schnelle wrote:
> > > > The .pgsize_bitmap property of struct iommu_ops is not a page mask but
> > > > rather has a bit set for each size of pages the IOMMU supports. As the
> > > > comment correctly pointed out, at this moment the code only supports 4K
> > > > pages, so simply use SZ_4K here.
> > > 
> > > Unless it's already been done somewhere else, you'll want to switch over to the {map,unmap}_pages() interfaces as well to avoid taking a hit on efficiency here. The "page mask" thing was an old hack to trick the core API into making fewer map/unmap calls where the driver could map arbitrary numbers of pages at once anyway. The multi-page interfaces now do that more honestly and generally better (since they work for non-power-of-two sizes as well).
> > 
> > Thanks for the heads up -- Niklas has some additional series coming soon as described here:
> > 
> > https://lore.kernel.org/linux-iommu/a10424adbe01a0fd40372cbd0736d11e517951a1.camel@linux.ibm.com/
> > 
> > So implementing the _pages() interfaces is coming up soon on the roadmap.  But given what you say, I wonder if this patch should just wait for the series that implements {map,unmap}_pages().
> 
> Perhaps, although the full change should be trivial enough that there's 
> probably just as much argument for doing the whole thing in its own 
> right for the sake of this cleanup. The main point is that 
> S390_IOMMU_PGSIZES is not incorrect as such, it's just not spelling out 
> the deliberate trick that it's achieving - everyone copied it from 
> intel-iommu, but since that got converted to the new interfaces the 
> original explanation is now gone. The only effect of "fixing" it in 
> isolation right now will be to make large VFIO mappings slower.
> 
> Robin.

The patch changing to map_pages()/unmap_pages() is currently part of a
larger series of improvements, some of which are less trivial, so I'm
planning to send those as an RFC first. They include changing the
spin_lock protected list to RCU so that map/unmap can parallelize
better. Another one is atomic updates to the IOMMU tables to do away
with locks in map/unmap. So I think pulling that whole
series into this one isn't ideal. I could pull in just the
map_pages()/unmap_pages() change though.

> 
> > > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > > > ---
> > > >    drivers/iommu/s390-iommu.c | 9 +--------
> > > >    1 file changed, 1 insertion(+), 8 deletions(-)
> > > > 
> > > > diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> > > > index 94c444b909bd..6bf23e7830a2 100644
> > > > --- a/drivers/iommu/s390-iommu.c
> > > > +++ b/drivers/iommu/s390-iommu.c
> > > > @@ -12,13 +12,6 @@
> > > >    #include <linux/sizes.h>
> > > >    #include <asm/pci_dma.h>
> > > >   
> > > > -/*
> > > > - * Physically contiguous memory regions can be mapped with 4 KiB alignment,
> > > > - * we allow all page sizes that are an order of 4KiB (no special large page
> > > > - * support so far).
> > > > - */
> > > > -#define S390_IOMMU_PGSIZES    (~0xFFFUL)
> > > > -
> > > >    static const struct iommu_ops s390_iommu_ops;
> > > >   
> > > >    struct s390_domain {
> > > > @@ -350,7 +343,7 @@ static const struct iommu_ops s390_iommu_ops = {
> > > >        .probe_device = s390_iommu_probe_device,
> > > >        .release_device = s390_iommu_release_device,
> > > >        .device_group = generic_device_group,
> > > > -    .pgsize_bitmap = S390_IOMMU_PGSIZES,
> > > > +    .pgsize_bitmap = SZ_4K,
> > > >        .get_resv_regions = s390_iommu_get_resv_regions,
> > > >        .default_domain_ops = &(const struct iommu_domain_ops) {
> > > >            .attach_dev    = s390_iommu_attach_device,



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments
  2022-10-04 12:07 ` [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
  2022-10-04 12:43   ` Jason Gunthorpe
@ 2022-10-04 16:18   ` Matthew Rosato
  2022-10-05  7:58     ` Niklas Schnelle
  1 sibling, 1 reply; 22+ messages in thread
From: Matthew Rosato @ 2022-10-04 16:18 UTC (permalink / raw)
  To: Niklas Schnelle, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On 10/4/22 8:07 AM, Niklas Schnelle wrote:
> Since commit fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev
> calls") we can end up with duplicates in the list of devices attached to
> a domain. This is inefficient and confusing since only one domain can
> actually be in control of the IOMMU translations for a device. Fix this
> by detaching the device from the previous domain, if any, on attach.
> Add a WARN_ON() in case we still have attached devices on freeing the
> domain. While here remove the re-attach on failure dance as it was
> determined to be unlikely to help and may confuse debug and recovery.
> 
> Fixes: fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev calls")
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>

I've been testing this in isolation and it looks good to me, but one question...

> ---
> v3 -> v4:
> - Drop s390_domain from __s390_iommu_detach_device() (Jason)
> - WARN_ON() mismatched domain in s390_iommu_detach_device() (Jason)
> - Use __s390_iommu_detach_device() in s390_iommu_release_device() (Jason)
> 
>  drivers/iommu/s390-iommu.c | 97 +++++++++++++++-----------------------
>  1 file changed, 39 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index c898bcbbce11..0f58e897bc95 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -79,10 +79,36 @@ static void s390_domain_free(struct iommu_domain *domain)
>  {
>  	struct s390_domain *s390_domain = to_s390_domain(domain);
>  
> +	WARN_ON(!list_empty(&s390_domain->devices));
>  	dma_cleanup_tables(s390_domain->dma_table);
>  	kfree(s390_domain);
>  }
>  
> +static void __s390_iommu_detach_device(struct zpci_dev *zdev)
> +{
> +	struct s390_domain *s390_domain = zdev->s390_domain;
> +	struct s390_domain_device *domain_device, *tmp;
> +	unsigned long flags;
> +
> +	if (!s390_domain)
> +		return;
> +
> +	spin_lock_irqsave(&s390_domain->list_lock, flags);
> +	list_for_each_entry_safe(domain_device, tmp, &s390_domain->devices,
> +				 list) {
> +		if (domain_device->zdev == zdev) {
> +			list_del(&domain_device->list);
> +			kfree(domain_device);
> +			break;
> +		}
> +	}
> +	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> +
> +	zpci_unregister_ioat(zdev, 0);
> +	zdev->s390_domain = NULL;
> +	zdev->dma_table = NULL;
> +}
> +
>  static int s390_iommu_attach_device(struct iommu_domain *domain,
>  				    struct device *dev)
>  {
> @@ -90,7 +116,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  	struct zpci_dev *zdev = to_zpci_dev(dev);
>  	struct s390_domain_device *domain_device;
>  	unsigned long flags;
> -	int cc, rc;
> +	int cc, rc = 0;
>  
>  	if (!zdev)
>  		return -ENODEV;
> @@ -99,23 +125,17 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  	if (!domain_device)
>  		return -ENOMEM;
>  
> -	if (zdev->dma_table && !zdev->s390_domain) {
> -		cc = zpci_dma_exit_device(zdev);
> -		if (cc) {
> -			rc = -EIO;
> -			goto out_free;
> -		}
> -	}
> -
>  	if (zdev->s390_domain)
> -		zpci_unregister_ioat(zdev, 0);
> +		__s390_iommu_detach_device(zdev);
> +	else if (zdev->dma_table)
> +		zpci_dma_exit_device(zdev);
>  
>  	zdev->dma_table = s390_domain->dma_table;
>  	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
>  				virt_to_phys(zdev->dma_table));
>  	if (cc) {
>  		rc = -EIO;
> -		goto out_restore;
> +		goto out_free;
>  	}
>  
>  	spin_lock_irqsave(&s390_domain->list_lock, flags);
> @@ -129,7 +149,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  		   domain->geometry.aperture_end != zdev->end_dma) {
>  		rc = -EINVAL;
>  		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> -		goto out_restore;
> +		goto out_free;
>  	}
>  	domain_device->zdev = zdev;
>  	zdev->s390_domain = s390_domain;
> @@ -138,14 +158,6 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  
>  	return 0;
>  
> -out_restore:
> -	if (!zdev->s390_domain) {
> -		zpci_dma_init_device(zdev);
> -	} else {
> -		zdev->dma_table = zdev->s390_domain->dma_table;
> -		zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> -				   virt_to_phys(zdev->dma_table));
> -	}

^ I see you removed this awkward backout scenario (and replaced the aperture check later) and I generally agree, but I'm looking at just this patch in isolation since it's a fix...
If we leave due to a failed register_ioat or an aperture mismatch, what do we expect to happen moving forward?  In one case (aperture mismatch -- how?) something is left registered with firmware, and in the other (register_ioat fails) we have nothing registered with firmware (as we've discussed before, the device is then probably in an error state).  Is the expectation that the device is just broken for now and, more importantly, will device recovery clean up both of these scenarios?


>  out_free:
>  	kfree(domain_device);
>  
> @@ -155,32 +167,12 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  static void s390_iommu_detach_device(struct iommu_domain *domain,
>  				     struct device *dev)
>  {
> -	struct s390_domain *s390_domain = to_s390_domain(domain);
>  	struct zpci_dev *zdev = to_zpci_dev(dev);
> -	struct s390_domain_device *domain_device, *tmp;
> -	unsigned long flags;
> -	int found = 0;
>  
> -	if (!zdev)
> -		return;
> +	WARN_ON(zdev->s390_domain != to_s390_domain(domain));
>  
> -	spin_lock_irqsave(&s390_domain->list_lock, flags);
> -	list_for_each_entry_safe(domain_device, tmp, &s390_domain->devices,
> -				 list) {
> -		if (domain_device->zdev == zdev) {
> -			list_del(&domain_device->list);
> -			kfree(domain_device);
> -			found = 1;
> -			break;
> -		}
> -	}
> -	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> -
> -	if (found && (zdev->s390_domain == s390_domain)) {
> -		zdev->s390_domain = NULL;
> -		zpci_unregister_ioat(zdev, 0);
> -		zpci_dma_init_device(zdev);
> -	}
> +	__s390_iommu_detach_device(zdev);
> +	zpci_dma_init_device(zdev);
>  }
>  
>  static struct iommu_device *s390_iommu_probe_device(struct device *dev)
> @@ -193,24 +185,13 @@ static struct iommu_device *s390_iommu_probe_device(struct device *dev)
>  static void s390_iommu_release_device(struct device *dev)
>  {
>  	struct zpci_dev *zdev = to_zpci_dev(dev);
> -	struct iommu_domain *domain;
>  
>  	/*
> -	 * This is a workaround for a scenario where the IOMMU API common code
> -	 * "forgets" to call the detach_dev callback: After binding a device
> -	 * to vfio-pci and completing the VFIO_SET_IOMMU ioctl (which triggers
> -	 * the attach_dev), removing the device via
> -	 * "echo 1 > /sys/bus/pci/devices/.../remove" won't trigger detach_dev,
> -	 * only release_device will be called via the BUS_NOTIFY_REMOVED_DEVICE
> -	 * notifier.
> -	 *
> -	 * So let's call detach_dev from here if it hasn't been called before.
> +	 * release_device is expected to detach any domain currently attached
> +	 * to the device, but keep it attached to other devices in the group.
>  	 */
> -	if (zdev && zdev->s390_domain) {
> -		domain = iommu_get_domain_for_dev(dev);
> -		if (domain)
> -			s390_iommu_detach_device(domain, dev);
> -	}
> +	if (zdev)
> +		__s390_iommu_detach_device(zdev);
>  }
>  
>  static int s390_iommu_update_trans(struct s390_domain *s390_domain,


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/5] iommu/s390: Get rid of s390_domain_device
  2022-10-04 12:07 ` [PATCH v4 2/5] iommu/s390: Get rid of s390_domain_device Niklas Schnelle
@ 2022-10-04 16:20   ` Matthew Rosato
  0 siblings, 0 replies; 22+ messages in thread
From: Matthew Rosato @ 2022-10-04 16:20 UTC (permalink / raw)
  To: Niklas Schnelle, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On 10/4/22 8:07 AM, Niklas Schnelle wrote:
> The struct s390_domain_device serves solely as the list entry for
> the devices list of a struct s390_domain. As it contains no additional
> information besides a list_head and a pointer to the struct zpci_dev we
> can simplify things and just thread the device list through struct
> zpci_dev directly. This removes the need to allocate during domain
> attach and gets rid of one level of indirection during mapping
> operations.
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>

Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
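
For anyone following along who is less familiar with the pattern, below
is a minimal user-space sketch of what "threading the list through the
device" means. The list helpers are cut-down stand-ins for the kernel's
list.h, and the demo_* structures and functions are hypothetical, not
the driver's own. The point is only that the list node lives inside the
device, so attach needs no allocation and detach needs no search.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_head(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/* Unlink and re-initialise, so a repeated delete is harmless. */
static void list_del_reinit(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device: the list node is embedded, not allocated. */
struct demo_dev {
	unsigned int fh;
	struct list_head iommu_list;
};

struct demo_domain {
	struct list_head devices;
};

static void demo_attach(struct demo_domain *dom, struct demo_dev *dev)
{
	list_add_head(&dev->iommu_list, &dom->devices);
}

static void demo_detach(struct demo_dev *dev)
{
	list_del_reinit(&dev->iommu_list);
}

static unsigned int demo_count(const struct demo_domain *dom)
{
	const struct list_head *pos;
	unsigned int n = 0;

	for (pos = dom->devices.next; pos != &dom->devices; pos = pos->next)
		n++;
	return n;
}

/* Walk the list and get back to the device via container_of(). */
static int demo_has_dev(struct demo_domain *dom, unsigned int fh)
{
	struct list_head *pos;

	for (pos = dom->devices.next; pos != &dom->devices; pos = pos->next) {
		if (container_of(pos, struct demo_dev, iommu_list)->fh == fh)
			return 1;
	}
	return 0;
}
```

Locking is left out of the sketch on purpose; in the patch the list is
still protected by list_lock.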

> ---
>  arch/s390/include/asm/pci.h |  1 +
>  drivers/iommu/s390-iommu.c  | 45 ++++++++-----------------------------
>  2 files changed, 10 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index 108e732d7b14..15f8714ca9b7 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -117,6 +117,7 @@ struct zpci_bus {
>  struct zpci_dev {
>  	struct zpci_bus *zbus;
>  	struct list_head entry;		/* list of all zpci_devices, needed for hotplug, etc. */
> +	struct list_head iommu_list;
>  	struct kref kref;
>  	struct hotplug_slot hotplug_slot;
>  
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index 0f58e897bc95..6f87dd4b85af 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -29,11 +29,6 @@ struct s390_domain {
>  	spinlock_t		list_lock;
>  };
>  
> -struct s390_domain_device {
> -	struct list_head	list;
> -	struct zpci_dev		*zdev;
> -};
> -
>  static struct s390_domain *to_s390_domain(struct iommu_domain *dom)
>  {
>  	return container_of(dom, struct s390_domain, domain);
> @@ -87,21 +82,13 @@ static void s390_domain_free(struct iommu_domain *domain)
>  static void __s390_iommu_detach_device(struct zpci_dev *zdev)
>  {
>  	struct s390_domain *s390_domain = zdev->s390_domain;
> -	struct s390_domain_device *domain_device, *tmp;
>  	unsigned long flags;
>  
>  	if (!s390_domain)
>  		return;
>  
>  	spin_lock_irqsave(&s390_domain->list_lock, flags);
> -	list_for_each_entry_safe(domain_device, tmp, &s390_domain->devices,
> -				 list) {
> -		if (domain_device->zdev == zdev) {
> -			list_del(&domain_device->list);
> -			kfree(domain_device);
> -			break;
> -		}
> -	}
> +	list_del_init(&zdev->iommu_list);
>  	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
>  
>  	zpci_unregister_ioat(zdev, 0);
> @@ -114,17 +101,12 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  {
>  	struct s390_domain *s390_domain = to_s390_domain(domain);
>  	struct zpci_dev *zdev = to_zpci_dev(dev);
> -	struct s390_domain_device *domain_device;
>  	unsigned long flags;
> -	int cc, rc = 0;
> +	int cc;
>  
>  	if (!zdev)
>  		return -ENODEV;
>  
> -	domain_device = kzalloc(sizeof(*domain_device), GFP_KERNEL);
> -	if (!domain_device)
> -		return -ENOMEM;
> -
>  	if (zdev->s390_domain)
>  		__s390_iommu_detach_device(zdev);
>  	else if (zdev->dma_table)
> @@ -133,10 +115,8 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  	zdev->dma_table = s390_domain->dma_table;
>  	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
>  				virt_to_phys(zdev->dma_table));
> -	if (cc) {
> -		rc = -EIO;
> -		goto out_free;
> -	}
> +	if (cc)
> +		return -EIO;
>  
>  	spin_lock_irqsave(&s390_domain->list_lock, flags);
>  	/* First device defines the DMA range limits */
> @@ -147,21 +127,14 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  	/* Allow only devices with identical DMA range limits */
>  	} else if (domain->geometry.aperture_start != zdev->start_dma ||
>  		   domain->geometry.aperture_end != zdev->end_dma) {
> -		rc = -EINVAL;
>  		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> -		goto out_free;
> +		return -EINVAL;
>  	}
> -	domain_device->zdev = zdev;
>  	zdev->s390_domain = s390_domain;
> -	list_add(&domain_device->list, &s390_domain->devices);
> +	list_add(&zdev->iommu_list, &s390_domain->devices);
>  	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
>  
>  	return 0;
> -
> -out_free:
> -	kfree(domain_device);
> -
> -	return rc;
>  }
>  
>  static void s390_iommu_detach_device(struct iommu_domain *domain,
> @@ -198,10 +171,10 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
>  				   phys_addr_t pa, dma_addr_t dma_addr,
>  				   size_t size, int flags)
>  {
> -	struct s390_domain_device *domain_device;
>  	phys_addr_t page_addr = pa & PAGE_MASK;
>  	dma_addr_t start_dma_addr = dma_addr;
>  	unsigned long irq_flags, nr_pages, i;
> +	struct zpci_dev *zdev;
>  	unsigned long *entry;
>  	int rc = 0;
>  
> @@ -226,8 +199,8 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
>  	}
>  
>  	spin_lock(&s390_domain->list_lock);
> -	list_for_each_entry(domain_device, &s390_domain->devices, list) {
> -		rc = zpci_refresh_trans((u64) domain_device->zdev->fh << 32,
> +	list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
> +		rc = zpci_refresh_trans((u64)zdev->fh << 32,
>  					start_dma_addr, nr_pages * PAGE_SIZE);
>  		if (rc)
>  			break;


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 3/5] iommu/s390: Fix potential s390_domain aperture shrinking
  2022-10-04 12:07 ` [PATCH v4 3/5] iommu/s390: Fix potential s390_domain aperture shrinking Niklas Schnelle
@ 2022-10-04 21:12   ` Matthew Rosato
  0 siblings, 0 replies; 22+ messages in thread
From: Matthew Rosato @ 2022-10-04 21:12 UTC (permalink / raw)
  To: Niklas Schnelle, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, jgg, linux-kernel

On 10/4/22 8:07 AM, Niklas Schnelle wrote:
> The s390 IOMMU driver currently sets the IOMMU domain's aperture to
> match the device specific DMA address range of the device that is first
> attached. This is not ideal. For one, if the domain has no device
> attached in the meantime, the aperture could be shrunk, allowing
> translations outside the aperture to exist in the translation tables.
> Also this is a bit of a misuse of the aperture which really should
> describe what addresses can be translated and not some device specific
> limitations.
> 
> Instead of misusing the aperture like this we can instead create
> reserved ranges for the ranges inaccessible to the attached devices
> allowing devices with overlapping ranges to still share an IOMMU domain.
> This also significantly simplifies s390_iommu_attach_device() allowing
> us to move the aperture check to the beginning of the function and
> removing the need to hold the device list's lock to check the aperture.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>

Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
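
As a sanity check of the arithmetic, here is a user-space sketch of the
two pieces of range logic in this patch: the reserved regions derived
from a device's DMA range, and the overlap test done at attach time. The
demo_* names are hypothetical and DEMO_TABLE_SIZE is a placeholder picked
for illustration, not taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the size of the translatable address space. */
#define DEMO_TABLE_SIZE (1ULL << 42)

struct demo_resv_region {
	uint64_t start;
	uint64_t length;
};

/*
 * Fill 'out' with the ranges a device cannot reach, i.e. everything
 * below start_dma and everything above end_dma. Returns the number of
 * regions written (0, 1 or 2).
 */
static size_t demo_resv_regions(uint64_t start_dma, uint64_t end_dma,
				struct demo_resv_region out[2])
{
	size_t n = 0;

	assert(start_dma <= end_dma && end_dma < DEMO_TABLE_SIZE);

	if (start_dma) {
		out[n].start = 0;
		out[n].length = start_dma;
		n++;
	}
	if (end_dma < DEMO_TABLE_SIZE - 1) {
		out[n].start = end_dma + 1;
		out[n].length = DEMO_TABLE_SIZE - end_dma - 1;
		n++;
	}
	return n;
}

/* Same shape as the attach-time check: reject only disjoint ranges. */
static int demo_ranges_overlap(uint64_t aperture_start, uint64_t aperture_end,
			       uint64_t start_dma, uint64_t end_dma)
{
	return !(aperture_start > end_dma || aperture_end < start_dma);
}
```

A device whose range covers the whole table yields no reserved regions,
and any device range inside the table overlaps the fixed aperture, which
is why devices with different ranges can share a domain in this scheme.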

> ---
>  drivers/iommu/s390-iommu.c | 50 +++++++++++++++++++++++++++-----------
>  1 file changed, 36 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index 6f87dd4b85af..762dc55aea1e 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -62,6 +62,9 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
>  		kfree(s390_domain);
>  		return NULL;
>  	}
> +	s390_domain->domain.geometry.force_aperture = true;
> +	s390_domain->domain.geometry.aperture_start = 0;
> +	s390_domain->domain.geometry.aperture_end = ZPCI_TABLE_SIZE_RT - 1;
>  
>  	spin_lock_init(&s390_domain->dma_table_lock);
>  	spin_lock_init(&s390_domain->list_lock);
> @@ -107,30 +110,24 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  	if (!zdev)
>  		return -ENODEV;
>  
> +	if (domain->geometry.aperture_start > zdev->end_dma ||
> +	    domain->geometry.aperture_end < zdev->start_dma)
> +		return -EINVAL;
> +
>  	if (zdev->s390_domain)
>  		__s390_iommu_detach_device(zdev);
>  	else if (zdev->dma_table)
>  		zpci_dma_exit_device(zdev);
>  
> -	zdev->dma_table = s390_domain->dma_table;
>  	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> -				virt_to_phys(zdev->dma_table));
> +				virt_to_phys(s390_domain->dma_table));
>  	if (cc)
>  		return -EIO;
>  
> -	spin_lock_irqsave(&s390_domain->list_lock, flags);
> -	/* First device defines the DMA range limits */
> -	if (list_empty(&s390_domain->devices)) {
> -		domain->geometry.aperture_start = zdev->start_dma;
> -		domain->geometry.aperture_end = zdev->end_dma;
> -		domain->geometry.force_aperture = true;
> -	/* Allow only devices with identical DMA range limits */
> -	} else if (domain->geometry.aperture_start != zdev->start_dma ||
> -		   domain->geometry.aperture_end != zdev->end_dma) {
> -		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> -		return -EINVAL;
> -	}
> +	zdev->dma_table = s390_domain->dma_table;
>  	zdev->s390_domain = s390_domain;
> +
> +	spin_lock_irqsave(&s390_domain->list_lock, flags);
>  	list_add(&zdev->iommu_list, &s390_domain->devices);
>  	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
>  
> @@ -148,6 +145,30 @@ static void s390_iommu_detach_device(struct iommu_domain *domain,
>  	zpci_dma_init_device(zdev);
>  }
>  
> +static void s390_iommu_get_resv_regions(struct device *dev,
> +					struct list_head *list)
> +{
> +	struct zpci_dev *zdev = to_zpci_dev(dev);
> +	struct iommu_resv_region *region;
> +
> +	if (zdev->start_dma) {
> +		region = iommu_alloc_resv_region(0, zdev->start_dma, 0,
> +						 IOMMU_RESV_RESERVED);
> +		if (!region)
> +			return;
> +		list_add_tail(&region->list, list);
> +	}
> +
> +	if (zdev->end_dma < ZPCI_TABLE_SIZE_RT - 1) {
> +		region = iommu_alloc_resv_region(zdev->end_dma + 1,
> +						 ZPCI_TABLE_SIZE_RT - zdev->end_dma - 1,
> +						 0, IOMMU_RESV_RESERVED);
> +		if (!region)
> +			return;
> +		list_add_tail(&region->list, list);
> +	}
> +}
> +
>  static struct iommu_device *s390_iommu_probe_device(struct device *dev)
>  {
>  	struct zpci_dev *zdev = to_zpci_dev(dev);
> @@ -330,6 +351,7 @@ static const struct iommu_ops s390_iommu_ops = {
>  	.release_device = s390_iommu_release_device,
>  	.device_group = generic_device_group,
>  	.pgsize_bitmap = S390_IOMMU_PGSIZES,
> +	.get_resv_regions = s390_iommu_get_resv_regions,
>  	.default_domain_ops = &(const struct iommu_domain_ops) {
>  		.attach_dev	= s390_iommu_attach_device,
>  		.detach_dev	= s390_iommu_detach_device,


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments
  2022-10-04 16:18   ` Matthew Rosato
@ 2022-10-05  7:58     ` Niklas Schnelle
  2022-10-05 11:48       ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Niklas Schnelle @ 2022-10-05  7:58 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, iommu, Jason Gunthorpe
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, robin.murphy, linux-kernel

On Tue, 2022-10-04 at 12:18 -0400, Matthew Rosato wrote:
> On 10/4/22 8:07 AM, Niklas Schnelle wrote:
> > Since commit fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev
> > calls") we can end up with duplicates in the list of devices attached to
> > a domain. This is inefficient and confusing since only one domain can
> > actually be in control of the IOMMU translations for a device. Fix this
> > by detaching the device from the previous domain, if any, on attach.
> > Add a WARN_ON() in case we still have attached devices on freeing the
> > domain. While here remove the re-attach on failure dance as it was
> > determined to be unlikely to help and may confuse debug and recovery.
> > 
> > Fixes: fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev calls")
> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> 
> I've been testing this in isolation and it looks good to me, but one question...
> 
> > ---
> > v3 -> v4:
> > - Drop s390_domain from __s390_iommu_detach_device() (Jason)
> > - WARN_ON() mismatched domain in s390_iommu_detach_device() (Jason)
> > - Use __s390_iommu_detach_device() in s390_iommu_release_device() (Jason)
> > 
> >  drivers/iommu/s390-iommu.c | 97 +++++++++++++++-----------------------
> >  1 file changed, 39 insertions(+), 58 deletions(-)
> > 
> > diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> > index c898bcbbce11..0f58e897bc95 100644
> > --- a/drivers/iommu/s390-iommu.c
> > +++ b/drivers/iommu/s390-iommu.c
> > @@ -79,10 +79,36 @@ static void s390_domain_free(struct iommu_domain *domain)
> >  {
> >  	struct s390_domain *s390_domain = to_s390_domain(domain);
> >  
> > +	WARN_ON(!list_empty(&s390_domain->devices));
> >  	dma_cleanup_tables(s390_domain->dma_table);
> >  	kfree(s390_domain);
> >  }
> >  
> > +static void __s390_iommu_detach_device(struct zpci_dev *zdev)
> > +{
> > +	struct s390_domain *s390_domain = zdev->s390_domain;
> > +	struct s390_domain_device *domain_device, *tmp;
> > +	unsigned long flags;
> > +
> > +	if (!s390_domain)
> > +		return;
> > +
> > +	spin_lock_irqsave(&s390_domain->list_lock, flags);
> > +	list_for_each_entry_safe(domain_device, tmp, &s390_domain->devices,
> > +				 list) {
> > +		if (domain_device->zdev == zdev) {
> > +			list_del(&domain_device->list);
> > +			kfree(domain_device);
> > +			break;
> > +		}
> > +	}
> > +	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> > +
> > +	zpci_unregister_ioat(zdev, 0);
> > +	zdev->s390_domain = NULL;
> > +	zdev->dma_table = NULL;
> > +}
> > +
> >  static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  				    struct device *dev)
> >  {
> > @@ -90,7 +116,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  	struct zpci_dev *zdev = to_zpci_dev(dev);
> >  	struct s390_domain_device *domain_device;
> >  	unsigned long flags;
> > -	int cc, rc;
> > +	int cc, rc = 0;
> >  
> >  	if (!zdev)
> >  		return -ENODEV;
> > @@ -99,23 +125,17 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  	if (!domain_device)
> >  		return -ENOMEM;
> >  
> > -	if (zdev->dma_table && !zdev->s390_domain) {
> > -		cc = zpci_dma_exit_device(zdev);
> > -		if (cc) {
> > -			rc = -EIO;
> > -			goto out_free;
> > -		}
> > -	}
> > -
> >  	if (zdev->s390_domain)
> > -		zpci_unregister_ioat(zdev, 0);
> > +		__s390_iommu_detach_device(zdev);
> > +	else if (zdev->dma_table)
> > +		zpci_dma_exit_device(zdev);
> >  
> >  	zdev->dma_table = s390_domain->dma_table;
> >  	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> >  				virt_to_phys(zdev->dma_table));
> >  	if (cc) {
> >  		rc = -EIO;
> > -		goto out_restore;
> > +		goto out_free;
> >  	}
> >  
> >  	spin_lock_irqsave(&s390_domain->list_lock, flags);
> > @@ -129,7 +149,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  		   domain->geometry.aperture_end != zdev->end_dma) {
> >  		rc = -EINVAL;
> >  		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> > -		goto out_restore;
> > +		goto out_free;
> >  	}
> >  	domain_device->zdev = zdev;
> >  	zdev->s390_domain = s390_domain;
> > @@ -138,14 +158,6 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  
> >  	return 0;
> >  
> > -out_restore:
> > -	if (!zdev->s390_domain) {
> > -		zpci_dma_init_device(zdev);
> > -	} else {
> > -		zdev->dma_table = zdev->s390_domain->dma_table;
> > -		zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> > -				   virt_to_phys(zdev->dma_table));
> > -	}
> 
> ^ I see you removed this awkward backout scenario (and replace the aperture check later) and I generally agree, but I'm looking at just this patch in isolation since it's a fix...
> If we leave due to a failed register_ioat or aperture mismatch, what do we expect to happen moving forward?  In one case (aperture mismatch -- how?) something is left registered with firmware and in the other (register_ioat fails) we have nothing registered with firmware (as we've discussed before, the device is then probably in an error state).  Is the expectation that the device is just broken for now and, more importantly, will device recovery clean both of these scenarios up?

A failed aperture test leaving the IOAT registered would indeed be bad.
I guess I focused too much on the failure scenarios at the state after
these patches where this can't happen. I think this would leave us in a
bad state because zpci_register_ioat() succeeded with the domain's DMA
table but we won't have attached, leading to the wrong decisions in
recovery paths (see below).

I think we should do a zpci_unregister_ioat() and zdev->dma_table =
NULL in this case just to be safe. It's certainly still much less
fragile than the full rollback and even if the zpci_unregister_ioat()
fails it prevents recovery from restoring the wrong DMA translation
tables. I don't think we can really get into this situation though as
the aperture should match what firmware accepts but it's still a valid
code path.

@Jason would you be okay with that?

Recovery (via zpci_hot_reset_device()) should then be able to deal with
these situations as long as zdev->dma_table matches the IOAT
registration state.

1. If zdev->dma_table != NULL we re-register the previous DMA table
2. If zdev->dma_table == NULL we do zpci_dma_init_device()
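
The two recovery cases above can be sketched in plain C as follows. This
is only an illustration under assumptions: struct sketch_zdev, the enum
and sketch_pick_recovery() are hypothetical names standing in for the
real struct zpci_dev and the recovery path, which are not reproduced
here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the one field of struct zpci_dev used here. */
struct sketch_zdev {
	void *dma_table;	/* non-NULL while a DMA table is registered */
};

enum sketch_recovery_action {
	SKETCH_REREGISTER_TABLE,	/* case 1: restore the previous table */
	SKETCH_DMA_INIT,		/* case 2: zpci_dma_init_device() path */
};

/* Pick the recovery action purely from the dma_table pointer. */
static enum sketch_recovery_action
sketch_pick_recovery(const struct sketch_zdev *zdev)
{
	assert(zdev != NULL);
	return zdev->dma_table ? SKETCH_REREGISTER_TABLE : SKETCH_DMA_INIT;
}
```

The point of the sketch is that the decision depends only on
zdev->dma_table, which is why that pointer has to match the IOAT
registration state.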

> 
> 
> >  out_free:
> >  	kfree(domain_device);
> >  
> > @@ -155,32 +167,12 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  static void s390_iommu_detach_device(struct iommu_domain *domain,
> >  				     struct device *dev)
> >  {
> > -	struct s390_domain *s390_domain = to_s390_domain(domain);
> >  	struct zpci_dev *zdev = to_zpci_dev(dev);
> > -	struct s390_domain_device *domain_device, *tmp;
> > -	unsigned long flags;
> > -	int found = 0;
> >  
> > -	if (!zdev)
> > -		return;
> > +	WARN_ON(zdev->s390_domain != to_s390_domain(domain));
> >  
> > -	spin_lock_irqsave(&s390_domain->list_lock, flags);
> > -	list_for_each_entry_safe(domain_device, tmp, &s390_domain->devices,
> > -				 list) {
> > -		if (domain_device->zdev == zdev) {
> > -			list_del(&domain_device->list);
> > -			kfree(domain_device);
> > -			found = 1;
> > -			break;
> > -		}
> > -	}
> > -	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> > -
> > -	if (found && (zdev->s390_domain == s390_domain)) {
> > -		zdev->s390_domain = NULL;
> > -		zpci_unregister_ioat(zdev, 0);
> > -		zpci_dma_init_device(zdev);
> > -	}
> > +	__s390_iommu_detach_device(zdev);
> > +	zpci_dma_init_device(zdev);
> >  }
> >  
> >  static struct iommu_device *s390_iommu_probe_device(struct device *dev)
> > @@ -193,24 +185,13 @@ static struct iommu_device *s390_iommu_probe_device(struct device *dev)
> >  static void s390_iommu_release_device(struct device *dev)
> >  {
> >  	struct zpci_dev *zdev = to_zpci_dev(dev);
> > -	struct iommu_domain *domain;
> >  
> >  	/*
> > -	 * This is a workaround for a scenario where the IOMMU API common code
> > -	 * "forgets" to call the detach_dev callback: After binding a device
> > -	 * to vfio-pci and completing the VFIO_SET_IOMMU ioctl (which triggers
> > -	 * the attach_dev), removing the device via
> > -	 * "echo 1 > /sys/bus/pci/devices/.../remove" won't trigger detach_dev,
> > -	 * only release_device will be called via the BUS_NOTIFY_REMOVED_DEVICE
> > -	 * notifier.
> > -	 *
> > -	 * So let's call detach_dev from here if it hasn't been called before.
> > +	 * release_device is expected to detach any domain currently attached
> > +	 * to the device, but keep it attached to other devices in the group.
> >  	 */
> > -	if (zdev && zdev->s390_domain) {
> > -		domain = iommu_get_domain_for_dev(dev);
> > -		if (domain)
> > -			s390_iommu_detach_device(domain, dev);
> > -	}
> > +	if (zdev)
> > +		__s390_iommu_detach_device(zdev);
> >  }
> >  
> >  static int s390_iommu_update_trans(struct s390_domain *s390_domain,



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap
  2022-10-04 16:13         ` Niklas Schnelle
@ 2022-10-05  9:53           ` Robin Murphy
  2022-10-05 11:03             ` Niklas Schnelle
  0 siblings, 1 reply; 22+ messages in thread
From: Robin Murphy @ 2022-10-05  9:53 UTC (permalink / raw)
  To: Niklas Schnelle, Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, jgg, linux-kernel

On 2022-10-04 17:13, Niklas Schnelle wrote:
> On Tue, 2022-10-04 at 16:31 +0100, Robin Murphy wrote:
>> On 2022-10-04 16:12, Matthew Rosato wrote:
>>> On 10/4/22 11:02 AM, Robin Murphy wrote:
>>>> On 2022-10-04 13:07, Niklas Schnelle wrote:
>>>>> The .pgsize_bitmap property of struct iommu_ops is not a page mask but
>>>>> rather has a bit set for each size of pages the IOMMU supports. As the
>>>>> comment correctly pointed out at this moment the code only supports 4K
>>>>> pages so simply use SZ_4K here.
>>>>
>>>> Unless it's already been done somewhere else, you'll want to switch over to the {map,unmap}_pages() interfaces as well to avoid taking a hit on efficiency here. The "page mask" thing was an old hack to trick the core API into making fewer map/unmap calls where the driver could map arbitrary numbers of pages at once anyway. The multi-page interfaces now do that more honestly and generally better (since they work for non-power-of-two sizes as well).
>>>
>>> Thanks for the heads up -- Niklas has some additional series coming soon as described here:
>>>
>>> https://lore.kernel.org/linux-iommu/a10424adbe01a0fd40372cbd0736d11e517951a1.camel@linux.ibm.com/
>>>
>>> So implementing the _pages() interfaces is soon up on the roadmap.  But given what you say I wonder if this patch should just wait until the series that implements {map,unmap}_pages().
>>
>> Perhaps, although the full change should be trivial enough that there's
>> probably just as much argument for doing the whole thing in its own
>> right for the sake of this cleanup. The main point is that
>> S390_IOMMU_PGSIZES is not incorrect as such, it's just not spelling out
>> the deliberate trick that it's achieving - everyone copied it from
>> intel-iommu, but since that got converted to the new interfaces the
>> original explanation is now gone. The only effect of "fixing" it in
>> isolation right now will be to make large VFIO mappings slower.
>>
>> Robin.
> 
> The patch changing to map_pages()/unmap_pages() is currently part of a
> larger series of improvements, some of which are less trivial. So I'm
> planning to send those as RFC first. Those include changing the
> spin_lock protected list to RCU so the map/unmap can parallelize
> better. Another one is atomic updates to the IOMMU tables to do away
> with locks in map/unmap. So I think pulling that whole
> series into this one isn't ideal. I could pull just the
> map_pages()/unmap_pages() change though.

Yeah, literally just updating the s390_iommu_{map,unmap} function 
prototypes and replacing "size" with "pgsize * count" within is all 
that's needed to clean up this hack properly. That can (and probably 
should) be completely independent of other improvements deeper down.
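
As a rough illustration of the "pgsize * count" replacement, here is a
minimal user-space sketch. It assumes a driver that supports only 4K
pages; sketch_map_pages() and SKETCH_SZ_4K are hypothetical names, and
the parameter list is a simplified stand-in, not the kernel's actual
map_pages() prototype.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_SZ_4K 0x1000UL	/* the only page size this sketch accepts */

/*
 * The old single-"size" callback becomes a (pgsize, pgcount) pair; the
 * body recovers the byte length as pgsize * pgcount and reports it back
 * through *mapped.
 */
static int sketch_map_pages(unsigned long iova, unsigned long paddr,
			    size_t pgsize, size_t pgcount, size_t *mapped)
{
	size_t size = pgsize * pgcount;

	assert(mapped != NULL);
	if (pgsize != SKETCH_SZ_4K)
		return -EINVAL;
	if ((iova | paddr) & (pgsize - 1))
		return -EINVAL;	/* both addresses must be page aligned */

	/* A real driver would update its translation tables here. */
	*mapped = size;
	return 0;
}
```

With this shape the caller can hand over many 4K pages in one call
without the driver having to advertise page sizes it does not really
have.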

Thanks,
Robin.

> 
>>
>>>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>>>> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
>>>>> ---
>>>>>     drivers/iommu/s390-iommu.c | 9 +--------
>>>>>     1 file changed, 1 insertion(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
>>>>> index 94c444b909bd..6bf23e7830a2 100644
>>>>> --- a/drivers/iommu/s390-iommu.c
>>>>> +++ b/drivers/iommu/s390-iommu.c
>>>>> @@ -12,13 +12,6 @@
>>>>>     #include <linux/sizes.h>
>>>>>     #include <asm/pci_dma.h>
>>>>>     -/*
>>>>> - * Physically contiguous memory regions can be mapped with 4 KiB alignment,
>>>>> - * we allow all page sizes that are an order of 4KiB (no special large page
>>>>> - * support so far).
>>>>> - */
>>>>> -#define S390_IOMMU_PGSIZES    (~0xFFFUL)
>>>>> -
>>>>>     static const struct iommu_ops s390_iommu_ops;
>>>>>       struct s390_domain {
>>>>> @@ -350,7 +343,7 @@ static const struct iommu_ops s390_iommu_ops = {
>>>>>         .probe_device = s390_iommu_probe_device,
>>>>>         .release_device = s390_iommu_release_device,
>>>>>         .device_group = generic_device_group,
>>>>> -    .pgsize_bitmap = S390_IOMMU_PGSIZES,
>>>>> +    .pgsize_bitmap = SZ_4K,
>>>>>         .get_resv_regions = s390_iommu_get_resv_regions,
>>>>>         .default_domain_ops = &(const struct iommu_domain_ops) {
>>>>>             .attach_dev    = s390_iommu_attach_device,
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap
  2022-10-05  9:53           ` Robin Murphy
@ 2022-10-05 11:03             ` Niklas Schnelle
  0 siblings, 0 replies; 22+ messages in thread
From: Niklas Schnelle @ 2022-10-05 11:03 UTC (permalink / raw)
  To: Robin Murphy, Matthew Rosato, Pierre Morel, iommu
  Cc: linux-s390, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, joro, will, jgg, linux-kernel

On Wed, 2022-10-05 at 10:53 +0100, Robin Murphy wrote:
> On 2022-10-04 17:13, Niklas Schnelle wrote:
> > On Tue, 2022-10-04 at 16:31 +0100, Robin Murphy wrote:
> > > On 2022-10-04 16:12, Matthew Rosato wrote:
> > > > On 10/4/22 11:02 AM, Robin Murphy wrote:
> > > > > On 2022-10-04 13:07, Niklas Schnelle wrote:
> > > > > > The .pgsize_bitmap property of struct iommu_ops is not a page mask but
> > > > > > rather has a bit set for each size of pages the IOMMU supports. As the
> > > > > > comment correctly pointed out at this moment the code only supports 4K
> > > > > > pages so simply use SZ_4K here.
> > > > > 
> > > > > Unless it's already been done somewhere else, you'll want to switch over to the {map,unmap}_pages() interfaces as well to avoid taking a hit on efficiency here. The "page mask" thing was an old hack to trick the core API into making fewer map/unmap calls where the driver could map arbitrary numbers of pages at once anyway. The multi-page interfaces now do that more honestly and generally better (since they work for non-power-of-two sizes as well).
> > > > 
> > > > Thanks for the heads up -- Niklas has some additional series coming soon as described here:
> > > > 
> > > > https://lore.kernel.org/linux-iommu/a10424adbe01a0fd40372cbd0736d11e517951a1.camel@linux.ibm.com/
> > > > 
> > > > So implementing the _pages() interfaces is soon up on the roadmap.  But given what you say I wonder if this patch should just wait until the series that implements {map,unmap}_pages().
> > > 
> > > Perhaps, although the full change should be trivial enough that there's
> > > probably just as much argument for doing the whole thing in its own
> > > right for the sake of this cleanup. The main point is that
> > > S390_IOMMU_PGSIZES is not incorrect as such, it's just not spelling out
> > > the deliberate trick that it's achieving - everyone copied it from
> > > intel-iommu, but since that got converted to the new interfaces the
> > > original explanation is now gone. The only effect of "fixing" it in
> > > isolation right now will be to make large VFIO mappings slower.
> > > 
> > > Robin.
> > 
> > The patch changing to map_pages()/unmap_pages() is currently part of a
> > larger series of improvements, some of which are less trivial. So I'm
> > planning to send those as RFC first. Those include changing the
> > spin_lock protected list to RCU so the map/unmap can parallelize
> > better. Another one is atomic updates to the IOMMU tables to do away
> > with locks in map/unmap. So I think pulling that whole
> > series into this one isn't ideal. I could pull just the
> > map_pages()/unmap_pages() change though.
> 
> Yeah, literally just updating the s390_iommu_{map,unmap} function 
> prototypes and replacing "size" with "pgsize * count" within is all 
> that's needed to clean up this hack properly. That can (and probably 
> should) be completely independent of other improvements deeper down.
> 
> Thanks,
> Robin.
> 

Pretty much, it's a bit cleaner to slightly change
s390_iommu_update_trans() to take pgcount as argument since that
currently calculates the pgcount from the size anyway, which is
redundant if we have the pgcount already. But yes it's all pretty
simple and I reordered things for v5 already.

Speaking of v5, if that were the final form, what do you think would be
the best tree to take it? Except for patch 1 they depend on the removal
of the bus_next field in struct zpci_dev. That commit is not yet in
Linus' tree but already in the s390 feature branch on git.kernel.org so
if these changes were to go via the s390 tree that would be taken care
of. Otherwise one would have to merge that tree first. Or as an
alternative I also have a kernel.org account and can provide this
series as a GPG signed branch based on the s390 tree.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments
  2022-10-05  7:58     ` Niklas Schnelle
@ 2022-10-05 11:48       ` Jason Gunthorpe
  2022-10-06 11:52         ` Niklas Schnelle
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2022-10-05 11:48 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Matthew Rosato, Pierre Morel, iommu, linux-s390, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, joro, will,
	robin.murphy, linux-kernel

On Wed, Oct 05, 2022 at 09:58:58AM +0200, Niklas Schnelle wrote:

> A failed aperture test leaving the IOAT registered would indeed be bad.
> I guess I focused too much on the failure scenarios at the state after
> these patches where this can't happen. I think this would leave us in a
> bad state because zpci_register_ioat() succeeded with the domain's DMA
> table but we won't have attached leading to the wrong decisions in
> recovery paths (see below).

Domain attach should either completely move to the new domain and
succeed, or it should leave everything as is and fail.

So it looks OK to me.

> Recovery (via zpci_hot_reset_device()) should then be able to deal with
> these situations as long as zdev->dma_table matches the IOAT
> registration state.

If you are doing reset the s390 driver should keep track of what
domain is supposed to be attached and fix it when the reset is
completed. In this case it should not fail attach here for the
mandatory success domain types.

The core code does not reasonably handle failures from this routine,
it must be avoided if you want it to be robust.

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments
  2022-10-05 11:48       ` Jason Gunthorpe
@ 2022-10-06 11:52         ` Niklas Schnelle
  2022-10-06 12:02           ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Niklas Schnelle @ 2022-10-06 11:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matthew Rosato, Pierre Morel, iommu, linux-s390, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, joro, will,
	robin.murphy, linux-kernel

On Wed, 2022-10-05 at 08:48 -0300, Jason Gunthorpe wrote:
> On Wed, Oct 05, 2022 at 09:58:58AM +0200, Niklas Schnelle wrote:
> 
> > A failed aperture test leaving the IOAT registered would indeed be bad.
> > I guess I focused too much on the failure scenarios at the state after
> > these patches where this can't happen. I think this would leave us in a
> > bad state because zpci_register_ioat() succeeded with the domain's DMA
> > table but we won't have attached leading to the wrong decisions in
> > recovery paths (see below).
> 
> Domain attach should either completely move to the new domain and
> succeed, or it should leave everything as is and fail.
> 
> So it looks OK to me.
> 
> > Recovery (via zpci_hot_reset_device()) should then be able to deal with
> > these situations as long as zdev->dma_table matches the IOAT
> > registration state.
> 
> If you are doing reset the s390 driver should keep track of what
> domain is supposed to be attached and fix it when the reset is
> completed. In this case it should not fail attach here for the
> mandatory success domain types.

Our reset/recovery code won't do a detach/attach, it directly
re-establishes the DMA table that was previously in use with firmware.
If that fails the reset fails and one will have to "power cycle"* the
device.

Also automatic recovery is blocked while the IOMMU API is in use.
Though "echo 1 > /sys/bus/pci/../reset" is available and does re-
register the DMA table if the device was in an error state.

> 
> The core code does not reasonably handle failures from this routine,
> it must be avoided if you want it to be robust.
> 
> Jason

Makes sense. I see the following failure cases:

1. After patch 3 failure in the aperture check leaves
   everything as it is. Before that my proposal would
   leave it with DMA blocked and no domain attached
   so it will need to be "power cycled"*.

2. If zpci_register_ioat() fails the device is left detached
   from all domains. This however only happens in one of two cases:

   2a. The device was surprise unplugged. This seems fine as
       we tear things down and the calling code just needs to
       back off which from what I can see it does.
   2b. The device has entered an error state.

In case 2b the device is going to need recovery and will not be usable
until that has succeeded (DMA and MMIO access is blocked). In automatic
recovery if zdev->dma_table == NULL the device will be re-initialized
for use with the DMA API while if the IOMMU API is in use we currently
don't attempt recovery and the user needs to "power cycle"* the device
manually. The "re-initialized for DMA API" part of course doesn't work
for the upcoming DMA API conversion.

One option I see would be to ignore the error return from
zpci_register_ioat() if it indicates case 2b. Then we would still add
the device to the IOMMU's devices list and return success despite
knowing that the device is inaccessible (DMA and MMIO blocked).

Then the recovery/reset code will register the new domain once the
device comes out of the error state. At least from an IOMMU API point
of view that would make the attachment always succeed for all
zpci_register_ioat() error cases that aren't programming bugs and can
conceivably be recovered from.
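
A minimal sketch of that option, under stated assumptions: the enum
values, struct sketch_zdev and sketch_attach() are hypothetical, and how
the real zpci_register_ioat() return value would be classified into
"error state" versus "other failure" is not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a registration attempt. */
enum sketch_ioat_result {
	SKETCH_IOAT_OK,
	SKETCH_IOAT_ERROR_STATE,	/* case 2b: device needs recovery */
	SKETCH_IOAT_FAILED,		/* anything else, e.g. a programming bug */
};

struct sketch_zdev {
	const void *dma_table;
	bool attached;
	bool needs_reregister;	/* recovery must register dma_table later */
};

/*
 * Attach succeeds for case 2b as well: the domain's table is recorded so
 * that recovery can register it once the device leaves the error state.
 */
static int sketch_attach(struct sketch_zdev *zdev, const void *domain_table,
			 enum sketch_ioat_result result)
{
	assert(zdev != NULL);
	if (result == SKETCH_IOAT_FAILED)
		return -EIO;

	zdev->dma_table = domain_table;
	zdev->attached = true;
	zdev->needs_reregister = (result == SKETCH_IOAT_ERROR_STATE);
	return 0;
}
```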

If you agree I would propose adding this as a robustness improvement as
part of my upcoming series of IOMMU improvements needed for the DMA API
conversion. As stated above before the DMA API conversion any error
that would cause zpci_register_ioat() to fail while the IOMMU API is
being used will need a "power cycle" anyway so postponing this doesn't
hurt.

* I say "power cycle" but this isn't usually a real power cycle, rather
an architecture specific low level disable/enable, but from the Linux
driver's point of view the device is completely unplugged and re-plugged.

Niklas


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments
  2022-10-06 11:52         ` Niklas Schnelle
@ 2022-10-06 12:02           ` Jason Gunthorpe
  2022-10-06 13:01             ` Niklas Schnelle
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2022-10-06 12:02 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Matthew Rosato, Pierre Morel, iommu, linux-s390, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, joro, will,
	robin.murphy, linux-kernel

On Thu, Oct 06, 2022 at 01:52:44PM +0200, Niklas Schnelle wrote:

> One option I see would be to ignore the error return from
> zpci_register_ioat() if it indicates case 2b. Then we would still add
> the device to the IOMMU's devices list and return success despite
> knowing that the device is inaccessible (DMA and MMIO blocked).
> 
> Then the recovery/reset code will register the new domain once the
> device comes out of the error state. At least from an IOMMU API point
> of view that would make the attachment always succeed for all
> zpci_register_ioat() error cases that aren't programming bugs and can
> conceivably be recovered from.

This is what I was thinking..

> If you agree I would propose adding this as a robustness improvement as
> part of my upcoming series of IOMMU improvements needed for the DMA API
> conversion. As stated above before the DMA API conversion any error
> that would cause zpci_register_ioat() to fail while the IOMMU API is
> being used will need a "power cycle" anyway so postponing this doesn't
> hurt.

Yes, I think this series is fine as is

Patch 4 mostly deletes all these error cases, and the one hunk that is left:

+	if (domain->geometry.aperture_start > zdev->end_dma ||
+	    domain->geometry.aperture_end < zdev->start_dma)
+		return -EINVAL;

Is misplaced. If a device cannot be supported by the IOMMU, which is
what that is really saying since s390 only creates one aperture
size, then it should fail to probe, not fail at attach.

So I'd change the above to a WARN_ON() for future safety and add a
similar test to probe and then all that is left is the
zpci_register_ioat() which you have a plan for.
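
The overlap test from the hunk above, pulled into a helper that both a
probe-time check and an attach-time WARN_ON() could share, might look
like the sketch below. sketch_aperture_usable() is a hypothetical name;
the condition itself is the one quoted from patch 4, just negated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the domain aperture and the device's DMA range share
 * at least one address. All bounds are inclusive.
 */
static bool sketch_aperture_usable(uint64_t aperture_start,
				   uint64_t aperture_end,
				   uint64_t start_dma, uint64_t end_dma)
{
	assert(aperture_start <= aperture_end && start_dma <= end_dma);
	return !(aperture_start > end_dma || aperture_end < start_dma);
}
```

Probe would reject the device when this returns false, and attach would
only warn if it ever sees a false result.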

Jason
 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments
  2022-10-06 12:02           ` Jason Gunthorpe
@ 2022-10-06 13:01             ` Niklas Schnelle
  0 siblings, 0 replies; 22+ messages in thread
From: Niklas Schnelle @ 2022-10-06 13:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matthew Rosato, Pierre Morel, iommu, linux-s390, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, joro, will,
	robin.murphy, linux-kernel

On Thu, 2022-10-06 at 09:02 -0300, Jason Gunthorpe wrote:
> On Thu, Oct 06, 2022 at 01:52:44PM +0200, Niklas Schnelle wrote:
> 
> > One option I see would be to ignore the error return from
> > zpci_register_ioat() if it indicates case 2b. Then we would still add
> > the device to the IOMMU's devices list and return success despite
> > knowing that the device is inaccessible (DMA and MMIO blocked).
> > 
> > Then the recovery/reset code will register the new domain once the
> > device comes out of the error state. At least from an IOMMU API point
> > of view that would make the attachment always succeed for all
> > zpci_register_ioat() error cases that aren't programming bugs and can
> > conceivably be recovered from.
> 
> This is what I was thinking..
> 
> > If you agree I would propose adding this as a robustness improvement as
> > part of my upcoming series of IOMMU improvements needed for the DMA API
> > conversion. As stated above before the DMA API conversion any error
> > that would cause zpci_register_ioat() to fail while the IOMMU API is
> > being used will need a "power cycle" anyway so postponing this doesn't
> > hurt.
> 
> Yes, I think this series is fine as is
> 
> Patch 4 mostly deletes all these error cases, and the one hunk that is left:
> 
> +	if (domain->geometry.aperture_start > zdev->end_dma ||
> +	    domain->geometry.aperture_end < zdev->start_dma)
> +		return -EINVAL;
> 
> Is misplaced. If a device cannot be supported by the IOMMU, which is
> what that is really saying since s390 only creates one aperture
> size, then it should fail to probe, not fail at attach.
> 
> So I'd change the above to a WARN_ON() for future safety and add a
> similar test to probe and then all that is left is the
> zpci_register_ioat() which you have a plan for.
> 
> Jason
>  

Sounds good, will do a v5 anyway to add map_pages()/unmap_pages().


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-10-06 13:03 UTC | newest]

Thread overview: 22+ messages
2022-10-04 12:07 [PATCH v4 0/5] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
2022-10-04 12:07 ` [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
2022-10-04 12:43   ` Jason Gunthorpe
2022-10-04 16:18   ` Matthew Rosato
2022-10-05  7:58     ` Niklas Schnelle
2022-10-05 11:48       ` Jason Gunthorpe
2022-10-06 11:52         ` Niklas Schnelle
2022-10-06 12:02           ` Jason Gunthorpe
2022-10-06 13:01             ` Niklas Schnelle
2022-10-04 12:07 ` [PATCH v4 2/5] iommu/s390: Get rid of s390_domain_device Niklas Schnelle
2022-10-04 16:20   ` Matthew Rosato
2022-10-04 12:07 ` [PATCH v4 3/5] iommu/s390: Fix potential s390_domain aperture shrinking Niklas Schnelle
2022-10-04 21:12   ` Matthew Rosato
2022-10-04 12:07 ` [PATCH v4 4/5] iommu/s390: Fix incorrect aperture check Niklas Schnelle
2022-10-04 12:07 ` [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap Niklas Schnelle
2022-10-04 14:38   ` Matthew Rosato
2022-10-04 15:02   ` Robin Murphy
2022-10-04 15:12     ` Matthew Rosato
2022-10-04 15:31       ` Robin Murphy
2022-10-04 16:13         ` Niklas Schnelle
2022-10-05  9:53           ` Robin Murphy
2022-10-05 11:03             ` Niklas Schnelle
