All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] iommu/s390: Further improvements
@ 2022-11-09 14:28 Niklas Schnelle
  2022-11-09 14:28 ` [PATCH v2 1/5] iommu/s390: Make attach succeed even if the device is in error state Niklas Schnelle
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Niklas Schnelle @ 2022-11-09 14:28 UTC (permalink / raw)
  To: Matthew Rosato, iommu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe
  Cc: Gerd Bayer, Pierre Morel, linux-s390, borntraeger, hca, gor,
	gerald.schaefer, agordeev, svens, linux-kernel

Hi All,

This series of patches improves the s390 IOMMU driver. These improvements help
existing IOMMU users, mainly vfio-pci, but at the same time are also in
preparation of converting s390 to use the common DMA API implementation in
drivers/iommu/dma-iommu.c instead of its platform specific DMA API in
arch/s390/pci/pci_dma.c that sidesteps the IOMMU driver to control the same
hardware interface directly.

Among the included changes patch 1 improves the robustness of switching IOMMU
domains and patch 2 adds the I/O TLB operations necessary for the DMA API
conversion. Patches 3, 4, and 5 aim to improve performance with patch 5 being
the most intrusive by removing the I/O translation table lock and using atomic
updates instead.

This series is based on the s390 branch of Joerg's IOMMU tree[0] that includes
the latest s390 IOMMU fixes. It is available for easy testing in the
iommu_improve_v2 branch with signed tag s390_iommu_improve_v2 of my
git.kernel.org tree[1].

Best regards,
Niklas Schnelle

Changes sinve v1:
- If an IOTLB flush fails for one device don't skip the flush for other devices.
  This is also needed when RCU readers try to flush a detached device. (Jason)
- Free a domain's IOMMU translation table via call_rcu() (Jason)

[0] https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/log/?h=s390
[1] https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/

Niklas Schnelle (5):
  iommu/s390: Make attach succeed even if the device is in error state
  iommu/s390: Add I/O TLB ops
  iommu/s390: Use RCU to allow concurrent domain_list iteration
  iommu/s390: Optimize IOMMU table walking
  s390/pci: use lock-free I/O translation updates

 arch/s390/include/asm/pci.h |   4 +-
 arch/s390/kvm/pci.c         |   6 +-
 arch/s390/pci/pci.c         |  13 +--
 arch/s390/pci/pci_dma.c     |  77 +++++++++------
 drivers/iommu/s390-iommu.c  | 184 ++++++++++++++++++++++++------------
 5 files changed, 185 insertions(+), 99 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/5] iommu/s390: Make attach succeed even if the device is in error state
  2022-11-09 14:28 [PATCH v2 0/5] iommu/s390: Further improvements Niklas Schnelle
@ 2022-11-09 14:28 ` Niklas Schnelle
  2022-11-14 15:05   ` Gerd Bayer
  2022-11-14 15:30   ` Matthew Rosato
  2022-11-09 14:29 ` [PATCH v2 2/5] iommu/s390: Add I/O TLB ops Niklas Schnelle
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 9+ messages in thread
From: Niklas Schnelle @ 2022-11-09 14:28 UTC (permalink / raw)
  To: Matthew Rosato, iommu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe
  Cc: Gerd Bayer, Pierre Morel, linux-s390, borntraeger, hca, gor,
	gerald.schaefer, agordeev, svens, linux-kernel

If a zPCI device is in the error state while switching IOMMU domains
zpci_register_ioat() will fail and we would end up with the device not
attached to any domain. In this state since zdev->dma_table == NULL
a reset via zpci_hot_reset_device() would wrongfully re-initialize the
device for DMA API usage using zpci_dma_init_device(). As automatic
recovery is currently disabled while attached to an IOMMU domain this
only affects slot resets triggered through other means but will affect
automatic recovery once we switch to using dma-iommu.

Additionally with that switch common code expects attaching to the
default domain to always work so zpci_register_ioat() should only fail
if there is no chance to recover anyway, e.g. if the device has been
unplugged.

Improve the robustness of attach by specifically looking at the status
returned by zpci_mod_fc() to determine if the device is unavailable and
in this case simply ignore the error. Once the device is reset
zpci_hot_reset_device() will then correctly set the domain's DMA
translation tables.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 arch/s390/include/asm/pci.h |  2 +-
 arch/s390/kvm/pci.c         |  6 ++++--
 arch/s390/pci/pci.c         | 11 ++++++-----
 arch/s390/pci/pci_dma.c     |  3 ++-
 drivers/iommu/s390-iommu.c  |  9 +++++++--
 5 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 15f8714ca9b7..07361e2fd8c5 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -221,7 +221,7 @@ void zpci_device_reserved(struct zpci_dev *zdev);
 bool zpci_is_device_configured(struct zpci_dev *zdev);
 
 int zpci_hot_reset_device(struct zpci_dev *zdev);
-int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64);
+int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64, u8 *);
 int zpci_unregister_ioat(struct zpci_dev *, u8);
 void zpci_remove_reserved_devices(void);
 void zpci_update_fh(struct zpci_dev *zdev, u32 fh);
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index c50c1645c0ae..03964c0e1fdf 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -434,6 +434,7 @@ static void kvm_s390_pci_dev_release(struct zpci_dev *zdev)
 static int kvm_s390_pci_register_kvm(void *opaque, struct kvm *kvm)
 {
 	struct zpci_dev *zdev = opaque;
+	u8 status;
 	int rc;
 
 	if (!zdev)
@@ -486,7 +487,7 @@ static int kvm_s390_pci_register_kvm(void *opaque, struct kvm *kvm)
 
 	/* Re-register the IOMMU that was already created */
 	rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
-				virt_to_phys(zdev->dma_table));
+				virt_to_phys(zdev->dma_table), &status);
 	if (rc)
 		goto clear_gisa;
 
@@ -516,6 +517,7 @@ static void kvm_s390_pci_unregister_kvm(void *opaque)
 {
 	struct zpci_dev *zdev = opaque;
 	struct kvm *kvm;
+	u8 status;
 
 	if (!zdev)
 		return;
@@ -554,7 +556,7 @@ static void kvm_s390_pci_unregister_kvm(void *opaque)
 
 	/* Re-register the IOMMU that was already created */
 	zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
-			   virt_to_phys(zdev->dma_table));
+			   virt_to_phys(zdev->dma_table), &status);
 
 out:
 	spin_lock(&kvm->arch.kzdev_list_lock);
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 73cdc5539384..a703dcd94a68 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -116,20 +116,20 @@ EXPORT_SYMBOL_GPL(pci_proc_domain);
 
 /* Modify PCI: Register I/O address translation parameters */
 int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
-		       u64 base, u64 limit, u64 iota)
+		       u64 base, u64 limit, u64 iota, u8 *status)
 {
 	u64 req = ZPCI_CREATE_REQ(zdev->fh, dmaas, ZPCI_MOD_FC_REG_IOAT);
 	struct zpci_fib fib = {0};
-	u8 cc, status;
+	u8 cc;
 
 	WARN_ON_ONCE(iota & 0x3fff);
 	fib.pba = base;
 	fib.pal = limit;
 	fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
 	fib.gd = zdev->gisa;
-	cc = zpci_mod_fc(req, &fib, &status);
+	cc = zpci_mod_fc(req, &fib, status);
 	if (cc)
-		zpci_dbg(3, "reg ioat fid:%x, cc:%d, status:%d\n", zdev->fid, cc, status);
+		zpci_dbg(3, "reg ioat fid:%x, cc:%d, status:%d\n", zdev->fid, cc, *status);
 	return cc;
 }
 EXPORT_SYMBOL_GPL(zpci_register_ioat);
@@ -764,6 +764,7 @@ EXPORT_SYMBOL_GPL(zpci_disable_device);
  */
 int zpci_hot_reset_device(struct zpci_dev *zdev)
 {
+	u8 status;
 	int rc;
 
 	zpci_dbg(3, "rst fid:%x, fh:%x\n", zdev->fid, zdev->fh);
@@ -787,7 +788,7 @@ int zpci_hot_reset_device(struct zpci_dev *zdev)
 
 	if (zdev->dma_table)
 		rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
-					virt_to_phys(zdev->dma_table));
+					virt_to_phys(zdev->dma_table), &status);
 	else
 		rc = zpci_dma_init_device(zdev);
 	if (rc) {
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 227cf0a62800..dee825ee7305 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -547,6 +547,7 @@ static void s390_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
 	
 int zpci_dma_init_device(struct zpci_dev *zdev)
 {
+	u8 status;
 	int rc;
 
 	/*
@@ -598,7 +599,7 @@ int zpci_dma_init_device(struct zpci_dev *zdev)
 
 	}
 	if (zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
-			       virt_to_phys(zdev->dma_table))) {
+			       virt_to_phys(zdev->dma_table), &status)) {
 		rc = -EIO;
 		goto free_bitmap;
 	}
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 7fb512bece9a..e2c886bc4376 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -98,6 +98,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 	struct zpci_dev *zdev = to_zpci_dev(dev);
 	unsigned long flags;
+	u8 status;
 	int cc;
 
 	if (!zdev)
@@ -113,8 +114,12 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 		zpci_dma_exit_device(zdev);
 
 	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
-				virt_to_phys(s390_domain->dma_table));
-	if (cc)
+				virt_to_phys(s390_domain->dma_table), &status);
+	/*
+	 * If the device is undergoing error recovery the reset code
+	 * will re-establish the new domain.
+	 */
+	if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL)
 		return -EIO;
 	zdev->dma_table = s390_domain->dma_table;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/5] iommu/s390: Add I/O TLB ops
  2022-11-09 14:28 [PATCH v2 0/5] iommu/s390: Further improvements Niklas Schnelle
  2022-11-09 14:28 ` [PATCH v2 1/5] iommu/s390: Make attach succeed even if the device is in error state Niklas Schnelle
@ 2022-11-09 14:29 ` Niklas Schnelle
  2022-11-09 14:29 ` [PATCH v2 3/5] iommu/s390: Use RCU to allow concurrent domain_list iteration Niklas Schnelle
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Niklas Schnelle @ 2022-11-09 14:29 UTC (permalink / raw)
  To: Matthew Rosato, iommu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe
  Cc: Gerd Bayer, Pierre Morel, linux-s390, borntraeger, hca, gor,
	gerald.schaefer, agordeev, svens, linux-kernel

Currently s390-iommu does an I/O TLB flush (RPCIT) for every update of
the I/O translation table explicitly. For one this is wasteful since
RPCIT can be skipped after a mapping operation if zdev->tlb_refresh is
unset. Moreover we can do a single RPCIT for a range of pages including
whne doing lazy unmapping.

Thankfully both of these optimizations can be achieved by implementing
the IOMMU operations common code provides for the different types of I/O
tlb flushes:

 * flush_iotlb_all: Flushes the I/O TLB for the entire IOVA space
 * iotlb_sync:  Flushes the I/O TLB for a range of pages that can be
   gathered up, for example to implement lazy unmapping.
 * iotlb_sync_map: Flushes the I/O TLB after a mapping operation

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
v1->v2:
- Don't skip IOTLB flushes for other devices on IOTLB flush failure (Jason)

 drivers/iommu/s390-iommu.c | 67 +++++++++++++++++++++++++++++++-------
 1 file changed, 56 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index e2c886bc4376..9771bce86e94 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -199,14 +199,63 @@ static void s390_iommu_release_device(struct device *dev)
 		__s390_iommu_detach_device(zdev);
 }
 
+static void s390_iommu_flush_iotlb_all(struct iommu_domain *domain)
+{
+	struct s390_domain *s390_domain = to_s390_domain(domain);
+	struct zpci_dev *zdev;
+	unsigned long flags;
+
+	spin_lock_irqsave(&s390_domain->list_lock, flags);
+	list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
+		zpci_refresh_trans((u64)zdev->fh << 32, zdev->start_dma,
+				   zdev->end_dma - zdev->start_dma + 1);
+	}
+	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+}
+
+static void s390_iommu_iotlb_sync(struct iommu_domain *domain,
+				  struct iommu_iotlb_gather *gather)
+{
+	struct s390_domain *s390_domain = to_s390_domain(domain);
+	size_t size = gather->end - gather->start + 1;
+	struct zpci_dev *zdev;
+	unsigned long flags;
+
+	/* If gather was never added to there is nothing to flush */
+	if (!gather->end)
+		return;
+
+	spin_lock_irqsave(&s390_domain->list_lock, flags);
+	list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
+		zpci_refresh_trans((u64)zdev->fh << 32, gather->start,
+				   size);
+	}
+	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+}
+
+static void s390_iommu_iotlb_sync_map(struct iommu_domain *domain,
+				      unsigned long iova, size_t size)
+{
+	struct s390_domain *s390_domain = to_s390_domain(domain);
+	struct zpci_dev *zdev;
+	unsigned long flags;
+
+	spin_lock_irqsave(&s390_domain->list_lock, flags);
+	list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
+		if (!zdev->tlb_refresh)
+			continue;
+		zpci_refresh_trans((u64)zdev->fh << 32,
+				   iova, size);
+	}
+	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+}
+
 static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 				   phys_addr_t pa, dma_addr_t dma_addr,
 				   unsigned long nr_pages, int flags)
 {
 	phys_addr_t page_addr = pa & PAGE_MASK;
-	dma_addr_t start_dma_addr = dma_addr;
 	unsigned long irq_flags, i;
-	struct zpci_dev *zdev;
 	unsigned long *entry;
 	int rc = 0;
 
@@ -225,15 +274,6 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 		dma_addr += PAGE_SIZE;
 	}
 
-	spin_lock(&s390_domain->list_lock);
-	list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
-		rc = zpci_refresh_trans((u64)zdev->fh << 32,
-					start_dma_addr, nr_pages * PAGE_SIZE);
-		if (rc)
-			break;
-	}
-	spin_unlock(&s390_domain->list_lock);
-
 undo_cpu_trans:
 	if (rc && ((flags & ZPCI_PTE_VALID_MASK) == ZPCI_PTE_VALID)) {
 		flags = ZPCI_PTE_INVALID;
@@ -340,6 +380,8 @@ static size_t s390_iommu_unmap_pages(struct iommu_domain *domain,
 	if (rc)
 		return 0;
 
+	iommu_iotlb_gather_add_range(gather, iova, size);
+
 	return size;
 }
 
@@ -384,6 +426,9 @@ static const struct iommu_ops s390_iommu_ops = {
 		.detach_dev	= s390_iommu_detach_device,
 		.map_pages	= s390_iommu_map_pages,
 		.unmap_pages	= s390_iommu_unmap_pages,
+		.flush_iotlb_all = s390_iommu_flush_iotlb_all,
+		.iotlb_sync      = s390_iommu_iotlb_sync,
+		.iotlb_sync_map  = s390_iommu_iotlb_sync_map,
 		.iova_to_phys	= s390_iommu_iova_to_phys,
 		.free		= s390_domain_free,
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/5] iommu/s390: Use RCU to allow concurrent domain_list iteration
  2022-11-09 14:28 [PATCH v2 0/5] iommu/s390: Further improvements Niklas Schnelle
  2022-11-09 14:28 ` [PATCH v2 1/5] iommu/s390: Make attach succeed even if the device is in error state Niklas Schnelle
  2022-11-09 14:29 ` [PATCH v2 2/5] iommu/s390: Add I/O TLB ops Niklas Schnelle
@ 2022-11-09 14:29 ` Niklas Schnelle
  2022-11-09 14:29 ` [PATCH v2 4/5] iommu/s390: Optimize IOMMU table walking Niklas Schnelle
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Niklas Schnelle @ 2022-11-09 14:29 UTC (permalink / raw)
  To: Matthew Rosato, iommu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe
  Cc: Gerd Bayer, Pierre Morel, linux-s390, borntraeger, hca, gor,
	gerald.schaefer, agordeev, svens, linux-kernel

The s390_domain->devices list is only added to when new devices are
attached but is iterated through in read-only fashion for every mapping
operation as well as for I/O TLB flushes and thus in performance
critical code causing contention on the s390_domain->list_lock.
Fortunately such a read-mostly linked list is a standard use case for
RCU. This change closely follows the example fpr RCU protected list
given in Documentation/RCU/listRCU.rst.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
v1->v2:
- Free domain tables via call_rcu() (Jason)

 arch/s390/include/asm/pci.h |  1 +
 arch/s390/pci/pci.c         |  2 +-
 drivers/iommu/s390-iommu.c  | 44 +++++++++++++++++++++++--------------
 3 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 07361e2fd8c5..e4c3e4e04d30 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -119,6 +119,7 @@ struct zpci_dev {
 	struct list_head entry;		/* list of all zpci_devices, needed for hotplug, etc. */
 	struct list_head iommu_list;
 	struct kref kref;
+	struct rcu_head rcu;
 	struct hotplug_slot hotplug_slot;
 
 	enum zpci_state state;
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index a703dcd94a68..ef38b1514c77 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -996,7 +996,7 @@ void zpci_release_device(struct kref *kref)
 		break;
 	}
 	zpci_dbg(3, "rem fid:%x\n", zdev->fid);
-	kfree(zdev);
+	kfree_rcu(zdev, rcu);
 }
 
 int zpci_report_error(struct pci_dev *pdev,
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 9771bce86e94..cf5dcbcea4e0 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -10,6 +10,8 @@
 #include <linux/iommu.h>
 #include <linux/iommu-helper.h>
 #include <linux/sizes.h>
+#include <linux/rculist.h>
+#include <linux/rcupdate.h>
 #include <asm/pci_dma.h>
 
 static const struct iommu_ops s390_iommu_ops;
@@ -20,6 +22,7 @@ struct s390_domain {
 	unsigned long		*dma_table;
 	spinlock_t		dma_table_lock;
 	spinlock_t		list_lock;
+	struct rcu_head		rcu;
 };
 
 static struct s390_domain *to_s390_domain(struct iommu_domain *dom)
@@ -61,18 +64,28 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
 
 	spin_lock_init(&s390_domain->dma_table_lock);
 	spin_lock_init(&s390_domain->list_lock);
-	INIT_LIST_HEAD(&s390_domain->devices);
+	INIT_LIST_HEAD_RCU(&s390_domain->devices);
 
 	return &s390_domain->domain;
 }
 
+static void s390_iommu_rcu_free_domain(struct rcu_head *head)
+{
+	struct s390_domain *s390_domain = container_of(head, struct s390_domain, rcu);
+
+	dma_cleanup_tables(s390_domain->dma_table);
+	kfree(s390_domain);
+}
+
 static void s390_domain_free(struct iommu_domain *domain)
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 
+	rcu_read_lock();
 	WARN_ON(!list_empty(&s390_domain->devices));
-	dma_cleanup_tables(s390_domain->dma_table);
-	kfree(s390_domain);
+	rcu_read_unlock();
+
+	call_rcu(&s390_domain->rcu, s390_iommu_rcu_free_domain);
 }
 
 static void __s390_iommu_detach_device(struct zpci_dev *zdev)
@@ -84,7 +97,7 @@ static void __s390_iommu_detach_device(struct zpci_dev *zdev)
 		return;
 
 	spin_lock_irqsave(&s390_domain->list_lock, flags);
-	list_del_init(&zdev->iommu_list);
+	list_del_rcu(&zdev->iommu_list);
 	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
 
 	zpci_unregister_ioat(zdev, 0);
@@ -127,7 +140,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	zdev->s390_domain = s390_domain;
 
 	spin_lock_irqsave(&s390_domain->list_lock, flags);
-	list_add(&zdev->iommu_list, &s390_domain->devices);
+	list_add_rcu(&zdev->iommu_list, &s390_domain->devices);
 	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
 
 	return 0;
@@ -203,14 +216,13 @@ static void s390_iommu_flush_iotlb_all(struct iommu_domain *domain)
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 	struct zpci_dev *zdev;
-	unsigned long flags;
 
-	spin_lock_irqsave(&s390_domain->list_lock, flags);
-	list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(zdev, &s390_domain->devices, iommu_list) {
 		zpci_refresh_trans((u64)zdev->fh << 32, zdev->start_dma,
 				   zdev->end_dma - zdev->start_dma + 1);
 	}
-	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+	rcu_read_unlock();
 }
 
 static void s390_iommu_iotlb_sync(struct iommu_domain *domain,
@@ -219,18 +231,17 @@ static void s390_iommu_iotlb_sync(struct iommu_domain *domain,
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 	size_t size = gather->end - gather->start + 1;
 	struct zpci_dev *zdev;
-	unsigned long flags;
 
 	/* If gather was never added to there is nothing to flush */
 	if (!gather->end)
 		return;
 
-	spin_lock_irqsave(&s390_domain->list_lock, flags);
-	list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(zdev, &s390_domain->devices, iommu_list) {
 		zpci_refresh_trans((u64)zdev->fh << 32, gather->start,
 				   size);
 	}
-	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+	rcu_read_unlock();
 }
 
 static void s390_iommu_iotlb_sync_map(struct iommu_domain *domain,
@@ -238,16 +249,15 @@ static void s390_iommu_iotlb_sync_map(struct iommu_domain *domain,
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 	struct zpci_dev *zdev;
-	unsigned long flags;
 
-	spin_lock_irqsave(&s390_domain->list_lock, flags);
-	list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(zdev, &s390_domain->devices, iommu_list) {
 		if (!zdev->tlb_refresh)
 			continue;
 		zpci_refresh_trans((u64)zdev->fh << 32,
 				   iova, size);
 	}
-	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+	rcu_read_unlock();
 }
 
 static int s390_iommu_update_trans(struct s390_domain *s390_domain,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 4/5] iommu/s390: Optimize IOMMU table walking
  2022-11-09 14:28 [PATCH v2 0/5] iommu/s390: Further improvements Niklas Schnelle
                   ` (2 preceding siblings ...)
  2022-11-09 14:29 ` [PATCH v2 3/5] iommu/s390: Use RCU to allow concurrent domain_list iteration Niklas Schnelle
@ 2022-11-09 14:29 ` Niklas Schnelle
  2022-11-09 14:29 ` [PATCH v2 5/5] s390/pci: use lock-free I/O translation updates Niklas Schnelle
  2022-11-19  9:28 ` [PATCH v2 0/5] iommu/s390: Further improvements Joerg Roedel
  5 siblings, 0 replies; 9+ messages in thread
From: Niklas Schnelle @ 2022-11-09 14:29 UTC (permalink / raw)
  To: Matthew Rosato, iommu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe
  Cc: Gerd Bayer, Pierre Morel, linux-s390, borntraeger, hca, gor,
	gerald.schaefer, agordeev, svens, linux-kernel

When invalidating existing table entries for unmap there is no need to
know the physical address beforehand so don't do an extra walk of the
IOMMU table to get it. Also when invalidating entries not finding an
entry indicates an invalid unmap and not a lack of memory we also don't
need to undo updates in this case. Implement this by splitting
s390_iommu_update_trans() in a variant for validating and one for
invalidating translations.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 drivers/iommu/s390-iommu.c | 69 ++++++++++++++++++++++++--------------
 1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index cf5dcbcea4e0..2b9a3e3bc606 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -260,14 +260,14 @@ static void s390_iommu_iotlb_sync_map(struct iommu_domain *domain,
 	rcu_read_unlock();
 }
 
-static int s390_iommu_update_trans(struct s390_domain *s390_domain,
-				   phys_addr_t pa, dma_addr_t dma_addr,
-				   unsigned long nr_pages, int flags)
+static int s390_iommu_validate_trans(struct s390_domain *s390_domain,
+				     phys_addr_t pa, dma_addr_t dma_addr,
+				     unsigned long nr_pages, int flags)
 {
 	phys_addr_t page_addr = pa & PAGE_MASK;
 	unsigned long irq_flags, i;
 	unsigned long *entry;
-	int rc = 0;
+	int rc;
 
 	if (!nr_pages)
 		return 0;
@@ -275,7 +275,7 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 	spin_lock_irqsave(&s390_domain->dma_table_lock, irq_flags);
 	for (i = 0; i < nr_pages; i++) {
 		entry = dma_walk_cpu_trans(s390_domain->dma_table, dma_addr);
-		if (!entry) {
+		if (unlikely(!entry)) {
 			rc = -ENOMEM;
 			goto undo_cpu_trans;
 		}
@@ -283,19 +283,43 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 		page_addr += PAGE_SIZE;
 		dma_addr += PAGE_SIZE;
 	}
+	spin_unlock_irqrestore(&s390_domain->dma_table_lock, irq_flags);
+
+	return 0;
 
 undo_cpu_trans:
-	if (rc && ((flags & ZPCI_PTE_VALID_MASK) == ZPCI_PTE_VALID)) {
-		flags = ZPCI_PTE_INVALID;
-		while (i-- > 0) {
-			page_addr -= PAGE_SIZE;
-			dma_addr -= PAGE_SIZE;
-			entry = dma_walk_cpu_trans(s390_domain->dma_table,
-						   dma_addr);
-			if (!entry)
-				break;
-			dma_update_cpu_trans(entry, page_addr, flags);
+	while (i-- > 0) {
+		dma_addr -= PAGE_SIZE;
+		entry = dma_walk_cpu_trans(s390_domain->dma_table,
+					   dma_addr);
+		if (!entry)
+			break;
+		dma_update_cpu_trans(entry, 0, ZPCI_PTE_INVALID);
+	}
+	spin_unlock_irqrestore(&s390_domain->dma_table_lock, irq_flags);
+
+	return rc;
+}
+
+static int s390_iommu_invalidate_trans(struct s390_domain *s390_domain,
+				       dma_addr_t dma_addr, unsigned long nr_pages)
+{
+	unsigned long irq_flags, i;
+	unsigned long *entry;
+	int rc = 0;
+
+	if (!nr_pages)
+		return 0;
+
+	spin_lock_irqsave(&s390_domain->dma_table_lock, irq_flags);
+	for (i = 0; i < nr_pages; i++) {
+		entry = dma_walk_cpu_trans(s390_domain->dma_table, dma_addr);
+		if (unlikely(!entry)) {
+			rc = -EINVAL;
+			break;
 		}
+		dma_update_cpu_trans(entry, 0, ZPCI_PTE_INVALID);
+		dma_addr += PAGE_SIZE;
 	}
 	spin_unlock_irqrestore(&s390_domain->dma_table_lock, irq_flags);
 
@@ -308,8 +332,8 @@ static int s390_iommu_map_pages(struct iommu_domain *domain,
 				int prot, gfp_t gfp, size_t *mapped)
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
-	int flags = ZPCI_PTE_VALID, rc = 0;
 	size_t size = pgcount << __ffs(pgsize);
+	int flags = ZPCI_PTE_VALID, rc = 0;
 
 	if (pgsize != SZ_4K)
 		return -EINVAL;
@@ -327,8 +351,8 @@ static int s390_iommu_map_pages(struct iommu_domain *domain,
 	if (!(prot & IOMMU_WRITE))
 		flags |= ZPCI_TABLE_PROTECTED;
 
-	rc = s390_iommu_update_trans(s390_domain, paddr, iova,
-				     pgcount, flags);
+	rc = s390_iommu_validate_trans(s390_domain, paddr, iova,
+				       pgcount, flags);
 	if (!rc)
 		*mapped = size;
 
@@ -373,20 +397,13 @@ static size_t s390_iommu_unmap_pages(struct iommu_domain *domain,
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 	size_t size = pgcount << __ffs(pgsize);
-	int flags = ZPCI_PTE_INVALID;
-	phys_addr_t paddr;
 	int rc;
 
 	if (WARN_ON(iova < s390_domain->domain.geometry.aperture_start ||
 	    (iova + size - 1) > s390_domain->domain.geometry.aperture_end))
 		return 0;
 
-	paddr = s390_iommu_iova_to_phys(domain, iova);
-	if (!paddr)
-		return 0;
-
-	rc = s390_iommu_update_trans(s390_domain, paddr, iova,
-				     pgcount, flags);
+	rc = s390_iommu_invalidate_trans(s390_domain, iova, pgcount);
 	if (rc)
 		return 0;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 5/5] s390/pci: use lock-free I/O translation updates
  2022-11-09 14:28 [PATCH v2 0/5] iommu/s390: Further improvements Niklas Schnelle
                   ` (3 preceding siblings ...)
  2022-11-09 14:29 ` [PATCH v2 4/5] iommu/s390: Optimize IOMMU table walking Niklas Schnelle
@ 2022-11-09 14:29 ` Niklas Schnelle
  2022-11-19  9:28 ` [PATCH v2 0/5] iommu/s390: Further improvements Joerg Roedel
  5 siblings, 0 replies; 9+ messages in thread
From: Niklas Schnelle @ 2022-11-09 14:29 UTC (permalink / raw)
  To: Matthew Rosato, iommu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe
  Cc: Gerd Bayer, Pierre Morel, linux-s390, borntraeger, hca, gor,
	gerald.schaefer, agordeev, svens, linux-kernel

I/O translation tables on s390 use 8 byte page table entries and tables
which are allocated lazily but only freed when the entire I/O
translation table is torn down. Also each IOVA can at any time only
translate to one physical address Furthermore I/O table accesses by the
IOMMU hardware are cache coherent. With a bit of care we can thus use
atomic updates to manipulate the translation table without having to use
a global lock at all. This is done analogous to the existing I/O
translation table handling code used on Intel and AMD x86 systems.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 arch/s390/include/asm/pci.h |  1 -
 arch/s390/pci/pci_dma.c     | 74 ++++++++++++++++++++++---------------
 drivers/iommu/s390-iommu.c  | 37 +++++++------------
 3 files changed, 58 insertions(+), 54 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index e4c3e4e04d30..b248694e0024 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -157,7 +157,6 @@ struct zpci_dev {
 
 	/* DMA stuff */
 	unsigned long	*dma_table;
-	spinlock_t	dma_table_lock;
 	int		tlb_refresh;
 
 	spinlock_t	iommu_bitmap_lock;
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index dee825ee7305..ea478d11fbd1 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -63,37 +63,55 @@ static void dma_free_page_table(void *table)
 	kmem_cache_free(dma_page_table_cache, table);
 }
 
-static unsigned long *dma_get_seg_table_origin(unsigned long *entry)
+static unsigned long *dma_get_seg_table_origin(unsigned long *rtep)
 {
+	unsigned long old_rte, rte;
 	unsigned long *sto;
 
-	if (reg_entry_isvalid(*entry))
-		sto = get_rt_sto(*entry);
-	else {
+	rte = READ_ONCE(*rtep);
+	if (reg_entry_isvalid(rte)) {
+		sto = get_rt_sto(rte);
+	} else {
 		sto = dma_alloc_cpu_table();
 		if (!sto)
 			return NULL;
 
-		set_rt_sto(entry, virt_to_phys(sto));
-		validate_rt_entry(entry);
-		entry_clr_protected(entry);
+		set_rt_sto(&rte, virt_to_phys(sto));
+		validate_rt_entry(&rte);
+		entry_clr_protected(&rte);
+
+		old_rte = cmpxchg(rtep, ZPCI_TABLE_INVALID, rte);
+		if (old_rte != ZPCI_TABLE_INVALID) {
+			/* Somone else was faster, use theirs */
+			dma_free_cpu_table(sto);
+			sto = get_rt_sto(old_rte);
+		}
 	}
 	return sto;
 }
 
-static unsigned long *dma_get_page_table_origin(unsigned long *entry)
+static unsigned long *dma_get_page_table_origin(unsigned long *step)
 {
+	unsigned long old_ste, ste;
 	unsigned long *pto;
 
-	if (reg_entry_isvalid(*entry))
-		pto = get_st_pto(*entry);
-	else {
+	ste = READ_ONCE(*step);
+	if (reg_entry_isvalid(ste)) {
+		pto = get_st_pto(ste);
+	} else {
 		pto = dma_alloc_page_table();
 		if (!pto)
 			return NULL;
-		set_st_pto(entry, virt_to_phys(pto));
-		validate_st_entry(entry);
-		entry_clr_protected(entry);
+		set_st_pto(&ste, virt_to_phys(pto));
+		validate_st_entry(&ste);
+		entry_clr_protected(&ste);
+
+		old_ste = cmpxchg(step, ZPCI_TABLE_INVALID, ste);
+		if (old_ste != ZPCI_TABLE_INVALID) {
+			/* Somone else was faster, use theirs */
+			dma_free_page_table(pto);
+			pto = get_st_pto(old_ste);
+		}
 	}
 	return pto;
 }
@@ -117,19 +135,24 @@ unsigned long *dma_walk_cpu_trans(unsigned long *rto, dma_addr_t dma_addr)
 	return &pto[px];
 }
 
-void dma_update_cpu_trans(unsigned long *entry, phys_addr_t page_addr, int flags)
+void dma_update_cpu_trans(unsigned long *ptep, phys_addr_t page_addr, int flags)
 {
+	unsigned long pte;
+
+	pte = READ_ONCE(*ptep);
 	if (flags & ZPCI_PTE_INVALID) {
-		invalidate_pt_entry(entry);
+		invalidate_pt_entry(&pte);
 	} else {
-		set_pt_pfaa(entry, page_addr);
-		validate_pt_entry(entry);
+		set_pt_pfaa(&pte, page_addr);
+		validate_pt_entry(&pte);
 	}
 
 	if (flags & ZPCI_TABLE_PROTECTED)
-		entry_set_protected(entry);
+		entry_set_protected(&pte);
 	else
-		entry_clr_protected(entry);
+		entry_clr_protected(&pte);
+
+	xchg(ptep, pte);
 }
 
 static int __dma_update_trans(struct zpci_dev *zdev, phys_addr_t pa,
@@ -137,18 +160,14 @@ static int __dma_update_trans(struct zpci_dev *zdev, phys_addr_t pa,
 {
 	unsigned int nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
 	phys_addr_t page_addr = (pa & PAGE_MASK);
-	unsigned long irq_flags;
 	unsigned long *entry;
 	int i, rc = 0;
 
 	if (!nr_pages)
 		return -EINVAL;
 
-	spin_lock_irqsave(&zdev->dma_table_lock, irq_flags);
-	if (!zdev->dma_table) {
-		rc = -EINVAL;
-		goto out_unlock;
-	}
+	if (!zdev->dma_table)
+		return -EINVAL;
 
 	for (i = 0; i < nr_pages; i++) {
 		entry = dma_walk_cpu_trans(zdev->dma_table, dma_addr);
@@ -173,8 +192,6 @@ static int __dma_update_trans(struct zpci_dev *zdev, phys_addr_t pa,
 			dma_update_cpu_trans(entry, page_addr, flags);
 		}
 	}
-out_unlock:
-	spin_unlock_irqrestore(&zdev->dma_table_lock, irq_flags);
 	return rc;
 }
 
@@ -558,7 +575,6 @@ int zpci_dma_init_device(struct zpci_dev *zdev)
 	WARN_ON(zdev->s390_domain);
 
 	spin_lock_init(&zdev->iommu_bitmap_lock);
-	spin_lock_init(&zdev->dma_table_lock);
 
 	zdev->dma_table = dma_alloc_cpu_table();
 	if (!zdev->dma_table) {
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 2b9a3e3bc606..ed33c6cce083 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -20,7 +20,6 @@ struct s390_domain {
 	struct iommu_domain	domain;
 	struct list_head	devices;
 	unsigned long		*dma_table;
-	spinlock_t		dma_table_lock;
 	spinlock_t		list_lock;
 	struct rcu_head		rcu;
 };
@@ -62,7 +61,6 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
 	s390_domain->domain.geometry.aperture_start = 0;
 	s390_domain->domain.geometry.aperture_end = ZPCI_TABLE_SIZE_RT - 1;
 
-	spin_lock_init(&s390_domain->dma_table_lock);
 	spin_lock_init(&s390_domain->list_lock);
 	INIT_LIST_HEAD_RCU(&s390_domain->devices);
 
@@ -265,14 +263,10 @@ static int s390_iommu_validate_trans(struct s390_domain *s390_domain,
 				     unsigned long nr_pages, int flags)
 {
 	phys_addr_t page_addr = pa & PAGE_MASK;
-	unsigned long irq_flags, i;
 	unsigned long *entry;
+	unsigned long i;
 	int rc;
 
-	if (!nr_pages)
-		return 0;
-
-	spin_lock_irqsave(&s390_domain->dma_table_lock, irq_flags);
 	for (i = 0; i < nr_pages; i++) {
 		entry = dma_walk_cpu_trans(s390_domain->dma_table, dma_addr);
 		if (unlikely(!entry)) {
@@ -283,7 +277,6 @@ static int s390_iommu_validate_trans(struct s390_domain *s390_domain,
 		page_addr += PAGE_SIZE;
 		dma_addr += PAGE_SIZE;
 	}
-	spin_unlock_irqrestore(&s390_domain->dma_table_lock, irq_flags);
 
 	return 0;
 
@@ -296,7 +289,6 @@ static int s390_iommu_validate_trans(struct s390_domain *s390_domain,
 			break;
 		dma_update_cpu_trans(entry, 0, ZPCI_PTE_INVALID);
 	}
-	spin_unlock_irqrestore(&s390_domain->dma_table_lock, irq_flags);
 
 	return rc;
 }
@@ -304,14 +296,10 @@ static int s390_iommu_validate_trans(struct s390_domain *s390_domain,
 static int s390_iommu_invalidate_trans(struct s390_domain *s390_domain,
 				       dma_addr_t dma_addr, unsigned long nr_pages)
 {
-	unsigned long irq_flags, i;
 	unsigned long *entry;
+	unsigned long i;
 	int rc = 0;
 
-	if (!nr_pages)
-		return 0;
-
-	spin_lock_irqsave(&s390_domain->dma_table_lock, irq_flags);
 	for (i = 0; i < nr_pages; i++) {
 		entry = dma_walk_cpu_trans(s390_domain->dma_table, dma_addr);
 		if (unlikely(!entry)) {
@@ -321,7 +309,6 @@ static int s390_iommu_invalidate_trans(struct s390_domain *s390_domain,
 		dma_update_cpu_trans(entry, 0, ZPCI_PTE_INVALID);
 		dma_addr += PAGE_SIZE;
 	}
-	spin_unlock_irqrestore(&s390_domain->dma_table_lock, irq_flags);
 
 	return rc;
 }
@@ -363,7 +350,8 @@ static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
 					   dma_addr_t iova)
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
-	unsigned long *sto, *pto, *rto, flags;
+	unsigned long *rto, *sto, *pto;
+	unsigned long ste, pte, rte;
 	unsigned int rtx, sx, px;
 	phys_addr_t phys = 0;
 
@@ -376,16 +364,17 @@ static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
 	px = calc_px(iova);
 	rto = s390_domain->dma_table;
 
-	spin_lock_irqsave(&s390_domain->dma_table_lock, flags);
-	if (rto && reg_entry_isvalid(rto[rtx])) {
-		sto = get_rt_sto(rto[rtx]);
-		if (sto && reg_entry_isvalid(sto[sx])) {
-			pto = get_st_pto(sto[sx]);
-			if (pto && pt_entry_isvalid(pto[px]))
-				phys = pto[px] & ZPCI_PTE_ADDR_MASK;
+	rte = READ_ONCE(rto[rtx]);
+	if (reg_entry_isvalid(rte)) {
+		sto = get_rt_sto(rte);
+		ste = READ_ONCE(sto[sx]);
+		if (reg_entry_isvalid(ste)) {
+			pto = get_st_pto(ste);
+			pte = READ_ONCE(pto[px]);
+			if (pt_entry_isvalid(pte))
+				phys = pte & ZPCI_PTE_ADDR_MASK;
 		}
 	}
-	spin_unlock_irqrestore(&s390_domain->dma_table_lock, flags);
 
 	return phys;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/5] iommu/s390: Make attach succeed even if the device is in error state
  2022-11-09 14:28 ` [PATCH v2 1/5] iommu/s390: Make attach succeed even if the device is in error state Niklas Schnelle
@ 2022-11-14 15:05   ` Gerd Bayer
  2022-11-14 15:30   ` Matthew Rosato
  1 sibling, 0 replies; 9+ messages in thread
From: Gerd Bayer @ 2022-11-14 15:05 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Pierre Morel, linux-s390, borntraeger, hca, gor, gerald.schaefer,
	agordeev, svens, linux-kernel, Matthew Rosato, iommu,
	Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe

Hi Niklas,

really just a nit...

On Wed, 2022-11-09 at 15:28 +0100, Niklas Schnelle wrote:
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index 7fb512bece9a..e2c886bc4376 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -98,6 +98,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  	struct s390_domain *s390_domain = to_s390_domain(domain);
>  	struct zpci_dev *zdev = to_zpci_dev(dev);
>  	unsigned long flags;
> +	u8 status;
>  	int cc;
>  
>  	if (!zdev)
> @@ -113,8 +114,12 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  		zpci_dma_exit_device(zdev);
>  
>  	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> -				virt_to_phys(s390_domain->dma_table));
> -	if (cc)
> +				virt_to_phys(s390_domain->dma_table), &status);
> +	/*
> +	 * If the device is undergoing error recovery the reset code
> +	 * will re-establish the new domain.
> +	 */
> +	if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL)
Operator precedence does "the right thing", but would a more explicit parenthesis 
around (status != ZPCI_PCI_ST_FUNC_NOT_AVAIL) help or hurt here?

>  		return -EIO;
>  	zdev->dma_table = s390_domain->dma_table;

Thank you,
Gerd


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/5] iommu/s390: Make attach succeed even if the device is in error state
  2022-11-09 14:28 ` [PATCH v2 1/5] iommu/s390: Make attach succeed even if the device is in error state Niklas Schnelle
  2022-11-14 15:05   ` Gerd Bayer
@ 2022-11-14 15:30   ` Matthew Rosato
  1 sibling, 0 replies; 9+ messages in thread
From: Matthew Rosato @ 2022-11-14 15:30 UTC (permalink / raw)
  To: Niklas Schnelle, iommu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe
  Cc: Gerd Bayer, Pierre Morel, linux-s390, borntraeger, hca, gor,
	gerald.schaefer, agordeev, svens, linux-kernel

On 11/9/22 9:28 AM, Niklas Schnelle wrote:
> If a zPCI device is in the error state while switching IOMMU domains
> zpci_register_ioat() will fail and we would end up with the device not
> attached to any domain. In this state since zdev->dma_table == NULL
> a reset via zpci_hot_reset_device() would wrongfully re-initialize the
> device for DMA API usage using zpci_dma_init_device(). As automatic
> recovery is currently disabled while attached to an IOMMU domain this
> only affects slot resets triggered through other means but will affect
> automatic recovery once we switch to using dma-iommu.
> 
> Additionally with that switch common code expects attaching to the
> default domain to always work so zpci_register_ioat() should only fail
> if there is no chance to recover anyway, e.g. if the device has been
> unplugged.
> 
> Improve the robustness of attach by specifically looking at the status
> returned by zpci_mod_fc() to determine if the device is unavailable and
> in this case simply ignore the error. Once the device is reset
> zpci_hot_reset_device() will then correctly set the domain's DMA
> translation tables.
> 
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>

Hi Niklas,

Already gave you this on v1, doesn't look like anything changed:

Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>

> ---
>  arch/s390/include/asm/pci.h |  2 +-
>  arch/s390/kvm/pci.c         |  6 ++++--
>  arch/s390/pci/pci.c         | 11 ++++++-----
>  arch/s390/pci/pci_dma.c     |  3 ++-
>  drivers/iommu/s390-iommu.c  |  9 +++++++--
>  5 files changed, 20 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index 15f8714ca9b7..07361e2fd8c5 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -221,7 +221,7 @@ void zpci_device_reserved(struct zpci_dev *zdev);
>  bool zpci_is_device_configured(struct zpci_dev *zdev);
>  
>  int zpci_hot_reset_device(struct zpci_dev *zdev);
> -int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64);
> +int zpci_register_ioat(struct zpci_dev *, u8, u64, u64, u64, u8 *);
>  int zpci_unregister_ioat(struct zpci_dev *, u8);
>  void zpci_remove_reserved_devices(void);
>  void zpci_update_fh(struct zpci_dev *zdev, u32 fh);
> diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
> index c50c1645c0ae..03964c0e1fdf 100644
> --- a/arch/s390/kvm/pci.c
> +++ b/arch/s390/kvm/pci.c
> @@ -434,6 +434,7 @@ static void kvm_s390_pci_dev_release(struct zpci_dev *zdev)
>  static int kvm_s390_pci_register_kvm(void *opaque, struct kvm *kvm)
>  {
>  	struct zpci_dev *zdev = opaque;
> +	u8 status;
>  	int rc;
>  
>  	if (!zdev)
> @@ -486,7 +487,7 @@ static int kvm_s390_pci_register_kvm(void *opaque, struct kvm *kvm)
>  
>  	/* Re-register the IOMMU that was already created */
>  	rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> -				virt_to_phys(zdev->dma_table));
> +				virt_to_phys(zdev->dma_table), &status);
>  	if (rc)
>  		goto clear_gisa;
>  
> @@ -516,6 +517,7 @@ static void kvm_s390_pci_unregister_kvm(void *opaque)
>  {
>  	struct zpci_dev *zdev = opaque;
>  	struct kvm *kvm;
> +	u8 status;
>  
>  	if (!zdev)
>  		return;
> @@ -554,7 +556,7 @@ static void kvm_s390_pci_unregister_kvm(void *opaque)
>  
>  	/* Re-register the IOMMU that was already created */
>  	zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> -			   virt_to_phys(zdev->dma_table));
> +			   virt_to_phys(zdev->dma_table), &status);
>  
>  out:
>  	spin_lock(&kvm->arch.kzdev_list_lock);
> diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
> index 73cdc5539384..a703dcd94a68 100644
> --- a/arch/s390/pci/pci.c
> +++ b/arch/s390/pci/pci.c
> @@ -116,20 +116,20 @@ EXPORT_SYMBOL_GPL(pci_proc_domain);
>  
>  /* Modify PCI: Register I/O address translation parameters */
>  int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
> -		       u64 base, u64 limit, u64 iota)
> +		       u64 base, u64 limit, u64 iota, u8 *status)
>  {
>  	u64 req = ZPCI_CREATE_REQ(zdev->fh, dmaas, ZPCI_MOD_FC_REG_IOAT);
>  	struct zpci_fib fib = {0};
> -	u8 cc, status;
> +	u8 cc;
>  
>  	WARN_ON_ONCE(iota & 0x3fff);
>  	fib.pba = base;
>  	fib.pal = limit;
>  	fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
>  	fib.gd = zdev->gisa;
> -	cc = zpci_mod_fc(req, &fib, &status);
> +	cc = zpci_mod_fc(req, &fib, status);
>  	if (cc)
> -		zpci_dbg(3, "reg ioat fid:%x, cc:%d, status:%d\n", zdev->fid, cc, status);
> +		zpci_dbg(3, "reg ioat fid:%x, cc:%d, status:%d\n", zdev->fid, cc, *status);
>  	return cc;
>  }
>  EXPORT_SYMBOL_GPL(zpci_register_ioat);
> @@ -764,6 +764,7 @@ EXPORT_SYMBOL_GPL(zpci_disable_device);
>   */
>  int zpci_hot_reset_device(struct zpci_dev *zdev)
>  {
> +	u8 status;
>  	int rc;
>  
>  	zpci_dbg(3, "rst fid:%x, fh:%x\n", zdev->fid, zdev->fh);
> @@ -787,7 +788,7 @@ int zpci_hot_reset_device(struct zpci_dev *zdev)
>  
>  	if (zdev->dma_table)
>  		rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> -					virt_to_phys(zdev->dma_table));
> +					virt_to_phys(zdev->dma_table), &status);
>  	else
>  		rc = zpci_dma_init_device(zdev);
>  	if (rc) {
> diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
> index 227cf0a62800..dee825ee7305 100644
> --- a/arch/s390/pci/pci_dma.c
> +++ b/arch/s390/pci/pci_dma.c
> @@ -547,6 +547,7 @@ static void s390_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>  	
>  int zpci_dma_init_device(struct zpci_dev *zdev)
>  {
> +	u8 status;
>  	int rc;
>  
>  	/*
> @@ -598,7 +599,7 @@ int zpci_dma_init_device(struct zpci_dev *zdev)
>  
>  	}
>  	if (zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> -			       virt_to_phys(zdev->dma_table))) {
> +			       virt_to_phys(zdev->dma_table), &status)) {
>  		rc = -EIO;
>  		goto free_bitmap;
>  	}
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index 7fb512bece9a..e2c886bc4376 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -98,6 +98,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  	struct s390_domain *s390_domain = to_s390_domain(domain);
>  	struct zpci_dev *zdev = to_zpci_dev(dev);
>  	unsigned long flags;
> +	u8 status;
>  	int cc;
>  
>  	if (!zdev)
> @@ -113,8 +114,12 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
>  		zpci_dma_exit_device(zdev);
>  
>  	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> -				virt_to_phys(s390_domain->dma_table));
> -	if (cc)
> +				virt_to_phys(s390_domain->dma_table), &status);
> +	/*
> +	 * If the device is undergoing error recovery the reset code
> +	 * will re-establish the new domain.
> +	 */
> +	if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL)
>  		return -EIO;
>  	zdev->dma_table = s390_domain->dma_table;
>  


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/5] iommu/s390: Further improvements
  2022-11-09 14:28 [PATCH v2 0/5] iommu/s390: Further improvements Niklas Schnelle
                   ` (4 preceding siblings ...)
  2022-11-09 14:29 ` [PATCH v2 5/5] s390/pci: use lock-free I/O translation updates Niklas Schnelle
@ 2022-11-19  9:28 ` Joerg Roedel
  5 siblings, 0 replies; 9+ messages in thread
From: Joerg Roedel @ 2022-11-19  9:28 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Matthew Rosato, iommu, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Gerd Bayer, Pierre Morel, linux-s390,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens,
	linux-kernel

On Wed, Nov 09, 2022 at 03:28:58PM +0100, Niklas Schnelle wrote:
> Niklas Schnelle (5):
>   iommu/s390: Make attach succeed even if the device is in error state
>   iommu/s390: Add I/O TLB ops
>   iommu/s390: Use RCU to allow concurrent domain_list iteration
>   iommu/s390: Optimize IOMMU table walking
>   s390/pci: use lock-free I/O translation updates

Applied, thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-11-19  9:29 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-09 14:28 [PATCH v2 0/5] iommu/s390: Further improvements Niklas Schnelle
2022-11-09 14:28 ` [PATCH v2 1/5] iommu/s390: Make attach succeed even if the device is in error state Niklas Schnelle
2022-11-14 15:05   ` Gerd Bayer
2022-11-14 15:30   ` Matthew Rosato
2022-11-09 14:29 ` [PATCH v2 2/5] iommu/s390: Add I/O TLB ops Niklas Schnelle
2022-11-09 14:29 ` [PATCH v2 3/5] iommu/s390: Use RCU to allow concurrent domain_list iteration Niklas Schnelle
2022-11-09 14:29 ` [PATCH v2 4/5] iommu/s390: Optimize IOMMU table walking Niklas Schnelle
2022-11-09 14:29 ` [PATCH v2 5/5] s390/pci: use lock-free I/O translation updates Niklas Schnelle
2022-11-19  9:28 ` [PATCH v2 0/5] iommu/s390: Further improvements Joerg Roedel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.