All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-14  8:18 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-14  8:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, Deming Wang, Robin Murphy, Jason Gunthorpe,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman,
	Alexey Kardashevskiy

Here is another take on iommu_ops on POWER to make VFIO work
again on POWERPC64.

The tree with all prerequisites is here:
https://github.com/aik/linux/tree/kvm-fixes-wip

The previous discussion is here:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

Please comment. Thanks.



Alexey Kardashevskiy (3):
  powerpc/iommu: Add "borrowing" iommu_table_group_ops
  powerpc/pci_64: Init pcibios subsys a bit later
  powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
    domains

 arch/powerpc/include/asm/iommu.h          |   6 +-
 arch/powerpc/include/asm/pci-bridge.h     |   7 +
 arch/powerpc/platforms/pseries/pseries.h  |   5 +
 arch/powerpc/kernel/iommu.c               | 257 +++++++++++++++++++++-
 arch/powerpc/kernel/pci_64.c              |   2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  36 ++-
 arch/powerpc/platforms/pseries/iommu.c    |  27 +++
 arch/powerpc/platforms/pseries/setup.c    |   3 +
 drivers/vfio/vfio_iommu_spapr_tce.c       |  96 ++------
 9 files changed, 345 insertions(+), 94 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-14  8:18 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-14  8:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, Fabiano Rosas, Alexey Kardashevskiy,
	Daniel Henrique Barboza, Deming Wang, kvm-ppc, Alex Williamson,
	Nicholas Piggin, Jason Gunthorpe, Murilo Opsfelder Araujo,
	Robin Murphy

Here is another take on iommu_ops on POWER to make VFIO work
again on POWERPC64.

The tree with all prerequisites is here:
https://github.com/aik/linux/tree/kvm-fixes-wip

The previous discussion is here:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

Please comment. Thanks.



Alexey Kardashevskiy (3):
  powerpc/iommu: Add "borrowing" iommu_table_group_ops
  powerpc/pci_64: Init pcibios subsys a bit later
  powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
    domains

 arch/powerpc/include/asm/iommu.h          |   6 +-
 arch/powerpc/include/asm/pci-bridge.h     |   7 +
 arch/powerpc/platforms/pseries/pseries.h  |   5 +
 arch/powerpc/kernel/iommu.c               | 257 +++++++++++++++++++++-
 arch/powerpc/kernel/pci_64.c              |   2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  36 ++-
 arch/powerpc/platforms/pseries/iommu.c    |  27 +++
 arch/powerpc/platforms/pseries/setup.c    |   3 +
 drivers/vfio/vfio_iommu_spapr_tce.c       |  96 ++------
 9 files changed, 345 insertions(+), 94 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-14  8:18 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-14  8:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, Deming Wang, Robin Murphy, Jason Gunthorpe,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman,
	Alexey Kardashevskiy

Here is another take on iommu_ops on POWER to make VFIO work
again on POWERPC64.

The tree with all prerequisites is here:
https://github.com/aik/linux/tree/kvm-fixes-wip

The previous discussion is here:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

Please comment. Thanks.



Alexey Kardashevskiy (3):
  powerpc/iommu: Add "borrowing" iommu_table_group_ops
  powerpc/pci_64: Init pcibios subsys a bit later
  powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
    domains

 arch/powerpc/include/asm/iommu.h          |   6 +-
 arch/powerpc/include/asm/pci-bridge.h     |   7 +
 arch/powerpc/platforms/pseries/pseries.h  |   5 +
 arch/powerpc/kernel/iommu.c               | 257 +++++++++++++++++++++-
 arch/powerpc/kernel/pci_64.c              |   2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  36 ++-
 arch/powerpc/platforms/pseries/iommu.c    |  27 +++
 arch/powerpc/platforms/pseries/setup.c    |   3 +
 drivers/vfio/vfio_iommu_spapr_tce.c       |  96 ++------
 9 files changed, 345 insertions(+), 94 deletions(-)

-- 
2.30.2

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH kernel 1/3] powerpc/iommu: Add "borrowing" iommu_table_group_ops
  2022-07-14  8:18 ` Alexey Kardashevskiy
  (?)
@ 2022-07-14  8:18   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-14  8:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, Deming Wang, Robin Murphy, Jason Gunthorpe,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman,
	Alexey Kardashevskiy

PPC64 IOMMU API defines iommu_table_group_ops which handles DMA windows
for PEs: control the ownership, create/set/unset a table the hardware
for dynamic DMA windows (DDW). VFIO uses the API to implement support
on POWER.

So far only PowerNV IODA2 (POWER8 and newer machines) implemented this and other cases (POWER7 or nested KVM) did not and instead reused
existing iommu_table structs. This means 1) no DDW 2) ownership transfer
is done directly in the VFIO SPAPR TCE driver.

Soon POWER is going to get its own iommu_ops and ownership control is
going to move there. This implements spapr_tce_table_group_ops which
borrows iommu_table tables. The upside is that VFIO needs to know less
about POWER.

The new ops returns the existing table from create_table() and
only checks if the same window is already set. This is only going to work
if the default DMA window starts table_group.tce32_start and as big as
pe->table_group.tce32_size (not the case for IODA2+ PowerNV).

This changes iommu_table_group_ops::take_ownership() to return an error
if borrowing a table failed.

This should not cause any visible change in behavior for PowerNV.
pSeries was not that well tested/supported anyway.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/iommu.h          |  6 +-
 arch/powerpc/kernel/iommu.c               | 98 ++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci-ioda.c |  6 +-
 arch/powerpc/platforms/pseries/iommu.c    |  3 +
 drivers/vfio/vfio_iommu_spapr_tce.c       | 94 ++++------------------
 5 files changed, 121 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7e29c73e3dd4..678b5bdc79b1 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -175,7 +175,7 @@ struct iommu_table_group_ops {
 	long (*unset_window)(struct iommu_table_group *table_group,
 			int num);
 	/* Switch ownership from platform code to external user (e.g. VFIO) */
-	void (*take_ownership)(struct iommu_table_group *table_group);
+	long (*take_ownership)(struct iommu_table_group *table_group);
 	/* Switch ownership from external user (e.g. VFIO) back to core */
 	void (*release_ownership)(struct iommu_table_group *table_group);
 };
@@ -215,6 +215,8 @@ extern long iommu_tce_xchg_no_kill(struct mm_struct *mm,
 		enum dma_data_direction *direction);
 extern void iommu_tce_kill(struct iommu_table *tbl,
 		unsigned long entry, unsigned long pages);
+
+extern struct iommu_table_group_ops spapr_tce_table_group_ops;
 #else
 static inline void iommu_register_group(struct iommu_table_group *table_group,
 					int pci_domain_number,
@@ -303,8 +305,6 @@ extern int iommu_tce_check_gpa(unsigned long page_shift,
 		iommu_tce_check_gpa((tbl)->it_page_shift, (gpa)))
 
 extern void iommu_flush_tce(struct iommu_table *tbl);
-extern int iommu_take_ownership(struct iommu_table *tbl);
-extern void iommu_release_ownership(struct iommu_table *tbl);
 
 extern enum dma_data_direction iommu_tce_direction(unsigned long tce);
 extern unsigned long iommu_direction_to_tce_perm(enum dma_data_direction dir);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index caebe1431596..d873c123ab49 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1088,7 +1088,7 @@ void iommu_tce_kill(struct iommu_table *tbl,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_kill);
 
-int iommu_take_ownership(struct iommu_table *tbl)
+static int iommu_take_ownership(struct iommu_table *tbl)
 {
 	unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
 	int ret = 0;
@@ -1120,9 +1120,8 @@ int iommu_take_ownership(struct iommu_table *tbl)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(iommu_take_ownership);
 
-void iommu_release_ownership(struct iommu_table *tbl)
+static void iommu_release_ownership(struct iommu_table *tbl)
 {
 	unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
 
@@ -1139,7 +1138,6 @@ void iommu_release_ownership(struct iommu_table *tbl)
 		spin_unlock(&tbl->pools[i].lock);
 	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
 }
-EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
 int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 {
@@ -1181,4 +1179,96 @@ void iommu_del_device(struct device *dev)
 	iommu_group_remove_device(dev);
 }
 EXPORT_SYMBOL_GPL(iommu_del_device);
+
+/*
+ * A simple iommu_table_group_ops which only allows reusing the existing
+ * iommu_table. This handles VFIO for POWER7 or the nested KVM.
+ * The ops does not allow creating windows and only allows reusing the existing
+ * one if it matches table_group->tce32_start/tce32_size/page_shift.
+ */
+static unsigned long spapr_tce_get_table_size(__u32 page_shift,
+					      __u64 window_size, __u32 levels)
+{
+	unsigned long size;
+
+	if (levels > 1)
+		return ~0U;
+	size = window_size >> (page_shift - 3);
+	return size;
+}
+
+static long spapr_tce_create_table(struct iommu_table_group *table_group,
+		int num, __u32 page_shift, __u64 window_size, __u32 levels,
+		struct iommu_table **ptbl)
+{
+	struct iommu_table *tbl = table_group->tables[0];
+
+	if (num > 0)
+		return -EPERM;
+
+	if (tbl->it_page_shift != page_shift ||
+	    tbl->it_size != (window_size >> page_shift) ||
+	    tbl->it_indirect_levels != levels - 1)
+		return -EINVAL;
+
+	*ptbl = iommu_tce_table_get(tbl);
+	return 0;
+}
+
+static long spapr_tce_set_window(struct iommu_table_group *table_group,
+			       int num, struct iommu_table *tbl)
+{
+	return tbl == table_group->tables[num] ? 0 : -EPERM;
+}
+
+static long spapr_tce_unset_window(struct iommu_table_group *table_group, int num)
+{
+	return 0;
+}
+
+static long spapr_tce_take_ownership(struct iommu_table_group *table_group)
+{
+	int i, j, rc = 0;
+
+	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+		struct iommu_table *tbl = table_group->tables[i];
+
+		if (!tbl || !tbl->it_map)
+			continue;
+
+		rc = iommu_take_ownership(tbl);
+		if (!rc)
+			continue;
+		for (j = 0; j < i; ++j)
+			iommu_release_ownership(table_group->tables[j]);
+		return rc;
+	}
+	return 0;
+}
+
+static void spapr_tce_release_ownership(struct iommu_table_group *table_group)
+{
+	int i;
+
+	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+		struct iommu_table *tbl = table_group->tables[i];
+
+		if (!tbl)
+			continue;
+
+		iommu_table_clear(tbl);
+		if (tbl->it_map)
+			iommu_release_ownership(tbl);
+	}
+}
+
+struct iommu_table_group_ops spapr_tce_table_group_ops = {
+	.get_table_size = spapr_tce_get_table_size,
+	.create_table = spapr_tce_create_table,
+	.set_window = spapr_tce_set_window,
+	.unset_window = spapr_tce_unset_window,
+	.take_ownership = spapr_tce_take_ownership,
+	.release_ownership = spapr_tce_release_ownership,
+};
+
 #endif /* CONFIG_IOMMU_API */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9de9b2fb163d..180965a309b6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1554,6 +1554,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	if (WARN_ON(!tbl))
 		return;
 
+	pe->table_group.ops = &spapr_tce_table_group_ops;
+	pe->table_group.pgsizes = SZ_4K;
 	iommu_register_group(&pe->table_group, phb->hose->global_number,
 			pe->pe_number);
 	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
@@ -1888,7 +1890,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
 	}
 }
 
-static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
+static long pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
 {
 	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
 						table_group);
@@ -1902,6 +1904,8 @@ static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
 	else if (pe->pdev)
 		set_iommu_table_base(&pe->pdev->dev, NULL);
 	iommu_tce_table_put(tbl);
+
+	return 0;
 }
 
 static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index c3d425ef7b39..ae05b11c457d 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -74,6 +74,9 @@ static struct iommu_table_group *iommu_pseries_alloc_group(int node)
 	if (!table_group)
 		return NULL;
 
+	table_group->ops = &spapr_tce_table_group_ops;
+	table_group->pgsizes = SZ_4K;
+
 	table_group->tables[0] = iommu_pseries_alloc_table(node);
 	if (table_group->tables[0])
 		return table_group;
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index cd7b9c136889..8a65ea61744c 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -1141,52 +1141,6 @@ static long tce_iommu_ioctl(void *iommu_data,
 
 static void tce_iommu_release_ownership(struct tce_container *container,
 		struct iommu_table_group *table_group)
-{
-	int i;
-
-	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
-		struct iommu_table *tbl = container->tables[i];
-
-		if (!tbl)
-			continue;
-
-		tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
-		if (tbl->it_map)
-			iommu_release_ownership(tbl);
-
-		container->tables[i] = NULL;
-	}
-}
-
-static int tce_iommu_take_ownership(struct tce_container *container,
-		struct iommu_table_group *table_group)
-{
-	int i, j, rc = 0;
-
-	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
-		struct iommu_table *tbl = table_group->tables[i];
-
-		if (!tbl || !tbl->it_map)
-			continue;
-
-		rc = iommu_take_ownership(tbl);
-		if (rc) {
-			for (j = 0; j < i; ++j)
-				iommu_release_ownership(
-						table_group->tables[j]);
-
-			return rc;
-		}
-	}
-
-	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
-		container->tables[i] = table_group->tables[i];
-
-	return 0;
-}
-
-static void tce_iommu_release_ownership_ddw(struct tce_container *container,
-		struct iommu_table_group *table_group)
 {
 	long i;
 
@@ -1202,18 +1156,14 @@ static void tce_iommu_release_ownership_ddw(struct tce_container *container,
 	table_group->ops->release_ownership(table_group);
 }
 
-static long tce_iommu_take_ownership_ddw(struct tce_container *container,
+static long tce_iommu_take_ownership(struct tce_container *container,
 		struct iommu_table_group *table_group)
 {
 	long i, ret = 0;
 
-	if (!table_group->ops->create_table || !table_group->ops->set_window ||
-			!table_group->ops->release_ownership) {
-		WARN_ON_ONCE(1);
-		return -EFAULT;
-	}
-
-	table_group->ops->take_ownership(table_group);
+	ret = table_group->ops->take_ownership(table_group);
+	if (ret)
+		return ret;
 
 	/* Set all windows to the new group */
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
@@ -1259,9 +1209,14 @@ static int tce_iommu_attach_group(void *iommu_data,
 		goto unlock_exit;
 	}
 
-	if (tce_groups_attached(container) && (!table_group->ops ||
-			!table_group->ops->take_ownership ||
-			!table_group->ops->release_ownership)) {
+	/* v2 requires full support of dynamic DMA windows */
+	if (container->v2 && table_group->max_dynamic_windows_supported == 0) {
+		ret = -EINVAL;
+		goto unlock_exit;
+	}
+
+	/* v1 reuses TCE tables and does not share them among PEs */
+	if (!container->v2 && tce_groups_attached(container)) {
 		ret = -EBUSY;
 		goto unlock_exit;
 	}
@@ -1293,29 +1248,15 @@ static int tce_iommu_attach_group(void *iommu_data,
 		goto unlock_exit;
 	}
 
-	if (!table_group->ops || !table_group->ops->take_ownership ||
-			!table_group->ops->release_ownership) {
-		if (container->v2) {
-			ret = -EPERM;
-			goto free_exit;
-		}
-		ret = tce_iommu_take_ownership(container, table_group);
-	} else {
-		if (!container->v2) {
-			ret = -EPERM;
-			goto free_exit;
-		}
-		ret = tce_iommu_take_ownership_ddw(container, table_group);
-		if (!tce_groups_attached(container) && !container->tables[0])
-			container->def_window_pending = true;
-	}
+	ret = tce_iommu_take_ownership(container, table_group);
+	if (!tce_groups_attached(container) && !container->tables[0])
+		container->def_window_pending = true;
 
 	if (!ret) {
 		tcegrp->grp = iommu_group;
 		list_add(&tcegrp->next, &container->group_list);
 	}
 
-free_exit:
 	if (ret && tcegrp)
 		kfree(tcegrp);
 
@@ -1354,10 +1295,7 @@ static void tce_iommu_detach_group(void *iommu_data,
 	table_group = iommu_group_get_iommudata(iommu_group);
 	BUG_ON(!table_group);
 
-	if (!table_group->ops || !table_group->ops->release_ownership)
-		tce_iommu_release_ownership(container, table_group);
-	else
-		tce_iommu_release_ownership_ddw(container, table_group);
+	tce_iommu_release_ownership(container, table_group);
 
 unlock_exit:
 	mutex_unlock(&container->lock);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH kernel 1/3] powerpc/iommu: Add "borrowing" iommu_table_group_ops
@ 2022-07-14  8:18   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-14  8:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, Fabiano Rosas, Alexey Kardashevskiy,
	Daniel Henrique Barboza, Deming Wang, kvm-ppc, Alex Williamson,
	Nicholas Piggin, Jason Gunthorpe, Murilo Opsfelder Araujo,
	Robin Murphy

PPC64 IOMMU API defines iommu_table_group_ops which handles DMA windows
for PEs: control the ownership, create/set/unset a table the hardware
for dynamic DMA windows (DDW). VFIO uses the API to implement support
on POWER.

So far only PowerNV IODA2 (POWER8 and newer machines) implemented this and other cases (POWER7 or nested KVM) did not and instead reused
existing iommu_table structs. This means 1) no DDW 2) ownership transfer
is done directly in the VFIO SPAPR TCE driver.

Soon POWER is going to get its own iommu_ops and ownership control is
going to move there. This implements spapr_tce_table_group_ops which
borrows iommu_table tables. The upside is that VFIO needs to know less
about POWER.

The new ops returns the existing table from create_table() and
only checks if the same window is already set. This is only going to work
if the default DMA window starts table_group.tce32_start and as big as
pe->table_group.tce32_size (not the case for IODA2+ PowerNV).

This changes iommu_table_group_ops::take_ownership() to return an error
if borrowing a table failed.

This should not cause any visible change in behavior for PowerNV.
pSeries was not that well tested/supported anyway.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/iommu.h          |  6 +-
 arch/powerpc/kernel/iommu.c               | 98 ++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci-ioda.c |  6 +-
 arch/powerpc/platforms/pseries/iommu.c    |  3 +
 drivers/vfio/vfio_iommu_spapr_tce.c       | 94 ++++------------------
 5 files changed, 121 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7e29c73e3dd4..678b5bdc79b1 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -175,7 +175,7 @@ struct iommu_table_group_ops {
 	long (*unset_window)(struct iommu_table_group *table_group,
 			int num);
 	/* Switch ownership from platform code to external user (e.g. VFIO) */
-	void (*take_ownership)(struct iommu_table_group *table_group);
+	long (*take_ownership)(struct iommu_table_group *table_group);
 	/* Switch ownership from external user (e.g. VFIO) back to core */
 	void (*release_ownership)(struct iommu_table_group *table_group);
 };
@@ -215,6 +215,8 @@ extern long iommu_tce_xchg_no_kill(struct mm_struct *mm,
 		enum dma_data_direction *direction);
 extern void iommu_tce_kill(struct iommu_table *tbl,
 		unsigned long entry, unsigned long pages);
+
+extern struct iommu_table_group_ops spapr_tce_table_group_ops;
 #else
 static inline void iommu_register_group(struct iommu_table_group *table_group,
 					int pci_domain_number,
@@ -303,8 +305,6 @@ extern int iommu_tce_check_gpa(unsigned long page_shift,
 		iommu_tce_check_gpa((tbl)->it_page_shift, (gpa)))
 
 extern void iommu_flush_tce(struct iommu_table *tbl);
-extern int iommu_take_ownership(struct iommu_table *tbl);
-extern void iommu_release_ownership(struct iommu_table *tbl);
 
 extern enum dma_data_direction iommu_tce_direction(unsigned long tce);
 extern unsigned long iommu_direction_to_tce_perm(enum dma_data_direction dir);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index caebe1431596..d873c123ab49 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1088,7 +1088,7 @@ void iommu_tce_kill(struct iommu_table *tbl,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_kill);
 
-int iommu_take_ownership(struct iommu_table *tbl)
+static int iommu_take_ownership(struct iommu_table *tbl)
 {
 	unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
 	int ret = 0;
@@ -1120,9 +1120,8 @@ int iommu_take_ownership(struct iommu_table *tbl)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(iommu_take_ownership);
 
-void iommu_release_ownership(struct iommu_table *tbl)
+static void iommu_release_ownership(struct iommu_table *tbl)
 {
 	unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
 
@@ -1139,7 +1138,6 @@ void iommu_release_ownership(struct iommu_table *tbl)
 		spin_unlock(&tbl->pools[i].lock);
 	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
 }
-EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
 int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 {
@@ -1181,4 +1179,96 @@ void iommu_del_device(struct device *dev)
 	iommu_group_remove_device(dev);
 }
 EXPORT_SYMBOL_GPL(iommu_del_device);
+
+/*
+ * A simple iommu_table_group_ops which only allows reusing the existing
+ * iommu_table. This handles VFIO for POWER7 or the nested KVM.
+ * The ops does not allow creating windows and only allows reusing the existing
+ * one if it matches table_group->tce32_start/tce32_size/page_shift.
+ */
+static unsigned long spapr_tce_get_table_size(__u32 page_shift,
+					      __u64 window_size, __u32 levels)
+{
+	unsigned long size;
+
+	if (levels > 1)
+		return ~0U;
+	size = window_size >> (page_shift - 3);
+	return size;
+}
+
+static long spapr_tce_create_table(struct iommu_table_group *table_group,
+		int num, __u32 page_shift, __u64 window_size, __u32 levels,
+		struct iommu_table **ptbl)
+{
+	struct iommu_table *tbl = table_group->tables[0];
+
+	if (num > 0)
+		return -EPERM;
+
+	if (tbl->it_page_shift != page_shift ||
+	    tbl->it_size != (window_size >> page_shift) ||
+	    tbl->it_indirect_levels != levels - 1)
+		return -EINVAL;
+
+	*ptbl = iommu_tce_table_get(tbl);
+	return 0;
+}
+
+static long spapr_tce_set_window(struct iommu_table_group *table_group,
+			       int num, struct iommu_table *tbl)
+{
+	return tbl == table_group->tables[num] ? 0 : -EPERM;
+}
+
+static long spapr_tce_unset_window(struct iommu_table_group *table_group, int num)
+{
+	return 0;
+}
+
+static long spapr_tce_take_ownership(struct iommu_table_group *table_group)
+{
+	int i, j, rc = 0;
+
+	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+		struct iommu_table *tbl = table_group->tables[i];
+
+		if (!tbl || !tbl->it_map)
+			continue;
+
+		rc = iommu_take_ownership(tbl);
+		if (!rc)
+			continue;
+		for (j = 0; j < i; ++j)
+			iommu_release_ownership(table_group->tables[j]);
+		return rc;
+	}
+	return 0;
+}
+
+static void spapr_tce_release_ownership(struct iommu_table_group *table_group)
+{
+	int i;
+
+	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+		struct iommu_table *tbl = table_group->tables[i];
+
+		if (!tbl)
+			continue;
+
+		iommu_table_clear(tbl);
+		if (tbl->it_map)
+			iommu_release_ownership(tbl);
+	}
+}
+
+struct iommu_table_group_ops spapr_tce_table_group_ops = {
+	.get_table_size = spapr_tce_get_table_size,
+	.create_table = spapr_tce_create_table,
+	.set_window = spapr_tce_set_window,
+	.unset_window = spapr_tce_unset_window,
+	.take_ownership = spapr_tce_take_ownership,
+	.release_ownership = spapr_tce_release_ownership,
+};
+
 #endif /* CONFIG_IOMMU_API */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9de9b2fb163d..180965a309b6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1554,6 +1554,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	if (WARN_ON(!tbl))
 		return;
 
+	pe->table_group.ops = &spapr_tce_table_group_ops;
+	pe->table_group.pgsizes = SZ_4K;
 	iommu_register_group(&pe->table_group, phb->hose->global_number,
 			pe->pe_number);
 	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
@@ -1888,7 +1890,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
 	}
 }
 
-static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
+static long pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
 {
 	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
 						table_group);
@@ -1902,6 +1904,8 @@ static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
 	else if (pe->pdev)
 		set_iommu_table_base(&pe->pdev->dev, NULL);
 	iommu_tce_table_put(tbl);
+
+	return 0;
 }
 
 static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index c3d425ef7b39..ae05b11c457d 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -74,6 +74,9 @@ static struct iommu_table_group *iommu_pseries_alloc_group(int node)
 	if (!table_group)
 		return NULL;
 
+	table_group->ops = &spapr_tce_table_group_ops;
+	table_group->pgsizes = SZ_4K;
+
 	table_group->tables[0] = iommu_pseries_alloc_table(node);
 	if (table_group->tables[0])
 		return table_group;
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index cd7b9c136889..8a65ea61744c 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -1141,52 +1141,6 @@ static long tce_iommu_ioctl(void *iommu_data,
 
 static void tce_iommu_release_ownership(struct tce_container *container,
 		struct iommu_table_group *table_group)
-{
-	int i;
-
-	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
-		struct iommu_table *tbl = container->tables[i];
-
-		if (!tbl)
-			continue;
-
-		tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
-		if (tbl->it_map)
-			iommu_release_ownership(tbl);
-
-		container->tables[i] = NULL;
-	}
-}
-
-static int tce_iommu_take_ownership(struct tce_container *container,
-		struct iommu_table_group *table_group)
-{
-	int i, j, rc = 0;
-
-	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
-		struct iommu_table *tbl = table_group->tables[i];
-
-		if (!tbl || !tbl->it_map)
-			continue;
-
-		rc = iommu_take_ownership(tbl);
-		if (rc) {
-			for (j = 0; j < i; ++j)
-				iommu_release_ownership(
-						table_group->tables[j]);
-
-			return rc;
-		}
-	}
-
-	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
-		container->tables[i] = table_group->tables[i];
-
-	return 0;
-}
-
-static void tce_iommu_release_ownership_ddw(struct tce_container *container,
-		struct iommu_table_group *table_group)
 {
 	long i;
 
@@ -1202,18 +1156,14 @@ static void tce_iommu_release_ownership_ddw(struct tce_container *container,
 	table_group->ops->release_ownership(table_group);
 }
 
-static long tce_iommu_take_ownership_ddw(struct tce_container *container,
+static long tce_iommu_take_ownership(struct tce_container *container,
 		struct iommu_table_group *table_group)
 {
 	long i, ret = 0;
 
-	if (!table_group->ops->create_table || !table_group->ops->set_window ||
-			!table_group->ops->release_ownership) {
-		WARN_ON_ONCE(1);
-		return -EFAULT;
-	}
-
-	table_group->ops->take_ownership(table_group);
+	ret = table_group->ops->take_ownership(table_group);
+	if (ret)
+		return ret;
 
 	/* Set all windows to the new group */
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
@@ -1259,9 +1209,14 @@ static int tce_iommu_attach_group(void *iommu_data,
 		goto unlock_exit;
 	}
 
-	if (tce_groups_attached(container) && (!table_group->ops ||
-			!table_group->ops->take_ownership ||
-			!table_group->ops->release_ownership)) {
+	/* v2 requires full support of dynamic DMA windows */
+	if (container->v2 && table_group->max_dynamic_windows_supported == 0) {
+		ret = -EINVAL;
+		goto unlock_exit;
+	}
+
+	/* v1 reuses TCE tables and does not share them among PEs */
+	if (!container->v2 && tce_groups_attached(container)) {
 		ret = -EBUSY;
 		goto unlock_exit;
 	}
@@ -1293,29 +1248,15 @@ static int tce_iommu_attach_group(void *iommu_data,
 		goto unlock_exit;
 	}
 
-	if (!table_group->ops || !table_group->ops->take_ownership ||
-			!table_group->ops->release_ownership) {
-		if (container->v2) {
-			ret = -EPERM;
-			goto free_exit;
-		}
-		ret = tce_iommu_take_ownership(container, table_group);
-	} else {
-		if (!container->v2) {
-			ret = -EPERM;
-			goto free_exit;
-		}
-		ret = tce_iommu_take_ownership_ddw(container, table_group);
-		if (!tce_groups_attached(container) && !container->tables[0])
-			container->def_window_pending = true;
-	}
+	ret = tce_iommu_take_ownership(container, table_group);
+	if (!tce_groups_attached(container) && !container->tables[0])
+		container->def_window_pending = true;
 
 	if (!ret) {
 		tcegrp->grp = iommu_group;
 		list_add(&tcegrp->next, &container->group_list);
 	}
 
-free_exit:
 	if (ret && tcegrp)
 		kfree(tcegrp);
 
@@ -1354,10 +1295,7 @@ static void tce_iommu_detach_group(void *iommu_data,
 	table_group = iommu_group_get_iommudata(iommu_group);
 	BUG_ON(!table_group);
 
-	if (!table_group->ops || !table_group->ops->release_ownership)
-		tce_iommu_release_ownership(container, table_group);
-	else
-		tce_iommu_release_ownership_ddw(container, table_group);
+	tce_iommu_release_ownership(container, table_group);
 
 unlock_exit:
 	mutex_unlock(&container->lock);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH kernel 1/3] powerpc/iommu: Add "borrowing" iommu_table_group_ops
@ 2022-07-14  8:18   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-14  8:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, Deming Wang, Robin Murphy, Jason Gunthorpe,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman,
	Alexey Kardashevskiy

PPC64 IOMMU API defines iommu_table_group_ops which handles DMA windows
for PEs: control the ownership, create/set/unset a table the hardware
for dynamic DMA windows (DDW). VFIO uses the API to implement support
on POWER.

So far only PowerNV IODA2 (POWER8 and newer machines) implemented this and other cases (POWER7 or nested KVM) did not and instead reused
existing iommu_table structs. This means 1) no DDW 2) ownership transfer
is done directly in the VFIO SPAPR TCE driver.

Soon POWER is going to get its own iommu_ops and ownership control is
going to move there. This implements spapr_tce_table_group_ops which
borrows iommu_table tables. The upside is that VFIO needs to know less
about POWER.

The new ops returns the existing table from create_table() and
only checks if the same window is already set. This is only going to work
if the default DMA window starts table_group.tce32_start and as big as
pe->table_group.tce32_size (not the case for IODA2+ PowerNV).

This changes iommu_table_group_ops::take_ownership() to return an error
if borrowing a table failed.

This should not cause any visible change in behavior for PowerNV.
pSeries was not that well tested/supported anyway.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/iommu.h          |  6 +-
 arch/powerpc/kernel/iommu.c               | 98 ++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci-ioda.c |  6 +-
 arch/powerpc/platforms/pseries/iommu.c    |  3 +
 drivers/vfio/vfio_iommu_spapr_tce.c       | 94 ++++------------------
 5 files changed, 121 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7e29c73e3dd4..678b5bdc79b1 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -175,7 +175,7 @@ struct iommu_table_group_ops {
 	long (*unset_window)(struct iommu_table_group *table_group,
 			int num);
 	/* Switch ownership from platform code to external user (e.g. VFIO) */
-	void (*take_ownership)(struct iommu_table_group *table_group);
+	long (*take_ownership)(struct iommu_table_group *table_group);
 	/* Switch ownership from external user (e.g. VFIO) back to core */
 	void (*release_ownership)(struct iommu_table_group *table_group);
 };
@@ -215,6 +215,8 @@ extern long iommu_tce_xchg_no_kill(struct mm_struct *mm,
 		enum dma_data_direction *direction);
 extern void iommu_tce_kill(struct iommu_table *tbl,
 		unsigned long entry, unsigned long pages);
+
+extern struct iommu_table_group_ops spapr_tce_table_group_ops;
 #else
 static inline void iommu_register_group(struct iommu_table_group *table_group,
 					int pci_domain_number,
@@ -303,8 +305,6 @@ extern int iommu_tce_check_gpa(unsigned long page_shift,
 		iommu_tce_check_gpa((tbl)->it_page_shift, (gpa)))
 
 extern void iommu_flush_tce(struct iommu_table *tbl);
-extern int iommu_take_ownership(struct iommu_table *tbl);
-extern void iommu_release_ownership(struct iommu_table *tbl);
 
 extern enum dma_data_direction iommu_tce_direction(unsigned long tce);
 extern unsigned long iommu_direction_to_tce_perm(enum dma_data_direction dir);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index caebe1431596..d873c123ab49 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1088,7 +1088,7 @@ void iommu_tce_kill(struct iommu_table *tbl,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_kill);
 
-int iommu_take_ownership(struct iommu_table *tbl)
+static int iommu_take_ownership(struct iommu_table *tbl)
 {
 	unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
 	int ret = 0;
@@ -1120,9 +1120,8 @@ int iommu_take_ownership(struct iommu_table *tbl)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(iommu_take_ownership);
 
-void iommu_release_ownership(struct iommu_table *tbl)
+static void iommu_release_ownership(struct iommu_table *tbl)
 {
 	unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
 
@@ -1139,7 +1138,6 @@ void iommu_release_ownership(struct iommu_table *tbl)
 		spin_unlock(&tbl->pools[i].lock);
 	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
 }
-EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
 int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 {
@@ -1181,4 +1179,96 @@ void iommu_del_device(struct device *dev)
 	iommu_group_remove_device(dev);
 }
 EXPORT_SYMBOL_GPL(iommu_del_device);
+
+/*
+ * A simple iommu_table_group_ops which only allows reusing the existing
+ * iommu_table. This handles VFIO for POWER7 or the nested KVM.
+ * The ops does not allow creating windows and only allows reusing the existing
+ * one if it matches table_group->tce32_start/tce32_size/page_shift.
+ */
+static unsigned long spapr_tce_get_table_size(__u32 page_shift,
+					      __u64 window_size, __u32 levels)
+{
+	unsigned long size;
+
+	if (levels > 1)
+		return ~0U;
+	size = window_size >> (page_shift - 3);
+	return size;
+}
+
+static long spapr_tce_create_table(struct iommu_table_group *table_group,
+		int num, __u32 page_shift, __u64 window_size, __u32 levels,
+		struct iommu_table **ptbl)
+{
+	struct iommu_table *tbl = table_group->tables[0];
+
+	if (num > 0)
+		return -EPERM;
+
+	if (tbl->it_page_shift != page_shift ||
+	    tbl->it_size != (window_size >> page_shift) ||
+	    tbl->it_indirect_levels != levels - 1)
+		return -EINVAL;
+
+	*ptbl = iommu_tce_table_get(tbl);
+	return 0;
+}
+
+static long spapr_tce_set_window(struct iommu_table_group *table_group,
+			       int num, struct iommu_table *tbl)
+{
+	return tbl = table_group->tables[num] ? 0 : -EPERM;
+}
+
+static long spapr_tce_unset_window(struct iommu_table_group *table_group, int num)
+{
+	return 0;
+}
+
+static long spapr_tce_take_ownership(struct iommu_table_group *table_group)
+{
+	int i, j, rc = 0;
+
+	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+		struct iommu_table *tbl = table_group->tables[i];
+
+		if (!tbl || !tbl->it_map)
+			continue;
+
+		rc = iommu_take_ownership(tbl);
+		if (!rc)
+			continue;
+		for (j = 0; j < i; ++j)
+			iommu_release_ownership(table_group->tables[j]);
+		return rc;
+	}
+	return 0;
+}
+
+static void spapr_tce_release_ownership(struct iommu_table_group *table_group)
+{
+	int i;
+
+	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
+		struct iommu_table *tbl = table_group->tables[i];
+
+		if (!tbl)
+			continue;
+
+		iommu_table_clear(tbl);
+		if (tbl->it_map)
+			iommu_release_ownership(tbl);
+	}
+}
+
+struct iommu_table_group_ops spapr_tce_table_group_ops = {
+	.get_table_size = spapr_tce_get_table_size,
+	.create_table = spapr_tce_create_table,
+	.set_window = spapr_tce_set_window,
+	.unset_window = spapr_tce_unset_window,
+	.take_ownership = spapr_tce_take_ownership,
+	.release_ownership = spapr_tce_release_ownership,
+};
+
 #endif /* CONFIG_IOMMU_API */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9de9b2fb163d..180965a309b6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1554,6 +1554,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 	if (WARN_ON(!tbl))
 		return;
 
+	pe->table_group.ops = &spapr_tce_table_group_ops;
+	pe->table_group.pgsizes = SZ_4K;
 	iommu_register_group(&pe->table_group, phb->hose->global_number,
 			pe->pe_number);
 	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
@@ -1888,7 +1890,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
 	}
 }
 
-static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
+static long pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
 {
 	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
 						table_group);
@@ -1902,6 +1904,8 @@ static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
 	else if (pe->pdev)
 		set_iommu_table_base(&pe->pdev->dev, NULL);
 	iommu_tce_table_put(tbl);
+
+	return 0;
 }
 
 static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index c3d425ef7b39..ae05b11c457d 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -74,6 +74,9 @@ static struct iommu_table_group *iommu_pseries_alloc_group(int node)
 	if (!table_group)
 		return NULL;
 
+	table_group->ops = &spapr_tce_table_group_ops;
+	table_group->pgsizes = SZ_4K;
+
 	table_group->tables[0] = iommu_pseries_alloc_table(node);
 	if (table_group->tables[0])
 		return table_group;
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index cd7b9c136889..8a65ea61744c 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -1141,52 +1141,6 @@ static long tce_iommu_ioctl(void *iommu_data,
 
 static void tce_iommu_release_ownership(struct tce_container *container,
 		struct iommu_table_group *table_group)
-{
-	int i;
-
-	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
-		struct iommu_table *tbl = container->tables[i];
-
-		if (!tbl)
-			continue;
-
-		tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
-		if (tbl->it_map)
-			iommu_release_ownership(tbl);
-
-		container->tables[i] = NULL;
-	}
-}
-
-static int tce_iommu_take_ownership(struct tce_container *container,
-		struct iommu_table_group *table_group)
-{
-	int i, j, rc = 0;
-
-	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
-		struct iommu_table *tbl = table_group->tables[i];
-
-		if (!tbl || !tbl->it_map)
-			continue;
-
-		rc = iommu_take_ownership(tbl);
-		if (rc) {
-			for (j = 0; j < i; ++j)
-				iommu_release_ownership(
-						table_group->tables[j]);
-
-			return rc;
-		}
-	}
-
-	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
-		container->tables[i] = table_group->tables[i];
-
-	return 0;
-}
-
-static void tce_iommu_release_ownership_ddw(struct tce_container *container,
-		struct iommu_table_group *table_group)
 {
 	long i;
 
@@ -1202,18 +1156,14 @@ static void tce_iommu_release_ownership_ddw(struct tce_container *container,
 	table_group->ops->release_ownership(table_group);
 }
 
-static long tce_iommu_take_ownership_ddw(struct tce_container *container,
+static long tce_iommu_take_ownership(struct tce_container *container,
 		struct iommu_table_group *table_group)
 {
 	long i, ret = 0;
 
-	if (!table_group->ops->create_table || !table_group->ops->set_window ||
-			!table_group->ops->release_ownership) {
-		WARN_ON_ONCE(1);
-		return -EFAULT;
-	}
-
-	table_group->ops->take_ownership(table_group);
+	ret = table_group->ops->take_ownership(table_group);
+	if (ret)
+		return ret;
 
 	/* Set all windows to the new group */
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
@@ -1259,9 +1209,14 @@ static int tce_iommu_attach_group(void *iommu_data,
 		goto unlock_exit;
 	}
 
-	if (tce_groups_attached(container) && (!table_group->ops ||
-			!table_group->ops->take_ownership ||
-			!table_group->ops->release_ownership)) {
+	/* v2 requires full support of dynamic DMA windows */
+	if (container->v2 && table_group->max_dynamic_windows_supported = 0) {
+		ret = -EINVAL;
+		goto unlock_exit;
+	}
+
+	/* v1 reuses TCE tables and does not share them among PEs */
+	if (!container->v2 && tce_groups_attached(container)) {
 		ret = -EBUSY;
 		goto unlock_exit;
 	}
@@ -1293,29 +1248,15 @@ static int tce_iommu_attach_group(void *iommu_data,
 		goto unlock_exit;
 	}
 
-	if (!table_group->ops || !table_group->ops->take_ownership ||
-			!table_group->ops->release_ownership) {
-		if (container->v2) {
-			ret = -EPERM;
-			goto free_exit;
-		}
-		ret = tce_iommu_take_ownership(container, table_group);
-	} else {
-		if (!container->v2) {
-			ret = -EPERM;
-			goto free_exit;
-		}
-		ret = tce_iommu_take_ownership_ddw(container, table_group);
-		if (!tce_groups_attached(container) && !container->tables[0])
-			container->def_window_pending = true;
-	}
+	ret = tce_iommu_take_ownership(container, table_group);
+	if (!tce_groups_attached(container) && !container->tables[0])
+		container->def_window_pending = true;
 
 	if (!ret) {
 		tcegrp->grp = iommu_group;
 		list_add(&tcegrp->next, &container->group_list);
 	}
 
-free_exit:
 	if (ret && tcegrp)
 		kfree(tcegrp);
 
@@ -1354,10 +1295,7 @@ static void tce_iommu_detach_group(void *iommu_data,
 	table_group = iommu_group_get_iommudata(iommu_group);
 	BUG_ON(!table_group);
 
-	if (!table_group->ops || !table_group->ops->release_ownership)
-		tce_iommu_release_ownership(container, table_group);
-	else
-		tce_iommu_release_ownership_ddw(container, table_group);
+	tce_iommu_release_ownership(container, table_group);
 
 unlock_exit:
 	mutex_unlock(&container->lock);
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH kernel 2/3] powerpc/pci_64: Init pcibios subsys a bit later
  2022-07-14  8:18 ` Alexey Kardashevskiy
  (?)
@ 2022-07-14  8:18   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-14  8:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, Deming Wang, Robin Murphy, Jason Gunthorpe,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman,
	Alexey Kardashevskiy

The following patches are going to add dependency/use of iommu_ops which
is initialized in subsys_initcall as well.

This moves pciobios_init() to the next initcall level.

This should not cause behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kernel/pci_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 19b03ddf5631..79472d2f1739 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -73,7 +73,7 @@ static int __init pcibios_init(void)
 	return 0;
 }
 
-subsys_initcall(pcibios_init);
+subsys_initcall_sync(pcibios_init);
 
 int pcibios_unmap_io_space(struct pci_bus *bus)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH kernel 2/3] powerpc/pci_64: Init pcibios subsys a bit later
@ 2022-07-14  8:18   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-14  8:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, Fabiano Rosas, Alexey Kardashevskiy,
	Daniel Henrique Barboza, Deming Wang, kvm-ppc, Alex Williamson,
	Nicholas Piggin, Jason Gunthorpe, Murilo Opsfelder Araujo,
	Robin Murphy

The following patches are going to add dependency/use of iommu_ops which
is initialized in subsys_initcall as well.

This moves pciobios_init() to the next initcall level.

This should not cause behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kernel/pci_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 19b03ddf5631..79472d2f1739 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -73,7 +73,7 @@ static int __init pcibios_init(void)
 	return 0;
 }
 
-subsys_initcall(pcibios_init);
+subsys_initcall_sync(pcibios_init);
 
 int pcibios_unmap_io_space(struct pci_bus *bus)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH kernel 2/3] powerpc/pci_64: Init pcibios subsys a bit later
@ 2022-07-14  8:18   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-14  8:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, Deming Wang, Robin Murphy, Jason Gunthorpe,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman,
	Alexey Kardashevskiy

The following patches are going to add dependency/use of iommu_ops which
is initialized in subsys_initcall as well.

This moves pciobios_init() to the next initcall level.

This should not cause behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kernel/pci_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 19b03ddf5631..79472d2f1739 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -73,7 +73,7 @@ static int __init pcibios_init(void)
 	return 0;
 }
 
-subsys_initcall(pcibios_init);
+subsys_initcall_sync(pcibios_init);
 
 int pcibios_unmap_io_space(struct pci_bus *bus)
 {
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-14  8:18 ` Alexey Kardashevskiy
  (?)
@ 2022-07-14  8:18   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-14  8:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, Deming Wang, Robin Murphy, Jason Gunthorpe,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman,
	Alexey Kardashevskiy

Up until now PPC64 managed to avoid using iommu_ops. The VFIO driver
uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
the Type1 VFIO driver. Recent development added 2 uses of iommu_ops to
the generic VFIO which broke POWER:
- a coherency capability check;
- blocking IOMMU domain - iommu_group_dma_owner_claimed()/...

This adds a simple iommu_ops which reports support for cache
coherency and provides a basic support for blocking domains. No other
domain types are implemented so the default domain is NULL.

Since now iommu_ops controls the group ownership, this takes it out of
VFIO.

This adds an IOMMU device into a pci_controller (=PHB) and registers it
in the IOMMU subsystem, iommu_ops is registered at this point.
This setup is done in postcore_initcall_sync.

This replaces iommu_group_add_device() with iommu_probe_device() as
the former misses necessary steps in connecting PCI devices to IOMMU
devices. This adds a comment about why explicit iommu_probe_device()
is still needed.

The previous discussion is here:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Fixes: 70693f470848 ("vfio: Set DMA ownership for VFIO devices")
Cc: Deming Wang <wangdeming@inspur.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: Fabiano Rosas <farosas@linux.ibm.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/pci-bridge.h     |   7 +
 arch/powerpc/platforms/pseries/pseries.h  |   5 +
 arch/powerpc/kernel/iommu.c               | 159 +++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci-ioda.c |  30 ++++
 arch/powerpc/platforms/pseries/iommu.c    |  24 ++++
 arch/powerpc/platforms/pseries/setup.c    |   3 +
 drivers/vfio/vfio_iommu_spapr_tce.c       |   8 --
 7 files changed, 226 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index c85f901227c9..338a45b410b4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -8,6 +8,7 @@
 #include <linux/list.h>
 #include <linux/ioport.h>
 #include <linux/numa.h>
+#include <linux/iommu.h>
 
 struct device_node;
 
@@ -44,6 +45,9 @@ struct pci_controller_ops {
 #endif
 
 	void		(*shutdown)(struct pci_controller *hose);
+
+	struct iommu_group *(*device_group)(struct pci_controller *hose,
+					    struct pci_dev *pdev);
 };
 
 /*
@@ -131,6 +135,9 @@ struct pci_controller {
 	struct irq_domain	*dev_domain;
 	struct irq_domain	*msi_domain;
 	struct fwnode_handle	*fwnode;
+
+	/* iommu_ops support */
+	struct iommu_device	iommu;
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index f5c916c839c9..9a49a16dd89a 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -122,4 +122,9 @@ void pseries_lpar_read_hblkrm_characteristics(void);
 static inline void pseries_lpar_read_hblkrm_characteristics(void) { }
 #endif
 
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
+					     struct pci_dev *pdev);
+#endif
+
 #endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index d873c123ab49..b5301e289714 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -35,6 +35,7 @@
 #include <asm/vio.h>
 #include <asm/tce.h>
 #include <asm/mmu_context.h>
+#include <asm/ppc-pci.h>
 
 #define DBG(...)
 
@@ -1158,8 +1159,14 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 
 	pr_debug("%s: Adding %s to iommu group %d\n",
 		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
-
-	return iommu_group_add_device(table_group->group, dev);
+	/*
+	 * This is still not adding devices via the IOMMU bus notifier because
+	 * of pcibios_init() from arch/powerpc/kernel/pci_64.c which calls
+	 * pcibios_scan_phb() first (and this guy adds devices and triggers
+	 * the notifier) and only then it calls pci_bus_add_devices() which
+	 * configures DMA for buses which also creates PEs and IOMMU groups.
+	 */
+	return iommu_probe_device(dev);
 }
 EXPORT_SYMBOL_GPL(iommu_add_device);
 
@@ -1239,6 +1246,7 @@ static long spapr_tce_take_ownership(struct iommu_table_group *table_group)
 		rc = iommu_take_ownership(tbl);
 		if (!rc)
 			continue;
+
 		for (j = 0; j < i; ++j)
 			iommu_release_ownership(table_group->tables[j]);
 		return rc;
@@ -1271,4 +1279,151 @@ struct iommu_table_group_ops spapr_tce_table_group_ops = {
 	.release_ownership = spapr_tce_release_ownership,
 };
 
+/*
+ * A simple iommu_ops to allow less cruft in generic VFIO code.
+ */
+static bool spapr_tce_iommu_capable(enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_CACHE_COHERENCY:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
+static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
+{
+	struct iommu_domain *dom;
+
+	if (type != IOMMU_DOMAIN_BLOCKED)
+		return NULL;
+
+	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
+	if (!dom)
+		return NULL;
+
+	dom->geometry.aperture_start = 0;
+	dom->geometry.aperture_end = ~0ULL;
+	dom->geometry.force_aperture = true;
+
+	return dom;
+}
+
+static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
+{
+	struct pci_dev *pdev;
+	struct pci_controller *hose;
+
+	/* Weirdly iommu_device_register() assigns the same ops to all buses */
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-EPERM);
+
+	pdev = to_pci_dev(dev);
+	hose = pdev->bus->sysdata;
+
+	return &hose->iommu;
+}
+
+static void spapr_tce_iommu_release_device(struct device *dev)
+{
+}
+
+static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
+{
+	struct pci_controller *hose;
+	struct pci_dev *pdev;
+
+	/* Weirdly iommu_device_register() assigns the same ops to all buses */
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-EPERM);
+
+	pdev = to_pci_dev(dev);
+	hose = pdev->bus->sysdata;
+
+	if (!hose->controller_ops.device_group)
+		return ERR_PTR(-ENOENT);
+
+	return hose->controller_ops.device_group(hose, pdev);
+}
+
+static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
+				      struct device *dev)
+{
+	struct iommu_group *grp = iommu_group_get(dev);
+	struct iommu_table_group *table_group;
+	int ret = -EINVAL;
+
+	if (!grp)
+		return -ENODEV;
+
+	table_group = iommu_group_get_iommudata(grp);
+
+	if (dom->type == IOMMU_DOMAIN_BLOCKED)
+		ret = table_group->ops->take_ownership(table_group);
+
+	iommu_group_put(grp);
+
+	return ret;
+}
+
+static void spapr_tce_iommu_detach_dev(struct iommu_domain *dom,
+				       struct device *dev)
+{
+	struct iommu_group *grp = iommu_group_get(dev);
+	struct iommu_table_group *table_group;
+
+	table_group = iommu_group_get_iommudata(grp);
+	WARN_ON(dom->type != IOMMU_DOMAIN_BLOCKED);
+	table_group->ops->release_ownership(table_group);
+}
+
+static const struct iommu_ops spapr_tce_iommu_ops = {
+	.capable = spapr_tce_iommu_capable,
+	.domain_alloc = spapr_tce_iommu_domain_alloc,
+	.probe_device = spapr_tce_iommu_probe_device,
+	.release_device = spapr_tce_iommu_release_device,
+	.device_group = spapr_tce_iommu_device_group,
+	.default_domain_ops = &(const struct iommu_domain_ops) {
+		.attach_dev = spapr_tce_iommu_attach_dev,
+		.detach_dev = spapr_tce_iommu_detach_dev,
+	}
+};
+
+static struct attribute *spapr_tce_iommu_attrs[] = {
+	NULL,
+};
+
+static struct attribute_group spapr_tce_iommu_group = {
+	.name = "spapr-tce-iommu",
+	.attrs = spapr_tce_iommu_attrs,
+};
+
+static const struct attribute_group *spapr_tce_iommu_groups[] = {
+	&spapr_tce_iommu_group,
+	NULL,
+};
+
+/*
+ * This registers IOMMU devices of PHBs. This needs to happen
+ * after core_initcall(iommu_init) + postcore_initcall(pci_driver_init) and
+ * before subsys_initcall(iommu_subsys_init).
+ */
+static int __init spapr_tce_setup_phb_iommus_initcall(void)
+{
+	struct pci_controller *hose;
+
+	list_for_each_entry(hose, &hose_list, list_node) {
+		iommu_device_sysfs_add(&hose->iommu, hose->parent,
+				       spapr_tce_iommu_groups, "iommu-phb%04x",
+				       hose->global_number);
+		iommu_device_register(&hose->iommu, &spapr_tce_iommu_ops,
+				      hose->parent);
+	}
+	return 0;
+}
+postcore_initcall_sync(spapr_tce_setup_phb_iommus_initcall);
+
 #endif /* CONFIG_IOMMU_API */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 180965a309b6..683425a77233 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1897,6 +1897,13 @@ static long pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
 	/* Store @tbl as pnv_pci_ioda2_unset_window() resets it */
 	struct iommu_table *tbl = pe->table_group.tables[0];
 
+	/*
+	 * iommu_ops transfers the ownership per a device and we mode
+	 * the group ownership with the first device in the group.
+	 */
+	if (!tbl)
+		return 0;
+
 	pnv_pci_ioda2_set_bypass(pe, false);
 	pnv_pci_ioda2_unset_window(&pe->table_group, 0);
 	if (pe->pbus)
@@ -1913,6 +1920,9 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
 	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
 						table_group);
 
+	/* See the comment about iommu_ops above */
+	if (pe->table_group.tables[0])
+		return;
 	pnv_pci_ioda2_setup_default_config(pe);
 	if (pe->pbus)
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
@@ -2918,6 +2928,25 @@ static void pnv_pci_ioda_dma_bus_setup(struct pci_bus *bus)
 	}
 }
 
+static struct iommu_group *pnv_pci_device_group(struct pci_controller *hose,
+						struct pci_dev *pdev)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct pnv_ioda_pe *pe;
+
+	if (WARN_ON(!phb))
+		return ERR_PTR(-ENODEV);
+
+	pe = pnv_pci_bdfn_to_pe(phb, pdev->devfn | (pdev->bus->number << 8));
+	if (!pe)
+		return ERR_PTR(-ENODEV);
+
+	if (!pe->table_group.group)
+		return ERR_PTR(-ENODEV);
+
+	return iommu_group_ref_get(pe->table_group.group);
+}
+
 static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.dma_dev_setup		= pnv_pci_ioda_dma_dev_setup,
 	.dma_bus_setup		= pnv_pci_ioda_dma_bus_setup,
@@ -2928,6 +2957,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.setup_bridge		= pnv_pci_fixup_bridge_resources,
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
 	.shutdown		= pnv_pci_ioda_shutdown,
+	.device_group		= pnv_pci_device_group,
 };
 
 static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index ae05b11c457d..57bebe51a564 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1725,3 +1725,27 @@ static int __init tce_iommu_bus_notifier_init(void)
 	return 0;
 }
 machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
+
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
+					     struct pci_dev *pdev)
+{
+	struct device_node *pdn, *dn = pdev->dev.of_node;
+	struct iommu_group *grp;
+	struct pci_dn *pci;
+
+	pdn = pci_dma_find(dn, NULL);
+	if (!pdn || !PCI_DN(pdn))
+		return ERR_PTR(-ENODEV);
+
+	pci = PCI_DN(pdn);
+	if (!pci->table_group)
+		return ERR_PTR(-ENODEV);
+
+	grp = pci->table_group->group;
+	if (!grp)
+		return ERR_PTR(-ENODEV);
+
+	return iommu_group_ref_get(grp);
+}
+#endif
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 233c64f59815..3e3b3f031029 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -1079,6 +1079,9 @@ static int pSeries_pci_probe_mode(struct pci_bus *bus)
 
 struct pci_controller_ops pseries_pci_controller_ops = {
 	.probe_mode		= pSeries_pci_probe_mode,
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+	.device_group		= pSeries_pci_device_group,
+#endif
 };
 
 define_machine(pseries) {
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 8a65ea61744c..3b53b466e49b 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -1152,8 +1152,6 @@ static void tce_iommu_release_ownership(struct tce_container *container,
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
 		if (container->tables[i])
 			table_group->ops->unset_window(table_group, i);
-
-	table_group->ops->release_ownership(table_group);
 }
 
 static long tce_iommu_take_ownership(struct tce_container *container,
@@ -1161,10 +1159,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
 {
 	long i, ret = 0;
 
-	ret = table_group->ops->take_ownership(table_group);
-	if (ret)
-		return ret;
-
 	/* Set all windows to the new group */
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
 		struct iommu_table *tbl = container->tables[i];
@@ -1183,8 +1177,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
 		table_group->ops->unset_window(table_group, i);
 
-	table_group->ops->release_ownership(table_group);
-
 	return ret;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-14  8:18   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-14  8:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, Fabiano Rosas, Alexey Kardashevskiy,
	Daniel Henrique Barboza, Deming Wang, kvm-ppc, Alex Williamson,
	Nicholas Piggin, Jason Gunthorpe, Murilo Opsfelder Araujo,
	Robin Murphy

Up until now PPC64 managed to avoid using iommu_ops. The VFIO driver
uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
the Type1 VFIO driver. Recent development added 2 uses of iommu_ops to
the generic VFIO which broke POWER:
- a coherency capability check;
- blocking IOMMU domain - iommu_group_dma_owner_claimed()/...

This adds a simple iommu_ops which reports support for cache
coherency and provides a basic support for blocking domains. No other
domain types are implemented so the default domain is NULL.

Since now iommu_ops controls the group ownership, this takes it out of
VFIO.

This adds an IOMMU device into a pci_controller (=PHB) and registers it
in the IOMMU subsystem, iommu_ops is registered at this point.
This setup is done in postcore_initcall_sync.

This replaces iommu_group_add_device() with iommu_probe_device() as
the former misses necessary steps in connecting PCI devices to IOMMU
devices. This adds a comment about why explicit iommu_probe_device()
is still needed.

The previous discussion is here:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Fixes: 70693f470848 ("vfio: Set DMA ownership for VFIO devices")
Cc: Deming Wang <wangdeming@inspur.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: Fabiano Rosas <farosas@linux.ibm.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/pci-bridge.h     |   7 +
 arch/powerpc/platforms/pseries/pseries.h  |   5 +
 arch/powerpc/kernel/iommu.c               | 159 +++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci-ioda.c |  30 ++++
 arch/powerpc/platforms/pseries/iommu.c    |  24 ++++
 arch/powerpc/platforms/pseries/setup.c    |   3 +
 drivers/vfio/vfio_iommu_spapr_tce.c       |   8 --
 7 files changed, 226 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index c85f901227c9..338a45b410b4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -8,6 +8,7 @@
 #include <linux/list.h>
 #include <linux/ioport.h>
 #include <linux/numa.h>
+#include <linux/iommu.h>
 
 struct device_node;
 
@@ -44,6 +45,9 @@ struct pci_controller_ops {
 #endif
 
 	void		(*shutdown)(struct pci_controller *hose);
+
+	struct iommu_group *(*device_group)(struct pci_controller *hose,
+					    struct pci_dev *pdev);
 };
 
 /*
@@ -131,6 +135,9 @@ struct pci_controller {
 	struct irq_domain	*dev_domain;
 	struct irq_domain	*msi_domain;
 	struct fwnode_handle	*fwnode;
+
+	/* iommu_ops support */
+	struct iommu_device	iommu;
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index f5c916c839c9..9a49a16dd89a 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -122,4 +122,9 @@ void pseries_lpar_read_hblkrm_characteristics(void);
 static inline void pseries_lpar_read_hblkrm_characteristics(void) { }
 #endif
 
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
+					     struct pci_dev *pdev);
+#endif
+
 #endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index d873c123ab49..b5301e289714 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -35,6 +35,7 @@
 #include <asm/vio.h>
 #include <asm/tce.h>
 #include <asm/mmu_context.h>
+#include <asm/ppc-pci.h>
 
 #define DBG(...)
 
@@ -1158,8 +1159,14 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 
 	pr_debug("%s: Adding %s to iommu group %d\n",
 		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
-
-	return iommu_group_add_device(table_group->group, dev);
+	/*
+	 * This is still not adding devices via the IOMMU bus notifier because
+	 * of pcibios_init() from arch/powerpc/kernel/pci_64.c which calls
+	 * pcibios_scan_phb() first (and this guy adds devices and triggers
+	 * the notifier) and only then it calls pci_bus_add_devices() which
+	 * configures DMA for buses which also creates PEs and IOMMU groups.
+	 */
+	return iommu_probe_device(dev);
 }
 EXPORT_SYMBOL_GPL(iommu_add_device);
 
@@ -1239,6 +1246,7 @@ static long spapr_tce_take_ownership(struct iommu_table_group *table_group)
 		rc = iommu_take_ownership(tbl);
 		if (!rc)
 			continue;
+
 		for (j = 0; j < i; ++j)
 			iommu_release_ownership(table_group->tables[j]);
 		return rc;
@@ -1271,4 +1279,151 @@ struct iommu_table_group_ops spapr_tce_table_group_ops = {
 	.release_ownership = spapr_tce_release_ownership,
 };
 
+/*
+ * A simple iommu_ops to allow less cruft in generic VFIO code.
+ */
+static bool spapr_tce_iommu_capable(enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_CACHE_COHERENCY:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
+static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
+{
+	struct iommu_domain *dom;
+
+	if (type != IOMMU_DOMAIN_BLOCKED)
+		return NULL;
+
+	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
+	if (!dom)
+		return NULL;
+
+	dom->geometry.aperture_start = 0;
+	dom->geometry.aperture_end = ~0ULL;
+	dom->geometry.force_aperture = true;
+
+	return dom;
+}
+
+static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
+{
+	struct pci_dev *pdev;
+	struct pci_controller *hose;
+
+	/* Weirdly iommu_device_register() assigns the same ops to all buses */
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-EPERM);
+
+	pdev = to_pci_dev(dev);
+	hose = pdev->bus->sysdata;
+
+	return &hose->iommu;
+}
+
+static void spapr_tce_iommu_release_device(struct device *dev)
+{
+}
+
+static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
+{
+	struct pci_controller *hose;
+	struct pci_dev *pdev;
+
+	/* Weirdly iommu_device_register() assigns the same ops to all buses */
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-EPERM);
+
+	pdev = to_pci_dev(dev);
+	hose = pdev->bus->sysdata;
+
+	if (!hose->controller_ops.device_group)
+		return ERR_PTR(-ENOENT);
+
+	return hose->controller_ops.device_group(hose, pdev);
+}
+
+static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
+				      struct device *dev)
+{
+	struct iommu_group *grp = iommu_group_get(dev);
+	struct iommu_table_group *table_group;
+	int ret = -EINVAL;
+
+	if (!grp)
+		return -ENODEV;
+
+	table_group = iommu_group_get_iommudata(grp);
+
+	if (dom->type == IOMMU_DOMAIN_BLOCKED)
+		ret = table_group->ops->take_ownership(table_group);
+
+	iommu_group_put(grp);
+
+	return ret;
+}
+
+static void spapr_tce_iommu_detach_dev(struct iommu_domain *dom,
+				       struct device *dev)
+{
+	struct iommu_group *grp = iommu_group_get(dev);
+	struct iommu_table_group *table_group;
+
+	table_group = iommu_group_get_iommudata(grp);
+	WARN_ON(dom->type != IOMMU_DOMAIN_BLOCKED);
+	table_group->ops->release_ownership(table_group);
+}
+
+static const struct iommu_ops spapr_tce_iommu_ops = {
+	.capable = spapr_tce_iommu_capable,
+	.domain_alloc = spapr_tce_iommu_domain_alloc,
+	.probe_device = spapr_tce_iommu_probe_device,
+	.release_device = spapr_tce_iommu_release_device,
+	.device_group = spapr_tce_iommu_device_group,
+	.default_domain_ops = &(const struct iommu_domain_ops) {
+		.attach_dev = spapr_tce_iommu_attach_dev,
+		.detach_dev = spapr_tce_iommu_detach_dev,
+	}
+};
+
+static struct attribute *spapr_tce_iommu_attrs[] = {
+	NULL,
+};
+
+static struct attribute_group spapr_tce_iommu_group = {
+	.name = "spapr-tce-iommu",
+	.attrs = spapr_tce_iommu_attrs,
+};
+
+static const struct attribute_group *spapr_tce_iommu_groups[] = {
+	&spapr_tce_iommu_group,
+	NULL,
+};
+
+/*
+ * This registers IOMMU devices of PHBs. This needs to happen
+ * after core_initcall(iommu_init) + postcore_initcall(pci_driver_init) and
+ * before subsys_initcall(iommu_subsys_init).
+ */
+static int __init spapr_tce_setup_phb_iommus_initcall(void)
+{
+	struct pci_controller *hose;
+
+	list_for_each_entry(hose, &hose_list, list_node) {
+		iommu_device_sysfs_add(&hose->iommu, hose->parent,
+				       spapr_tce_iommu_groups, "iommu-phb%04x",
+				       hose->global_number);
+		iommu_device_register(&hose->iommu, &spapr_tce_iommu_ops,
+				      hose->parent);
+	}
+	return 0;
+}
+postcore_initcall_sync(spapr_tce_setup_phb_iommus_initcall);
+
 #endif /* CONFIG_IOMMU_API */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 180965a309b6..683425a77233 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1897,6 +1897,13 @@ static long pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
 	/* Store @tbl as pnv_pci_ioda2_unset_window() resets it */
 	struct iommu_table *tbl = pe->table_group.tables[0];
 
+	/*
+	 * iommu_ops transfers the ownership per a device and we mode
+	 * the group ownership with the first device in the group.
+	 */
+	if (!tbl)
+		return 0;
+
 	pnv_pci_ioda2_set_bypass(pe, false);
 	pnv_pci_ioda2_unset_window(&pe->table_group, 0);
 	if (pe->pbus)
@@ -1913,6 +1920,9 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
 	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
 						table_group);
 
+	/* See the comment about iommu_ops above */
+	if (pe->table_group.tables[0])
+		return;
 	pnv_pci_ioda2_setup_default_config(pe);
 	if (pe->pbus)
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
@@ -2918,6 +2928,25 @@ static void pnv_pci_ioda_dma_bus_setup(struct pci_bus *bus)
 	}
 }
 
+static struct iommu_group *pnv_pci_device_group(struct pci_controller *hose,
+						struct pci_dev *pdev)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct pnv_ioda_pe *pe;
+
+	if (WARN_ON(!phb))
+		return ERR_PTR(-ENODEV);
+
+	pe = pnv_pci_bdfn_to_pe(phb, pdev->devfn | (pdev->bus->number << 8));
+	if (!pe)
+		return ERR_PTR(-ENODEV);
+
+	if (!pe->table_group.group)
+		return ERR_PTR(-ENODEV);
+
+	return iommu_group_ref_get(pe->table_group.group);
+}
+
 static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.dma_dev_setup		= pnv_pci_ioda_dma_dev_setup,
 	.dma_bus_setup		= pnv_pci_ioda_dma_bus_setup,
@@ -2928,6 +2957,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.setup_bridge		= pnv_pci_fixup_bridge_resources,
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
 	.shutdown		= pnv_pci_ioda_shutdown,
+	.device_group		= pnv_pci_device_group,
 };
 
 static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index ae05b11c457d..57bebe51a564 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1725,3 +1725,27 @@ static int __init tce_iommu_bus_notifier_init(void)
 	return 0;
 }
 machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
+
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
+					     struct pci_dev *pdev)
+{
+	struct device_node *pdn, *dn = pdev->dev.of_node;
+	struct iommu_group *grp;
+	struct pci_dn *pci;
+
+	pdn = pci_dma_find(dn, NULL);
+	if (!pdn || !PCI_DN(pdn))
+		return ERR_PTR(-ENODEV);
+
+	pci = PCI_DN(pdn);
+	if (!pci->table_group)
+		return ERR_PTR(-ENODEV);
+
+	grp = pci->table_group->group;
+	if (!grp)
+		return ERR_PTR(-ENODEV);
+
+	return iommu_group_ref_get(grp);
+}
+#endif
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 233c64f59815..3e3b3f031029 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -1079,6 +1079,9 @@ static int pSeries_pci_probe_mode(struct pci_bus *bus)
 
 struct pci_controller_ops pseries_pci_controller_ops = {
 	.probe_mode		= pSeries_pci_probe_mode,
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+	.device_group		= pSeries_pci_device_group,
+#endif
 };
 
 define_machine(pseries) {
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 8a65ea61744c..3b53b466e49b 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -1152,8 +1152,6 @@ static void tce_iommu_release_ownership(struct tce_container *container,
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
 		if (container->tables[i])
 			table_group->ops->unset_window(table_group, i);
-
-	table_group->ops->release_ownership(table_group);
 }
 
 static long tce_iommu_take_ownership(struct tce_container *container,
@@ -1161,10 +1159,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
 {
 	long i, ret = 0;
 
-	ret = table_group->ops->take_ownership(table_group);
-	if (ret)
-		return ret;
-
 	/* Set all windows to the new group */
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
 		struct iommu_table *tbl = container->tables[i];
@@ -1183,8 +1177,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
 		table_group->ops->unset_window(table_group, i);
 
-	table_group->ops->release_ownership(table_group);
-
 	return ret;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-14  8:18   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-14  8:18 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, kvm-ppc, Deming Wang, Robin Murphy, Jason Gunthorpe,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman,
	Alexey Kardashevskiy

Up until now PPC64 managed to avoid using iommu_ops. The VFIO driver
uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
the Type1 VFIO driver. Recent development added 2 uses of iommu_ops to
the generic VFIO which broke POWER:
- a coherency capability check;
- blocking IOMMU domain - iommu_group_dma_owner_claimed()/...

This adds a simple iommu_ops which reports support for cache
coherency and provides a basic support for blocking domains. No other
domain types are implemented so the default domain is NULL.

Since now iommu_ops controls the group ownership, this takes it out of
VFIO.

This adds an IOMMU device into a pci_controller (=PHB) and registers it
in the IOMMU subsystem, iommu_ops is registered at this point.
This setup is done in postcore_initcall_sync.

This replaces iommu_group_add_device() with iommu_probe_device() as
the former misses necessary steps in connecting PCI devices to IOMMU
devices. This adds a comment about why explicit iommu_probe_device()
is still needed.

The previous discussion is here:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Fixes: 70693f470848 ("vfio: Set DMA ownership for VFIO devices")
Cc: Deming Wang <wangdeming@inspur.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: Fabiano Rosas <farosas@linux.ibm.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/pci-bridge.h     |   7 +
 arch/powerpc/platforms/pseries/pseries.h  |   5 +
 arch/powerpc/kernel/iommu.c               | 159 +++++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci-ioda.c |  30 ++++
 arch/powerpc/platforms/pseries/iommu.c    |  24 ++++
 arch/powerpc/platforms/pseries/setup.c    |   3 +
 drivers/vfio/vfio_iommu_spapr_tce.c       |   8 --
 7 files changed, 226 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index c85f901227c9..338a45b410b4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -8,6 +8,7 @@
 #include <linux/list.h>
 #include <linux/ioport.h>
 #include <linux/numa.h>
+#include <linux/iommu.h>
 
 struct device_node;
 
@@ -44,6 +45,9 @@ struct pci_controller_ops {
 #endif
 
 	void		(*shutdown)(struct pci_controller *hose);
+
+	struct iommu_group *(*device_group)(struct pci_controller *hose,
+					    struct pci_dev *pdev);
 };
 
 /*
@@ -131,6 +135,9 @@ struct pci_controller {
 	struct irq_domain	*dev_domain;
 	struct irq_domain	*msi_domain;
 	struct fwnode_handle	*fwnode;
+
+	/* iommu_ops support */
+	struct iommu_device	iommu;
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index f5c916c839c9..9a49a16dd89a 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -122,4 +122,9 @@ void pseries_lpar_read_hblkrm_characteristics(void);
 static inline void pseries_lpar_read_hblkrm_characteristics(void) { }
 #endif
 
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
+					     struct pci_dev *pdev);
+#endif
+
 #endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index d873c123ab49..b5301e289714 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -35,6 +35,7 @@
 #include <asm/vio.h>
 #include <asm/tce.h>
 #include <asm/mmu_context.h>
+#include <asm/ppc-pci.h>
 
 #define DBG(...)
 
@@ -1158,8 +1159,14 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 
 	pr_debug("%s: Adding %s to iommu group %d\n",
 		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
-
-	return iommu_group_add_device(table_group->group, dev);
+	/*
+	 * This is still not adding devices via the IOMMU bus notifier because
+	 * of pcibios_init() from arch/powerpc/kernel/pci_64.c which calls
+	 * pcibios_scan_phb() first (and this guy adds devices and triggers
+	 * the notifier) and only then it calls pci_bus_add_devices() which
+	 * configures DMA for buses which also creates PEs and IOMMU groups.
+	 */
+	return iommu_probe_device(dev);
 }
 EXPORT_SYMBOL_GPL(iommu_add_device);
 
@@ -1239,6 +1246,7 @@ static long spapr_tce_take_ownership(struct iommu_table_group *table_group)
 		rc = iommu_take_ownership(tbl);
 		if (!rc)
 			continue;
+
 		for (j = 0; j < i; ++j)
 			iommu_release_ownership(table_group->tables[j]);
 		return rc;
@@ -1271,4 +1279,151 @@ struct iommu_table_group_ops spapr_tce_table_group_ops = {
 	.release_ownership = spapr_tce_release_ownership,
 };
 
+/*
+ * A simple iommu_ops to allow less cruft in generic VFIO code.
+ */
+static bool spapr_tce_iommu_capable(enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_CACHE_COHERENCY:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
+static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
+{
+	struct iommu_domain *dom;
+
+	if (type != IOMMU_DOMAIN_BLOCKED)
+		return NULL;
+
+	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
+	if (!dom)
+		return NULL;
+
+	dom->geometry.aperture_start = 0;
+	dom->geometry.aperture_end = ~0ULL;
+	dom->geometry.force_aperture = true;
+
+	return dom;
+}
+
+static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
+{
+	struct pci_dev *pdev;
+	struct pci_controller *hose;
+
+	/* Weirdly iommu_device_register() assigns the same ops to all buses */
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-EPERM);
+
+	pdev = to_pci_dev(dev);
+	hose = pdev->bus->sysdata;
+
+	return &hose->iommu;
+}
+
+static void spapr_tce_iommu_release_device(struct device *dev)
+{
+}
+
+static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
+{
+	struct pci_controller *hose;
+	struct pci_dev *pdev;
+
+	/* Weirdly iommu_device_register() assigns the same ops to all buses */
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-EPERM);
+
+	pdev = to_pci_dev(dev);
+	hose = pdev->bus->sysdata;
+
+	if (!hose->controller_ops.device_group)
+		return ERR_PTR(-ENOENT);
+
+	return hose->controller_ops.device_group(hose, pdev);
+}
+
+static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
+				      struct device *dev)
+{
+	struct iommu_group *grp = iommu_group_get(dev);
+	struct iommu_table_group *table_group;
+	int ret = -EINVAL;
+
+	if (!grp)
+		return -ENODEV;
+
+	table_group = iommu_group_get_iommudata(grp);
+
+	if (dom->type = IOMMU_DOMAIN_BLOCKED)
+		ret = table_group->ops->take_ownership(table_group);
+
+	iommu_group_put(grp);
+
+	return ret;
+}
+
+static void spapr_tce_iommu_detach_dev(struct iommu_domain *dom,
+				       struct device *dev)
+{
+	struct iommu_group *grp = iommu_group_get(dev);
+	struct iommu_table_group *table_group;
+
+	table_group = iommu_group_get_iommudata(grp);
+	WARN_ON(dom->type != IOMMU_DOMAIN_BLOCKED);
+	table_group->ops->release_ownership(table_group);
+}
+
+static const struct iommu_ops spapr_tce_iommu_ops = {
+	.capable = spapr_tce_iommu_capable,
+	.domain_alloc = spapr_tce_iommu_domain_alloc,
+	.probe_device = spapr_tce_iommu_probe_device,
+	.release_device = spapr_tce_iommu_release_device,
+	.device_group = spapr_tce_iommu_device_group,
+	.default_domain_ops = &(const struct iommu_domain_ops) {
+		.attach_dev = spapr_tce_iommu_attach_dev,
+		.detach_dev = spapr_tce_iommu_detach_dev,
+	}
+};
+
+static struct attribute *spapr_tce_iommu_attrs[] = {
+	NULL,
+};
+
+static struct attribute_group spapr_tce_iommu_group = {
+	.name = "spapr-tce-iommu",
+	.attrs = spapr_tce_iommu_attrs,
+};
+
+static const struct attribute_group *spapr_tce_iommu_groups[] = {
+	&spapr_tce_iommu_group,
+	NULL,
+};
+
+/*
+ * This registers IOMMU devices of PHBs. This needs to happen
+ * after core_initcall(iommu_init) + postcore_initcall(pci_driver_init) and
+ * before subsys_initcall(iommu_subsys_init).
+ */
+static int __init spapr_tce_setup_phb_iommus_initcall(void)
+{
+	struct pci_controller *hose;
+
+	list_for_each_entry(hose, &hose_list, list_node) {
+		iommu_device_sysfs_add(&hose->iommu, hose->parent,
+				       spapr_tce_iommu_groups, "iommu-phb%04x",
+				       hose->global_number);
+		iommu_device_register(&hose->iommu, &spapr_tce_iommu_ops,
+				      hose->parent);
+	}
+	return 0;
+}
+postcore_initcall_sync(spapr_tce_setup_phb_iommus_initcall);
+
 #endif /* CONFIG_IOMMU_API */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 180965a309b6..683425a77233 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1897,6 +1897,13 @@ static long pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
 	/* Store @tbl as pnv_pci_ioda2_unset_window() resets it */
 	struct iommu_table *tbl = pe->table_group.tables[0];
 
+	/*
+	 * iommu_ops transfers the ownership per a device and we mode
+	 * the group ownership with the first device in the group.
+	 */
+	if (!tbl)
+		return 0;
+
 	pnv_pci_ioda2_set_bypass(pe, false);
 	pnv_pci_ioda2_unset_window(&pe->table_group, 0);
 	if (pe->pbus)
@@ -1913,6 +1920,9 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
 	struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
 						table_group);
 
+	/* See the comment about iommu_ops above */
+	if (pe->table_group.tables[0])
+		return;
 	pnv_pci_ioda2_setup_default_config(pe);
 	if (pe->pbus)
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
@@ -2918,6 +2928,25 @@ static void pnv_pci_ioda_dma_bus_setup(struct pci_bus *bus)
 	}
 }
 
+static struct iommu_group *pnv_pci_device_group(struct pci_controller *hose,
+						struct pci_dev *pdev)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct pnv_ioda_pe *pe;
+
+	if (WARN_ON(!phb))
+		return ERR_PTR(-ENODEV);
+
+	pe = pnv_pci_bdfn_to_pe(phb, pdev->devfn | (pdev->bus->number << 8));
+	if (!pe)
+		return ERR_PTR(-ENODEV);
+
+	if (!pe->table_group.group)
+		return ERR_PTR(-ENODEV);
+
+	return iommu_group_ref_get(pe->table_group.group);
+}
+
 static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.dma_dev_setup		= pnv_pci_ioda_dma_dev_setup,
 	.dma_bus_setup		= pnv_pci_ioda_dma_bus_setup,
@@ -2928,6 +2957,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.setup_bridge		= pnv_pci_fixup_bridge_resources,
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
 	.shutdown		= pnv_pci_ioda_shutdown,
+	.device_group		= pnv_pci_device_group,
 };
 
 static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index ae05b11c457d..57bebe51a564 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1725,3 +1725,27 @@ static int __init tce_iommu_bus_notifier_init(void)
 	return 0;
 }
 machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
+
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
+					     struct pci_dev *pdev)
+{
+	struct device_node *pdn, *dn = pdev->dev.of_node;
+	struct iommu_group *grp;
+	struct pci_dn *pci;
+
+	pdn = pci_dma_find(dn, NULL);
+	if (!pdn || !PCI_DN(pdn))
+		return ERR_PTR(-ENODEV);
+
+	pci = PCI_DN(pdn);
+	if (!pci->table_group)
+		return ERR_PTR(-ENODEV);
+
+	grp = pci->table_group->group;
+	if (!grp)
+		return ERR_PTR(-ENODEV);
+
+	return iommu_group_ref_get(grp);
+}
+#endif
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 233c64f59815..3e3b3f031029 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -1079,6 +1079,9 @@ static int pSeries_pci_probe_mode(struct pci_bus *bus)
 
 struct pci_controller_ops pseries_pci_controller_ops = {
 	.probe_mode		= pSeries_pci_probe_mode,
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+	.device_group		= pSeries_pci_device_group,
+#endif
 };
 
 define_machine(pseries) {
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 8a65ea61744c..3b53b466e49b 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -1152,8 +1152,6 @@ static void tce_iommu_release_ownership(struct tce_container *container,
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
 		if (container->tables[i])
 			table_group->ops->unset_window(table_group, i);
-
-	table_group->ops->release_ownership(table_group);
 }
 
 static long tce_iommu_take_ownership(struct tce_container *container,
@@ -1161,10 +1159,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
 {
 	long i, ret = 0;
 
-	ret = table_group->ops->take_ownership(table_group);
-	if (ret)
-		return ret;
-
 	/* Set all windows to the new group */
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
 		struct iommu_table *tbl = container->tables[i];
@@ -1183,8 +1177,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
 		table_group->ops->unset_window(table_group, i);
 
-	table_group->ops->release_ownership(table_group);
-
 	return ret;
 }
 
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 1/3] powerpc/iommu: Add "borrowing" iommu_table_group_ops
  2022-07-14  8:18   ` Alexey Kardashevskiy
  (?)
@ 2022-07-18 16:49     ` Jason Gunthorpe
  -1 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-07-18 16:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, kvm-ppc, Deming Wang, Robin Murphy,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman

On Thu, Jul 14, 2022 at 06:18:20PM +1000, Alexey Kardashevskiy wrote:
> PPC64 IOMMU API defines iommu_table_group_ops which handles DMA windows
> for PEs: control the ownership, create/set/unset a table the hardware
> for dynamic DMA windows (DDW). VFIO uses the API to implement support
> on POWER.
> 
> So far only PowerNV IODA2 (POWER8 and newer machines) implemented this and other cases (POWER7 or nested KVM) did not and instead reused
> existing iommu_table structs. This means 1) no DDW 2) ownership transfer
> is done directly in the VFIO SPAPR TCE driver.
> 
> Soon POWER is going to get its own iommu_ops and ownership control is
> going to move there. This implements spapr_tce_table_group_ops which
> borrows iommu_table tables. The upside is that VFIO needs to know less
> about POWER.
> 
> The new ops returns the existing table from create_table() and
> only checks if the same window is already set. This is only going to work
> if the default DMA window starts table_group.tce32_start and as big as
> pe->table_group.tce32_size (not the case for IODA2+ PowerNV).
> 
> This changes iommu_table_group_ops::take_ownership() to return an error
> if borrowing a table failed.
> 
> This should not cause any visible change in behavior for PowerNV.
> pSeries was not that well tested/supported anyway.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/include/asm/iommu.h          |  6 +-
>  arch/powerpc/kernel/iommu.c               | 98 ++++++++++++++++++++++-
>  arch/powerpc/platforms/powernv/pci-ioda.c |  6 +-
>  arch/powerpc/platforms/pseries/iommu.c    |  3 +
>  drivers/vfio/vfio_iommu_spapr_tce.c       | 94 ++++------------------
>  5 files changed, 121 insertions(+), 86 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 1/3] powerpc/iommu: Add "borrowing" iommu_table_group_ops
@ 2022-07-18 16:49     ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-07-18 16:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Fabiano Rosas, Robin Murphy, Daniel Henrique Barboza,
	Deming Wang, kvm-ppc, Alex Williamson, Nicholas Piggin,
	Murilo Opsfelder Araujo, linuxppc-dev

On Thu, Jul 14, 2022 at 06:18:20PM +1000, Alexey Kardashevskiy wrote:
> PPC64 IOMMU API defines iommu_table_group_ops which handles DMA windows
> for PEs: control the ownership, create/set/unset a table the hardware
> for dynamic DMA windows (DDW). VFIO uses the API to implement support
> on POWER.
> 
> So far only PowerNV IODA2 (POWER8 and newer machines) implemented this and other cases (POWER7 or nested KVM) did not and instead reused
> existing iommu_table structs. This means 1) no DDW 2) ownership transfer
> is done directly in the VFIO SPAPR TCE driver.
> 
> Soon POWER is going to get its own iommu_ops and ownership control is
> going to move there. This implements spapr_tce_table_group_ops which
> borrows iommu_table tables. The upside is that VFIO needs to know less
> about POWER.
> 
> The new ops returns the existing table from create_table() and
> only checks if the same window is already set. This is only going to work
> if the default DMA window starts table_group.tce32_start and as big as
> pe->table_group.tce32_size (not the case for IODA2+ PowerNV).
> 
> This changes iommu_table_group_ops::take_ownership() to return an error
> if borrowing a table failed.
> 
> This should not cause any visible change in behavior for PowerNV.
> pSeries was not that well tested/supported anyway.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/include/asm/iommu.h          |  6 +-
>  arch/powerpc/kernel/iommu.c               | 98 ++++++++++++++++++++++-
>  arch/powerpc/platforms/powernv/pci-ioda.c |  6 +-
>  arch/powerpc/platforms/pseries/iommu.c    |  3 +
>  drivers/vfio/vfio_iommu_spapr_tce.c       | 94 ++++------------------
>  5 files changed, 121 insertions(+), 86 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 1/3] powerpc/iommu: Add "borrowing" iommu_table_group_ops
@ 2022-07-18 16:49     ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-07-18 16:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, kvm-ppc, Deming Wang, Robin Murphy,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman

On Thu, Jul 14, 2022 at 06:18:20PM +1000, Alexey Kardashevskiy wrote:
> PPC64 IOMMU API defines iommu_table_group_ops which handles DMA windows
> for PEs: control the ownership, create/set/unset a table the hardware
> for dynamic DMA windows (DDW). VFIO uses the API to implement support
> on POWER.
> 
> So far only PowerNV IODA2 (POWER8 and newer machines) implemented this and other cases (POWER7 or nested KVM) did not and instead reused
> existing iommu_table structs. This means 1) no DDW 2) ownership transfer
> is done directly in the VFIO SPAPR TCE driver.
> 
> Soon POWER is going to get its own iommu_ops and ownership control is
> going to move there. This implements spapr_tce_table_group_ops which
> borrows iommu_table tables. The upside is that VFIO needs to know less
> about POWER.
> 
> The new ops returns the existing table from create_table() and
> only checks if the same window is already set. This is only going to work
> if the default DMA window starts table_group.tce32_start and as big as
> pe->table_group.tce32_size (not the case for IODA2+ PowerNV).
> 
> This changes iommu_table_group_ops::take_ownership() to return an error
> if borrowing a table failed.
> 
> This should not cause any visible change in behavior for PowerNV.
> pSeries was not that well tested/supported anyway.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/include/asm/iommu.h          |  6 +-
>  arch/powerpc/kernel/iommu.c               | 98 ++++++++++++++++++++++-
>  arch/powerpc/platforms/powernv/pci-ioda.c |  6 +-
>  arch/powerpc/platforms/pseries/iommu.c    |  3 +
>  drivers/vfio/vfio_iommu_spapr_tce.c       | 94 ++++------------------
>  5 files changed, 121 insertions(+), 86 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 2/3] powerpc/pci_64: Init pcibios subsys a bit later
  2022-07-14  8:18   ` Alexey Kardashevskiy
  (?)
@ 2022-07-18 16:49     ` Jason Gunthorpe
  -1 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-07-18 16:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, kvm-ppc, Deming Wang, Robin Murphy,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman

On Thu, Jul 14, 2022 at 06:18:21PM +1000, Alexey Kardashevskiy wrote:
> The following patches are going to add dependency/use of iommu_ops which
> is initialized in subsys_initcall as well.
> 
> This moves pciobios_init() to the next initcall level.
> 
> This should not cause behavioral change.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/kernel/pci_64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 2/3] powerpc/pci_64: Init pcibios subsys a bit later
@ 2022-07-18 16:49     ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-07-18 16:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Fabiano Rosas, Robin Murphy, Daniel Henrique Barboza,
	Deming Wang, kvm-ppc, Alex Williamson, Nicholas Piggin,
	Murilo Opsfelder Araujo, linuxppc-dev

On Thu, Jul 14, 2022 at 06:18:21PM +1000, Alexey Kardashevskiy wrote:
> The following patches are going to add dependency/use of iommu_ops which
> is initialized in subsys_initcall as well.
> 
> This moves pciobios_init() to the next initcall level.
> 
> This should not cause behavioral change.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/kernel/pci_64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 2/3] powerpc/pci_64: Init pcibios subsys a bit later
@ 2022-07-18 16:49     ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-07-18 16:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, kvm-ppc, Deming Wang, Robin Murphy,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman

On Thu, Jul 14, 2022 at 06:18:21PM +1000, Alexey Kardashevskiy wrote:
> The following patches are going to add dependency/use of iommu_ops which
> is initialized in subsys_initcall as well.
> 
> This moves pciobios_init() to the next initcall level.
> 
> This should not cause behavioral change.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/kernel/pci_64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-14  8:18   ` Alexey Kardashevskiy
  (?)
@ 2022-07-18 18:09     ` Jason Gunthorpe
  -1 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-07-18 18:09 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, kvm-ppc, Deming Wang, Robin Murphy,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman

On Thu, Jul 14, 2022 at 06:18:22PM +1000, Alexey Kardashevskiy wrote:

> +/*
> + * A simple iommu_ops to allow less cruft in generic VFIO code.
> + */
> +static bool spapr_tce_iommu_capable(enum iommu_cap cap)
> +{
> +	switch (cap) {
> +	case IOMMU_CAP_CACHE_COHERENCY:

I would add a remark here that it is because vfio is going to use
SPAPR mode but still checks that the iommu driver support coherency -
with out that detail it looks very strange to have caps without
implementing unmanaged domains

> +static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
> +{
> +	struct iommu_domain *dom;
> +
> +	if (type != IOMMU_DOMAIN_BLOCKED)
> +		return NULL;
> +
> +	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
> +	if (!dom)
> +		return NULL;
> +
> +	dom->geometry.aperture_start = 0;
> +	dom->geometry.aperture_end = ~0ULL;
> +	dom->geometry.force_aperture = true;

A blocked domain doesn't really have an aperture, all DMA is rejected,
so I think these can just be deleted and left at zero.

Generally I'm suggesting drivers just use a static singleton instance
for the blocked domain instead of the allocation like this, but that
is a very minor nit.

> +static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
> +{
> +	struct pci_dev *pdev;
> +	struct pci_controller *hose;
> +
> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-EPERM);

Less "weirdly", more by design. The iommu driver should check if the
given struct device is supported or not, it isn't really a bus
specific operation.

> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
> +{
> +	struct pci_controller *hose;
> +	struct pci_dev *pdev;
> +
> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-EPERM);

This doesn't need repeating, if probe_device() fails then this will
never be called.

> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> +				      struct device *dev)
> +{
> +	struct iommu_group *grp = iommu_group_get(dev);
> +	struct iommu_table_group *table_group;
> +	int ret = -EINVAL;
> +
> +	if (!grp)
> +		return -ENODEV;
> +
> +	table_group = iommu_group_get_iommudata(grp);
> +
> +	if (dom->type == IOMMU_DOMAIN_BLOCKED)
> +		ret = table_group->ops->take_ownership(table_group);

Ideally there shouldn't be dom->type checks like this.


The blocking domain should have its own iommu_domain_ops that only
process the blocking operation. Ie call this like
spapr_tce_iommu_blocking_attach_dev()

Instead of having a "default_domain_ops" leave it NULL and create a
spapr_tce_blocking_domain_ops with these two functions and assign it
to domain->ops when creating. Then it is really clear these functions
are only called for the DOMAIN_BLOCKED type and you don't need to
check it.

> +static void spapr_tce_iommu_detach_dev(struct iommu_domain *dom,
> +				       struct device *dev)
> +{
> +	struct iommu_group *grp = iommu_group_get(dev);
> +	struct iommu_table_group *table_group;
> +
> +	table_group = iommu_group_get_iommudata(grp);
> +	WARN_ON(dom->type != IOMMU_DOMAIN_BLOCKED);
> +	table_group->ops->release_ownership(table_group);
> +}

Ditto

> +struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
> +					     struct pci_dev *pdev)
> +{
> +	struct device_node *pdn, *dn = pdev->dev.of_node;
> +	struct iommu_group *grp;
> +	struct pci_dn *pci;
> +
> +	pdn = pci_dma_find(dn, NULL);
> +	if (!pdn || !PCI_DN(pdn))
> +		return ERR_PTR(-ENODEV);
> +
> +	pci = PCI_DN(pdn);
> +	if (!pci->table_group)
> +		return ERR_PTR(-ENODEV);
> +
> +	grp = pci->table_group->group;
> +	if (!grp)
> +		return ERR_PTR(-ENODEV);
> +
> +	return iommu_group_ref_get(grp);

Not for this series, but this is kind of backwards, the driver
specific data (ie the table_group) should be in
iommu_group_get_iommudata()...

> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 8a65ea61744c..3b53b466e49b 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -1152,8 +1152,6 @@ static void tce_iommu_release_ownership(struct tce_container *container,
>  	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
>  		if (container->tables[i])
>  			table_group->ops->unset_window(table_group, i);
> -
> -	table_group->ops->release_ownership(table_group);
>  }
>  
>  static long tce_iommu_take_ownership(struct tce_container *container,
> @@ -1161,10 +1159,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
>  {
>  	long i, ret = 0;
>  
> -	ret = table_group->ops->take_ownership(table_group);
> -	if (ret)
> -		return ret;
> -
>  	/* Set all windows to the new group */
>  	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
>  		struct iommu_table *tbl = container->tables[i];
> @@ -1183,8 +1177,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
>  	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
>  		table_group->ops->unset_window(table_group, i);
>  
> -	table_group->ops->release_ownership(table_group);
> -

This is great, makes alot of sense.

Anyhow, it all looks fine to me as is even:

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-18 18:09     ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-07-18 18:09 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Fabiano Rosas, Robin Murphy, Daniel Henrique Barboza,
	Deming Wang, kvm-ppc, Alex Williamson, Nicholas Piggin,
	Murilo Opsfelder Araujo, linuxppc-dev

On Thu, Jul 14, 2022 at 06:18:22PM +1000, Alexey Kardashevskiy wrote:

> +/*
> + * A simple iommu_ops to allow less cruft in generic VFIO code.
> + */
> +static bool spapr_tce_iommu_capable(enum iommu_cap cap)
> +{
> +	switch (cap) {
> +	case IOMMU_CAP_CACHE_COHERENCY:

I would add a remark here that it is because vfio is going to use
SPAPR mode but still checks that the iommu driver support coherency -
with out that detail it looks very strange to have caps without
implementing unmanaged domains

> +static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
> +{
> +	struct iommu_domain *dom;
> +
> +	if (type != IOMMU_DOMAIN_BLOCKED)
> +		return NULL;
> +
> +	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
> +	if (!dom)
> +		return NULL;
> +
> +	dom->geometry.aperture_start = 0;
> +	dom->geometry.aperture_end = ~0ULL;
> +	dom->geometry.force_aperture = true;

A blocked domain doesn't really have an aperture, all DMA is rejected,
so I think these can just be deleted and left at zero.

Generally I'm suggesting drivers just use a static singleton instance
for the blocked domain instead of the allocation like this, but that
is a very minor nit.

> +static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
> +{
> +	struct pci_dev *pdev;
> +	struct pci_controller *hose;
> +
> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-EPERM);

Less "weirdly", more by design. The iommu driver should check if the
given struct device is supported or not, it isn't really a bus
specific operation.

> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
> +{
> +	struct pci_controller *hose;
> +	struct pci_dev *pdev;
> +
> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-EPERM);

This doesn't need repeating, if probe_device() fails then this will
never be called.

> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> +				      struct device *dev)
> +{
> +	struct iommu_group *grp = iommu_group_get(dev);
> +	struct iommu_table_group *table_group;
> +	int ret = -EINVAL;
> +
> +	if (!grp)
> +		return -ENODEV;
> +
> +	table_group = iommu_group_get_iommudata(grp);
> +
> +	if (dom->type == IOMMU_DOMAIN_BLOCKED)
> +		ret = table_group->ops->take_ownership(table_group);

Ideally there shouldn't be dom->type checks like this.


The blocking domain should have its own iommu_domain_ops that only
process the blocking operation. Ie call this like
spapr_tce_iommu_blocking_attach_dev()

Instead of having a "default_domain_ops" leave it NULL and create a
spapr_tce_blocking_domain_ops with these two functions and assign it
to domain->ops when creating. Then it is really clear these functions
are only called for the DOMAIN_BLOCKED type and you don't need to
check it.

> +static void spapr_tce_iommu_detach_dev(struct iommu_domain *dom,
> +				       struct device *dev)
> +{
> +	struct iommu_group *grp = iommu_group_get(dev);
> +	struct iommu_table_group *table_group;
> +
> +	table_group = iommu_group_get_iommudata(grp);
> +	WARN_ON(dom->type != IOMMU_DOMAIN_BLOCKED);
> +	table_group->ops->release_ownership(table_group);
> +}

Ditto

> +struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
> +					     struct pci_dev *pdev)
> +{
> +	struct device_node *pdn, *dn = pdev->dev.of_node;
> +	struct iommu_group *grp;
> +	struct pci_dn *pci;
> +
> +	pdn = pci_dma_find(dn, NULL);
> +	if (!pdn || !PCI_DN(pdn))
> +		return ERR_PTR(-ENODEV);
> +
> +	pci = PCI_DN(pdn);
> +	if (!pci->table_group)
> +		return ERR_PTR(-ENODEV);
> +
> +	grp = pci->table_group->group;
> +	if (!grp)
> +		return ERR_PTR(-ENODEV);
> +
> +	return iommu_group_ref_get(grp);

Not for this series, but this is kind of backwards, the driver
specific data (ie the table_group) should be in
iommu_group_get_iommudata()...

> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 8a65ea61744c..3b53b466e49b 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -1152,8 +1152,6 @@ static void tce_iommu_release_ownership(struct tce_container *container,
>  	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
>  		if (container->tables[i])
>  			table_group->ops->unset_window(table_group, i);
> -
> -	table_group->ops->release_ownership(table_group);
>  }
>  
>  static long tce_iommu_take_ownership(struct tce_container *container,
> @@ -1161,10 +1159,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
>  {
>  	long i, ret = 0;
>  
> -	ret = table_group->ops->take_ownership(table_group);
> -	if (ret)
> -		return ret;
> -
>  	/* Set all windows to the new group */
>  	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
>  		struct iommu_table *tbl = container->tables[i];
> @@ -1183,8 +1177,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
>  	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
>  		table_group->ops->unset_window(table_group, i);
>  
> -	table_group->ops->release_ownership(table_group);
> -

This is great, makes alot of sense.

Anyhow, it all looks fine to me as is even:

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domain
@ 2022-07-18 18:09     ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-07-18 18:09 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, kvm-ppc, Deming Wang, Robin Murphy,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman

On Thu, Jul 14, 2022 at 06:18:22PM +1000, Alexey Kardashevskiy wrote:

> +/*
> + * A simple iommu_ops to allow less cruft in generic VFIO code.
> + */
> +static bool spapr_tce_iommu_capable(enum iommu_cap cap)
> +{
> +	switch (cap) {
> +	case IOMMU_CAP_CACHE_COHERENCY:

I would add a remark here that it is because vfio is going to use
SPAPR mode but still checks that the iommu driver support coherency -
with out that detail it looks very strange to have caps without
implementing unmanaged domains

> +static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
> +{
> +	struct iommu_domain *dom;
> +
> +	if (type != IOMMU_DOMAIN_BLOCKED)
> +		return NULL;
> +
> +	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
> +	if (!dom)
> +		return NULL;
> +
> +	dom->geometry.aperture_start = 0;
> +	dom->geometry.aperture_end = ~0ULL;
> +	dom->geometry.force_aperture = true;

A blocked domain doesn't really have an aperture, all DMA is rejected,
so I think these can just be deleted and left at zero.

Generally I'm suggesting drivers just use a static singleton instance
for the blocked domain instead of the allocation like this, but that
is a very minor nit.

> +static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
> +{
> +	struct pci_dev *pdev;
> +	struct pci_controller *hose;
> +
> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-EPERM);

Less "weirdly", more by design. The iommu driver should check if the
given struct device is supported or not, it isn't really a bus
specific operation.

> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
> +{
> +	struct pci_controller *hose;
> +	struct pci_dev *pdev;
> +
> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-EPERM);

This doesn't need repeating, if probe_device() fails then this will
never be called.

> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> +				      struct device *dev)
> +{
> +	struct iommu_group *grp = iommu_group_get(dev);
> +	struct iommu_table_group *table_group;
> +	int ret = -EINVAL;
> +
> +	if (!grp)
> +		return -ENODEV;
> +
> +	table_group = iommu_group_get_iommudata(grp);
> +
> +	if (dom->type = IOMMU_DOMAIN_BLOCKED)
> +		ret = table_group->ops->take_ownership(table_group);

Ideally there shouldn't be dom->type checks like this.


The blocking domain should have its own iommu_domain_ops that only
process the blocking operation. Ie call this like
spapr_tce_iommu_blocking_attach_dev()

Instead of having a "default_domain_ops" leave it NULL and create a
spapr_tce_blocking_domain_ops with these two functions and assign it
to domain->ops when creating. Then it is really clear these functions
are only called for the DOMAIN_BLOCKED type and you don't need to
check it.

> +static void spapr_tce_iommu_detach_dev(struct iommu_domain *dom,
> +				       struct device *dev)
> +{
> +	struct iommu_group *grp = iommu_group_get(dev);
> +	struct iommu_table_group *table_group;
> +
> +	table_group = iommu_group_get_iommudata(grp);
> +	WARN_ON(dom->type != IOMMU_DOMAIN_BLOCKED);
> +	table_group->ops->release_ownership(table_group);
> +}

Ditto

> +struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
> +					     struct pci_dev *pdev)
> +{
> +	struct device_node *pdn, *dn = pdev->dev.of_node;
> +	struct iommu_group *grp;
> +	struct pci_dn *pci;
> +
> +	pdn = pci_dma_find(dn, NULL);
> +	if (!pdn || !PCI_DN(pdn))
> +		return ERR_PTR(-ENODEV);
> +
> +	pci = PCI_DN(pdn);
> +	if (!pci->table_group)
> +		return ERR_PTR(-ENODEV);
> +
> +	grp = pci->table_group->group;
> +	if (!grp)
> +		return ERR_PTR(-ENODEV);
> +
> +	return iommu_group_ref_get(grp);

Not for this series, but this is kind of backwards, the driver
specific data (ie the table_group) should be in
iommu_group_get_iommudata()...

> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 8a65ea61744c..3b53b466e49b 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -1152,8 +1152,6 @@ static void tce_iommu_release_ownership(struct tce_container *container,
>  	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
>  		if (container->tables[i])
>  			table_group->ops->unset_window(table_group, i);
> -
> -	table_group->ops->release_ownership(table_group);
>  }
>  
>  static long tce_iommu_take_ownership(struct tce_container *container,
> @@ -1161,10 +1159,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
>  {
>  	long i, ret = 0;
>  
> -	ret = table_group->ops->take_ownership(table_group);
> -	if (ret)
> -		return ret;
> -
>  	/* Set all windows to the new group */
>  	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
>  		struct iommu_table *tbl = container->tables[i];
> @@ -1183,8 +1177,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
>  	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
>  		table_group->ops->unset_window(table_group, i);
>  
> -	table_group->ops->release_ownership(table_group);
> -

This is great, makes alot of sense.

Anyhow, it all looks fine to me as is even:

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-18 18:09     ` [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Jason Gunthorpe
  (?)
@ 2022-07-19  3:18       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-19  3:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, kvm, kvm-ppc, Deming Wang, Robin Murphy,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman



On 19/07/2022 04:09, Jason Gunthorpe wrote:
> On Thu, Jul 14, 2022 at 06:18:22PM +1000, Alexey Kardashevskiy wrote:
> 
>> +/*
>> + * A simple iommu_ops to allow less cruft in generic VFIO code.
>> + */
>> +static bool spapr_tce_iommu_capable(enum iommu_cap cap)
>> +{
>> +	switch (cap) {
>> +	case IOMMU_CAP_CACHE_COHERENCY:
> 
> I would add a remark here that it is because vfio is going to use
> SPAPR mode but still checks that the iommu driver support coherency -
> with out that detail it looks very strange to have caps without
> implementing unmanaged domains
> 
>> +static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
>> +{
>> +	struct iommu_domain *dom;
>> +
>> +	if (type != IOMMU_DOMAIN_BLOCKED)
>> +		return NULL;
>> +
>> +	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
>> +	if (!dom)
>> +		return NULL;
>> +
>> +	dom->geometry.aperture_start = 0;
>> +	dom->geometry.aperture_end = ~0ULL;
>> +	dom->geometry.force_aperture = true;
> 
> A blocked domain doesn't really have an aperture, all DMA is rejected,
> so I think these can just be deleted and left at zero.
> 
> Generally I'm suggesting drivers just use a static singleton instance
> for the blocked domain instead of the allocation like this, but that
> is a very minor nit.
> 
>> +static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
>> +{
>> +	struct pci_dev *pdev;
>> +	struct pci_controller *hose;
>> +
>> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
>> +	if (!dev_is_pci(dev))
>> +		return ERR_PTR(-EPERM);
> 
> Less "weirdly", more by design. The iommu driver should check if the
> given struct device is supported or not, it isn't really a bus
> specific operation.
> 
>> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
>> +{
>> +	struct pci_controller *hose;
>> +	struct pci_dev *pdev;
>> +
>> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
>> +	if (!dev_is_pci(dev))
>> +		return ERR_PTR(-EPERM);
> 
> This doesn't need repeating, if probe_device() fails then this will
> never be called.
> 
>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>> +				      struct device *dev)
>> +{
>> +	struct iommu_group *grp = iommu_group_get(dev);
>> +	struct iommu_table_group *table_group;
>> +	int ret = -EINVAL;
>> +
>> +	if (!grp)
>> +		return -ENODEV;
>> +
>> +	table_group = iommu_group_get_iommudata(grp);
>> +
>> +	if (dom->type == IOMMU_DOMAIN_BLOCKED)
>> +		ret = table_group->ops->take_ownership(table_group);
> 
> Ideally there shouldn't be dom->type checks like this.
> 
> 
> The blocking domain should have its own iommu_domain_ops that only
> process the blocking operation. Ie call this like
> spapr_tce_iommu_blocking_attach_dev()
> 
> Instead of having a "default_domain_ops" leave it NULL and create a
> spapr_tce_blocking_domain_ops with these two functions and assign it
> to domain->ops when creating. Then it is really clear these functions
> are only called for the DOMAIN_BLOCKED type and you don't need to
> check it.
> 
>> +static void spapr_tce_iommu_detach_dev(struct iommu_domain *dom,
>> +				       struct device *dev)
>> +{
>> +	struct iommu_group *grp = iommu_group_get(dev);
>> +	struct iommu_table_group *table_group;
>> +
>> +	table_group = iommu_group_get_iommudata(grp);
>> +	WARN_ON(dom->type != IOMMU_DOMAIN_BLOCKED);
>> +	table_group->ops->release_ownership(table_group);
>> +}
> 
> Ditto
> 
>> +struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
>> +					     struct pci_dev *pdev)
>> +{
>> +	struct device_node *pdn, *dn = pdev->dev.of_node;
>> +	struct iommu_group *grp;
>> +	struct pci_dn *pci;
>> +
>> +	pdn = pci_dma_find(dn, NULL);
>> +	if (!pdn || !PCI_DN(pdn))
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	pci = PCI_DN(pdn);
>> +	if (!pci->table_group)
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	grp = pci->table_group->group;
>> +	if (!grp)
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	return iommu_group_ref_get(grp);
> 
> Not for this series, but this is kind of backwards, the driver
> specific data (ie the table_group) should be in
> iommu_group_get_iommudata()...


It is there but here we are getting from a device to a group - a device 
is not added to a group yet when iommu_probe_device() works and tries 
adding a device via iommu_group_get_for_dev().




>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>> index 8a65ea61744c..3b53b466e49b 100644
>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>> @@ -1152,8 +1152,6 @@ static void tce_iommu_release_ownership(struct tce_container *container,
>>   	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
>>   		if (container->tables[i])
>>   			table_group->ops->unset_window(table_group, i);
>> -
>> -	table_group->ops->release_ownership(table_group);
>>   }
>>   
>>   static long tce_iommu_take_ownership(struct tce_container *container,
>> @@ -1161,10 +1159,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
>>   {
>>   	long i, ret = 0;
>>   
>> -	ret = table_group->ops->take_ownership(table_group);
>> -	if (ret)
>> -		return ret;
>> -
>>   	/* Set all windows to the new group */
>>   	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
>>   		struct iommu_table *tbl = container->tables[i];
>> @@ -1183,8 +1177,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
>>   	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
>>   		table_group->ops->unset_window(table_group, i);
>>   
>> -	table_group->ops->release_ownership(table_group);
>> -
> 
> This is great, makes alot of sense.
> 
> Anyhow, it all looks fine to me as is even:
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks. I'll try now to find an interested party to test this :)


> 
> Jason

-- 
Alexey

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-19  3:18       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-19  3:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Fabiano Rosas, Robin Murphy, Daniel Henrique Barboza,
	Deming Wang, kvm-ppc, Alex Williamson, Nicholas Piggin,
	Murilo Opsfelder Araujo, linuxppc-dev



On 19/07/2022 04:09, Jason Gunthorpe wrote:
> On Thu, Jul 14, 2022 at 06:18:22PM +1000, Alexey Kardashevskiy wrote:
> 
>> +/*
>> + * A simple iommu_ops to allow less cruft in generic VFIO code.
>> + */
>> +static bool spapr_tce_iommu_capable(enum iommu_cap cap)
>> +{
>> +	switch (cap) {
>> +	case IOMMU_CAP_CACHE_COHERENCY:
> 
> I would add a remark here that it is because vfio is going to use
> SPAPR mode but still checks that the iommu driver support coherency -
> with out that detail it looks very strange to have caps without
> implementing unmanaged domains
> 
>> +static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
>> +{
>> +	struct iommu_domain *dom;
>> +
>> +	if (type != IOMMU_DOMAIN_BLOCKED)
>> +		return NULL;
>> +
>> +	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
>> +	if (!dom)
>> +		return NULL;
>> +
>> +	dom->geometry.aperture_start = 0;
>> +	dom->geometry.aperture_end = ~0ULL;
>> +	dom->geometry.force_aperture = true;
> 
> A blocked domain doesn't really have an aperture, all DMA is rejected,
> so I think these can just be deleted and left at zero.
> 
> Generally I'm suggesting drivers just use a static singleton instance
> for the blocked domain instead of the allocation like this, but that
> is a very minor nit.
> 
>> +static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
>> +{
>> +	struct pci_dev *pdev;
>> +	struct pci_controller *hose;
>> +
>> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
>> +	if (!dev_is_pci(dev))
>> +		return ERR_PTR(-EPERM);
> 
> Less "weirdly", more by design. The iommu driver should check if the
> given struct device is supported or not, it isn't really a bus
> specific operation.
> 
>> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
>> +{
>> +	struct pci_controller *hose;
>> +	struct pci_dev *pdev;
>> +
>> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
>> +	if (!dev_is_pci(dev))
>> +		return ERR_PTR(-EPERM);
> 
> This doesn't need repeating, if probe_device() fails then this will
> never be called.
> 
>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>> +				      struct device *dev)
>> +{
>> +	struct iommu_group *grp = iommu_group_get(dev);
>> +	struct iommu_table_group *table_group;
>> +	int ret = -EINVAL;
>> +
>> +	if (!grp)
>> +		return -ENODEV;
>> +
>> +	table_group = iommu_group_get_iommudata(grp);
>> +
>> +	if (dom->type == IOMMU_DOMAIN_BLOCKED)
>> +		ret = table_group->ops->take_ownership(table_group);
> 
> Ideally there shouldn't be dom->type checks like this.
> 
> 
> The blocking domain should have its own iommu_domain_ops that only
> process the blocking operation. Ie call this like
> spapr_tce_iommu_blocking_attach_dev()
> 
> Instead of having a "default_domain_ops" leave it NULL and create a
> spapr_tce_blocking_domain_ops with these two functions and assign it
> to domain->ops when creating. Then it is really clear these functions
> are only called for the DOMAIN_BLOCKED type and you don't need to
> check it.
> 
>> +static void spapr_tce_iommu_detach_dev(struct iommu_domain *dom,
>> +				       struct device *dev)
>> +{
>> +	struct iommu_group *grp = iommu_group_get(dev);
>> +	struct iommu_table_group *table_group;
>> +
>> +	table_group = iommu_group_get_iommudata(grp);
>> +	WARN_ON(dom->type != IOMMU_DOMAIN_BLOCKED);
>> +	table_group->ops->release_ownership(table_group);
>> +}
> 
> Ditto
> 
>> +struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
>> +					     struct pci_dev *pdev)
>> +{
>> +	struct device_node *pdn, *dn = pdev->dev.of_node;
>> +	struct iommu_group *grp;
>> +	struct pci_dn *pci;
>> +
>> +	pdn = pci_dma_find(dn, NULL);
>> +	if (!pdn || !PCI_DN(pdn))
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	pci = PCI_DN(pdn);
>> +	if (!pci->table_group)
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	grp = pci->table_group->group;
>> +	if (!grp)
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	return iommu_group_ref_get(grp);
> 
> Not for this series, but this is kind of backwards, the driver
> specific data (ie the table_group) should be in
> iommu_group_get_iommudata()...


It is there but here we are getting from a device to a group - a device 
is not added to a group yet when iommu_probe_device() works and tries 
adding a device via iommu_group_get_for_dev().




>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>> index 8a65ea61744c..3b53b466e49b 100644
>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>> @@ -1152,8 +1152,6 @@ static void tce_iommu_release_ownership(struct tce_container *container,
>>   	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
>>   		if (container->tables[i])
>>   			table_group->ops->unset_window(table_group, i);
>> -
>> -	table_group->ops->release_ownership(table_group);
>>   }
>>   
>>   static long tce_iommu_take_ownership(struct tce_container *container,
>> @@ -1161,10 +1159,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
>>   {
>>   	long i, ret = 0;
>>   
>> -	ret = table_group->ops->take_ownership(table_group);
>> -	if (ret)
>> -		return ret;
>> -
>>   	/* Set all windows to the new group */
>>   	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
>>   		struct iommu_table *tbl = container->tables[i];
>> @@ -1183,8 +1177,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
>>   	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
>>   		table_group->ops->unset_window(table_group, i);
>>   
>> -	table_group->ops->release_ownership(table_group);
>> -
> 
> This is great, makes alot of sense.
> 
> Anyhow, it all looks fine to me as is even:
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks. I'll try now to find an interested party to test this :)


> 
> Jason

-- 
Alexey

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domain
@ 2022-07-19  3:18       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 33+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-19  3:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, kvm, kvm-ppc, Deming Wang, Robin Murphy,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman



On 19/07/2022 04:09, Jason Gunthorpe wrote:
> On Thu, Jul 14, 2022 at 06:18:22PM +1000, Alexey Kardashevskiy wrote:
> 
>> +/*
>> + * A simple iommu_ops to allow less cruft in generic VFIO code.
>> + */
>> +static bool spapr_tce_iommu_capable(enum iommu_cap cap)
>> +{
>> +	switch (cap) {
>> +	case IOMMU_CAP_CACHE_COHERENCY:
> 
> I would add a remark here that it is because vfio is going to use
> SPAPR mode but still checks that the iommu driver support coherency -
> with out that detail it looks very strange to have caps without
> implementing unmanaged domains
> 
>> +static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
>> +{
>> +	struct iommu_domain *dom;
>> +
>> +	if (type != IOMMU_DOMAIN_BLOCKED)
>> +		return NULL;
>> +
>> +	dom = kzalloc(sizeof(*dom), GFP_KERNEL);
>> +	if (!dom)
>> +		return NULL;
>> +
>> +	dom->geometry.aperture_start = 0;
>> +	dom->geometry.aperture_end = ~0ULL;
>> +	dom->geometry.force_aperture = true;
> 
> A blocked domain doesn't really have an aperture, all DMA is rejected,
> so I think these can just be deleted and left at zero.
> 
> Generally I'm suggesting drivers just use a static singleton instance
> for the blocked domain instead of the allocation like this, but that
> is a very minor nit.
> 
>> +static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
>> +{
>> +	struct pci_dev *pdev;
>> +	struct pci_controller *hose;
>> +
>> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
>> +	if (!dev_is_pci(dev))
>> +		return ERR_PTR(-EPERM);
> 
> Less "weirdly", more by design. The iommu driver should check if the
> given struct device is supported or not, it isn't really a bus
> specific operation.
> 
>> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
>> +{
>> +	struct pci_controller *hose;
>> +	struct pci_dev *pdev;
>> +
>> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
>> +	if (!dev_is_pci(dev))
>> +		return ERR_PTR(-EPERM);
> 
> This doesn't need repeating, if probe_device() fails then this will
> never be called.
> 
>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>> +				      struct device *dev)
>> +{
>> +	struct iommu_group *grp = iommu_group_get(dev);
>> +	struct iommu_table_group *table_group;
>> +	int ret = -EINVAL;
>> +
>> +	if (!grp)
>> +		return -ENODEV;
>> +
>> +	table_group = iommu_group_get_iommudata(grp);
>> +
>> +	if (dom->type = IOMMU_DOMAIN_BLOCKED)
>> +		ret = table_group->ops->take_ownership(table_group);
> 
> Ideally there shouldn't be dom->type checks like this.
> 
> 
> The blocking domain should have its own iommu_domain_ops that only
> process the blocking operation. Ie call this like
> spapr_tce_iommu_blocking_attach_dev()
> 
> Instead of having a "default_domain_ops" leave it NULL and create a
> spapr_tce_blocking_domain_ops with these two functions and assign it
> to domain->ops when creating. Then it is really clear these functions
> are only called for the DOMAIN_BLOCKED type and you don't need to
> check it.
> 
>> +static void spapr_tce_iommu_detach_dev(struct iommu_domain *dom,
>> +				       struct device *dev)
>> +{
>> +	struct iommu_group *grp = iommu_group_get(dev);
>> +	struct iommu_table_group *table_group;
>> +
>> +	table_group = iommu_group_get_iommudata(grp);
>> +	WARN_ON(dom->type != IOMMU_DOMAIN_BLOCKED);
>> +	table_group->ops->release_ownership(table_group);
>> +}
> 
> Ditto
> 
>> +struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose,
>> +					     struct pci_dev *pdev)
>> +{
>> +	struct device_node *pdn, *dn = pdev->dev.of_node;
>> +	struct iommu_group *grp;
>> +	struct pci_dn *pci;
>> +
>> +	pdn = pci_dma_find(dn, NULL);
>> +	if (!pdn || !PCI_DN(pdn))
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	pci = PCI_DN(pdn);
>> +	if (!pci->table_group)
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	grp = pci->table_group->group;
>> +	if (!grp)
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	return iommu_group_ref_get(grp);
> 
> Not for this series, but this is kind of backwards, the driver
> specific data (ie the table_group) should be in
> iommu_group_get_iommudata()...


It is there but here we are getting from a device to a group - a device 
is not added to a group yet when iommu_probe_device() works and tries 
adding a device via iommu_group_get_for_dev().




>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>> index 8a65ea61744c..3b53b466e49b 100644
>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>> @@ -1152,8 +1152,6 @@ static void tce_iommu_release_ownership(struct tce_container *container,
>>   	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
>>   		if (container->tables[i])
>>   			table_group->ops->unset_window(table_group, i);
>> -
>> -	table_group->ops->release_ownership(table_group);
>>   }
>>   
>>   static long tce_iommu_take_ownership(struct tce_container *container,
>> @@ -1161,10 +1159,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
>>   {
>>   	long i, ret = 0;
>>   
>> -	ret = table_group->ops->take_ownership(table_group);
>> -	if (ret)
>> -		return ret;
>> -
>>   	/* Set all windows to the new group */
>>   	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
>>   		struct iommu_table *tbl = container->tables[i];
>> @@ -1183,8 +1177,6 @@ static long tce_iommu_take_ownership(struct tce_container *container,
>>   	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i)
>>   		table_group->ops->unset_window(table_group, i);
>>   
>> -	table_group->ops->release_ownership(table_group);
>> -
> 
> This is great, makes alot of sense.
> 
> Anyhow, it all looks fine to me as is even:
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks. I'll try now to find an interested party to test this :)


> 
> Jason

-- 
Alexey

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-14  8:18 ` Alexey Kardashevskiy
  (?)
@ 2022-09-02  0:20   ` Jason Gunthorpe
  -1 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-09-02  0:20 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, kvm-ppc, Deming Wang, Robin Murphy,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman

On Thu, Jul 14, 2022 at 06:18:19PM +1000, Alexey Kardashevskiy wrote:
> Here is another take on iommu_ops on POWER to make VFIO work
> again on POWERPC64.
> 
> The tree with all prerequisites is here:
> https://github.com/aik/linux/tree/kvm-fixes-wip
> 
> The previous discussion is here:
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
> 
> Please comment. Thanks.
>
> 
> 
> Alexey Kardashevskiy (3):
>   powerpc/iommu: Add "borrowing" iommu_table_group_ops
>   powerpc/pci_64: Init pcibios subsys a bit later
>   powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
>     domains

It has been a little while - and I think this series is still badly
needed by powerpc, right?

So, reminder.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-09-02  0:20   ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-09-02  0:20 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, Fabiano Rosas, Robin Murphy, Daniel Henrique Barboza,
	Deming Wang, kvm-ppc, Alex Williamson, Nicholas Piggin,
	Murilo Opsfelder Araujo, linuxppc-dev

On Thu, Jul 14, 2022 at 06:18:19PM +1000, Alexey Kardashevskiy wrote:
> Here is another take on iommu_ops on POWER to make VFIO work
> again on POWERPC64.
> 
> The tree with all prerequisites is here:
> https://github.com/aik/linux/tree/kvm-fixes-wip
> 
> The previous discussion is here:
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
> 
> Please comment. Thanks.
>
> 
> 
> Alexey Kardashevskiy (3):
>   powerpc/iommu: Add "borrowing" iommu_table_group_ops
>   powerpc/pci_64: Init pcibios subsys a bit later
>   powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
>     domains

It has been a little while - and I think this series is still badly
needed by powerpc, right?

So, reminder.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domain
@ 2022-09-02  0:20   ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-09-02  0:20 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, kvm-ppc, Deming Wang, Robin Murphy,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, Michael Ellerman

On Thu, Jul 14, 2022 at 06:18:19PM +1000, Alexey Kardashevskiy wrote:
> Here is another take on iommu_ops on POWER to make VFIO work
> again on POWERPC64.
> 
> The tree with all prerequisites is here:
> https://github.com/aik/linux/tree/kvm-fixes-wip
> 
> The previous discussion is here:
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
> 
> Please comment. Thanks.
>
> 
> 
> Alexey Kardashevskiy (3):
>   powerpc/iommu: Add "borrowing" iommu_table_group_ops
>   powerpc/pci_64: Init pcibios subsys a bit later
>   powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
>     domains

It has been a little while - and I think this series is still badly
needed by powerpc, right?

So, reminder.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-09-02  0:20   ` [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Jason Gunthorpe
  (?)
@ 2022-09-02  7:33     ` Michael Ellerman
  -1 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2022-09-02  7:33 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, kvm-ppc, Deming Wang, Robin Murphy,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

Jason Gunthorpe <jgg@nvidia.com> writes:
> On Thu, Jul 14, 2022 at 06:18:19PM +1000, Alexey Kardashevskiy wrote:
>> Here is another take on iommu_ops on POWER to make VFIO work
>> again on POWERPC64.
>> 
>> The tree with all prerequisites is here:
>> https://github.com/aik/linux/tree/kvm-fixes-wip
>> 
>> The previous discussion is here:
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
>> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
>> 
>> Please comment. Thanks.
>>
>> 
>> 
>> Alexey Kardashevskiy (3):
>>   powerpc/iommu: Add "borrowing" iommu_table_group_ops
>>   powerpc/pci_64: Init pcibios subsys a bit later
>>   powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
>>     domains
>
> It has been a little while - and I think this series is still badly
> needed by powerpc, right?

Your comments on patch 3 left me with the impression it needed a respin,
but maybe I misread that.

Alexey's reply that it needed testing also made me think it wasn't
ready to pick up.

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-09-02  7:33     ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2022-09-02  7:33 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexey Kardashevskiy
  Cc: kvm, Fabiano Rosas, Robin Murphy, Daniel Henrique Barboza,
	Deming Wang, kvm-ppc, Alex Williamson, Nicholas Piggin,
	Murilo Opsfelder Araujo, linuxppc-dev

Jason Gunthorpe <jgg@nvidia.com> writes:
> On Thu, Jul 14, 2022 at 06:18:19PM +1000, Alexey Kardashevskiy wrote:
>> Here is another take on iommu_ops on POWER to make VFIO work
>> again on POWERPC64.
>> 
>> The tree with all prerequisites is here:
>> https://github.com/aik/linux/tree/kvm-fixes-wip
>> 
>> The previous discussion is here:
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
>> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
>> 
>> Please comment. Thanks.
>>
>> 
>> 
>> Alexey Kardashevskiy (3):
>>   powerpc/iommu: Add "borrowing" iommu_table_group_ops
>>   powerpc/pci_64: Init pcibios subsys a bit later
>>   powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
>>     domains
>
> It has been a little while - and I think this series is still badly
> needed by powerpc, right?

Your comments on patch 3 left me with the impression it needed a respin,
but maybe I misread that.

Alexey's reply that it needed testing also made me think it wasn't
ready to pick up.

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domain
@ 2022-09-02  7:33     ` Michael Ellerman
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Ellerman @ 2022-09-02  7:33 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm, kvm-ppc, Deming Wang, Robin Murphy,
	Alex Williamson, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

Jason Gunthorpe <jgg@nvidia.com> writes:
> On Thu, Jul 14, 2022 at 06:18:19PM +1000, Alexey Kardashevskiy wrote:
>> Here is another take on iommu_ops on POWER to make VFIO work
>> again on POWERPC64.
>> 
>> The tree with all prerequisites is here:
>> https://github.com/aik/linux/tree/kvm-fixes-wip
>> 
>> The previous discussion is here:
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
>> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
>> 
>> Please comment. Thanks.
>>
>> 
>> 
>> Alexey Kardashevskiy (3):
>>   powerpc/iommu: Add "borrowing" iommu_table_group_ops
>>   powerpc/pci_64: Init pcibios subsys a bit later
>>   powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
>>     domains
>
> It has been a little while - and I think this series is still badly
> needed by powerpc, right?

Your comments on patch 3 left me with the impression it needed a respin,
but maybe I misread that.

Alexey's reply that it needed testing also made me think it wasn't
ready to pick up.

cheers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-09-02  7:33     ` [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Michael Ellerman
  (?)
@ 2022-09-02 11:46       ` Jason Gunthorpe
  -1 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-09-02 11:46 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Alexey Kardashevskiy, linuxppc-dev, kvm, kvm-ppc, Deming Wang,
	Robin Murphy, Alex Williamson, Daniel Henrique Barboza,
	Fabiano Rosas, Murilo Opsfelder Araujo, Nicholas Piggin

On Fri, Sep 02, 2022 at 05:33:30PM +1000, Michael Ellerman wrote:
> Jason Gunthorpe <jgg@nvidia.com> writes:
> > On Thu, Jul 14, 2022 at 06:18:19PM +1000, Alexey Kardashevskiy wrote:
> >> Here is another take on iommu_ops on POWER to make VFIO work
> >> again on POWERPC64.
> >> 
> >> The tree with all prerequisites is here:
> >> https://github.com/aik/linux/tree/kvm-fixes-wip
> >> 
> >> The previous discussion is here:
> >> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
> >> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
> >> 
> >> Please comment. Thanks.
> >>
> >> 
> >> 
> >> Alexey Kardashevskiy (3):
> >>   powerpc/iommu: Add "borrowing" iommu_table_group_ops
> >>   powerpc/pci_64: Init pcibios subsys a bit later
> >>   powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
> >>     domains
> >
> > It has been a little while - and I think this series is still badly
> > needed by powerpc, right?
> 
> Your comments on patch 3 left me with the impression it needed a respin,
> but maybe I misread that.

It would be nice, but I understand Alexey will not work on it anymore,
so I wouldn't object to as-is

> Alexey's reply that it needed testing also made me think it wasn't
> ready to pick up.

Well, if so, someone still needs to finish this work. But I think he
tested it, he fixed things that could have only been found by
testing..

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-09-02 11:46       ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-09-02 11:46 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: kvm, Fabiano Rosas, Alexey Kardashevskiy, Robin Murphy,
	Daniel Henrique Barboza, Deming Wang, kvm-ppc, Alex Williamson,
	Nicholas Piggin, Murilo Opsfelder Araujo, linuxppc-dev

On Fri, Sep 02, 2022 at 05:33:30PM +1000, Michael Ellerman wrote:
> Jason Gunthorpe <jgg@nvidia.com> writes:
> > On Thu, Jul 14, 2022 at 06:18:19PM +1000, Alexey Kardashevskiy wrote:
> >> Here is another take on iommu_ops on POWER to make VFIO work
> >> again on POWERPC64.
> >> 
> >> The tree with all prerequisites is here:
> >> https://github.com/aik/linux/tree/kvm-fixes-wip
> >> 
> >> The previous discussion is here:
> >> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
> >> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
> >> 
> >> Please comment. Thanks.
> >>
> >> 
> >> 
> >> Alexey Kardashevskiy (3):
> >>   powerpc/iommu: Add "borrowing" iommu_table_group_ops
> >>   powerpc/pci_64: Init pcibios subsys a bit later
> >>   powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
> >>     domains
> >
> > It has been a little while - and I think this series is still badly
> > needed by powerpc, right?
> 
> Your comments on patch 3 left me with the impression it needed a respin,
> but maybe I misread that.

It would be nice, but I understand Alexey will not work on it anymore,
so I wouldn't object to as-is

> Alexey's reply that it needed testing also made me think it wasn't
> ready to pick up.

Well, if so, someone still needs to finish this work. But I think he
tested it, he fixed things that could have only been found by
testing..

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domain
@ 2022-09-02 11:46       ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2022-09-02 11:46 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Alexey Kardashevskiy, linuxppc-dev, kvm, kvm-ppc, Deming Wang,
	Robin Murphy, Alex Williamson, Daniel Henrique Barboza,
	Fabiano Rosas, Murilo Opsfelder Araujo, Nicholas Piggin

On Fri, Sep 02, 2022 at 05:33:30PM +1000, Michael Ellerman wrote:
> Jason Gunthorpe <jgg@nvidia.com> writes:
> > On Thu, Jul 14, 2022 at 06:18:19PM +1000, Alexey Kardashevskiy wrote:
> >> Here is another take on iommu_ops on POWER to make VFIO work
> >> again on POWERPC64.
> >> 
> >> The tree with all prerequisites is here:
> >> https://github.com/aik/linux/tree/kvm-fixes-wip
> >> 
> >> The previous discussion is here:
> >> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/
> >> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
> >> 
> >> Please comment. Thanks.
> >>
> >> 
> >> 
> >> Alexey Kardashevskiy (3):
> >>   powerpc/iommu: Add "borrowing" iommu_table_group_ops
> >>   powerpc/pci_64: Init pcibios subsys a bit later
> >>   powerpc/iommu: Add iommu_ops to report capabilities and allow blocking
> >>     domains
> >
> > It has been a little while - and I think this series is still badly
> > needed by powerpc, right?
> 
> Your comments on patch 3 left me with the impression it needed a respin,
> but maybe I misread that.

It would be nice, but I understand Alexey will not work on it anymore,
so I wouldn't object to as-is

> Alexey's reply that it needed testing also made me think it wasn't
> ready to pick up.

Well, if so, someone still needs to finish this work. But I think he
tested it, he fixed things that could have only been found by
testing..

Jason

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-09-02 11:47 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-14  8:18 [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Alexey Kardashevskiy
2022-07-14  8:18 ` Alexey Kardashevskiy
2022-07-14  8:18 ` Alexey Kardashevskiy
2022-07-14  8:18 ` [PATCH kernel 1/3] powerpc/iommu: Add "borrowing" iommu_table_group_ops Alexey Kardashevskiy
2022-07-14  8:18   ` Alexey Kardashevskiy
2022-07-14  8:18   ` Alexey Kardashevskiy
2022-07-18 16:49   ` Jason Gunthorpe
2022-07-18 16:49     ` Jason Gunthorpe
2022-07-18 16:49     ` Jason Gunthorpe
2022-07-14  8:18 ` [PATCH kernel 2/3] powerpc/pci_64: Init pcibios subsys a bit later Alexey Kardashevskiy
2022-07-14  8:18   ` Alexey Kardashevskiy
2022-07-14  8:18   ` Alexey Kardashevskiy
2022-07-18 16:49   ` Jason Gunthorpe
2022-07-18 16:49     ` Jason Gunthorpe
2022-07-18 16:49     ` Jason Gunthorpe
2022-07-14  8:18 ` [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Alexey Kardashevskiy
2022-07-14  8:18   ` Alexey Kardashevskiy
2022-07-14  8:18   ` Alexey Kardashevskiy
2022-07-18 18:09   ` Jason Gunthorpe
2022-07-18 18:09     ` [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domain Jason Gunthorpe
2022-07-18 18:09     ` [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Jason Gunthorpe
2022-07-19  3:18     ` Alexey Kardashevskiy
2022-07-19  3:18       ` [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domain Alexey Kardashevskiy
2022-07-19  3:18       ` [PATCH kernel 3/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Alexey Kardashevskiy
2022-09-02  0:20 ` [PATCH kernel 0/3] " Jason Gunthorpe
2022-09-02  0:20   ` [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domain Jason Gunthorpe
2022-09-02  0:20   ` [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Jason Gunthorpe
2022-09-02  7:33   ` Michael Ellerman
2022-09-02  7:33     ` [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domain Michael Ellerman
2022-09-02  7:33     ` [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Michael Ellerman
2022-09-02 11:46     ` Jason Gunthorpe
2022-09-02 11:46       ` [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domain Jason Gunthorpe
2022-09-02 11:46       ` [PATCH kernel 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.